How to use Grep in Linux to find the exact text pattern
grep is a great utility for searching through large text files and extracting the matching text in a fraction of a second. If you run grep --help, you will see a lot of switches. I will touch upon a few very useful ones that are used almost daily. Let's start with the basic functionality first.
How to use grep
Let's work on the nginx access.log to demo the grep commands in this article.
Let's check how many 200 status codes we have in the nginx file. Code 200 means that the nginx server was able to serve the page successfully.
grep ' 200 ' access.log | wc -l
19
We see there are 19 hits with code 200.
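wc -l counts the lines that grep prints, one per matching log entry. Note that the pattern ' 200 ' matches any field that is exactly 200, so a line whose response size happens to be 200 would be counted as well.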
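The same count can be obtained without the pipe. The following is a minimal sketch, assuming a small made-up sample log (the entries and the temporary file are hypothetical, not taken from the real access.log). It uses grep's -c switch, which prints the number of matching lines, and a slightly stricter pattern that anchors on the closing quote of the request so that a 200-byte response size is not mistaken for a status code.

```shell
# Build a tiny sample log (hypothetical data) so the commands run anywhere.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [17/Nov/2019:03:14:19 +0100] "GET / HTTP/1.1" 200 612 "-" "-" "-"
192.0.2.11 - - [17/Nov/2019:03:14:20 +0100] "GET /a HTTP/1.1" 404 200 "-" "-" "-"
192.0.2.12 - - [17/Nov/2019:03:14:21 +0100] "GET /b HTTP/1.1" 200 178 "-" "-" "-"
EOF

# -c prints the number of matching lines, so no pipe to wc -l is needed.
loose_count=$(grep -c ' 200 ' "$log")

# Anchoring on the closing quote of the request skips the second line,
# where 200 is the response size and the status code is 404.
strict_count=$(grep -c '" 200 ' "$log")

echo "loose: $loose_count, strict: $strict_count"
rm -f "$log"
```

Whether the stricter pattern is needed depends on the log format in use; it assumes the status code directly follows the quoted request, as in the sample line shown in this article.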
How to do "not of pattern" in Grep
Let's say that instead of code 200, we want to check whether our server returned any other codes. That means we need to do "not of 200". To do that we use the -v switch, which selects only the lines that do not match the pattern ' 200 '. Let's try that:
grep -v ' 200 ' access.log | head -1
129.146.101.83 - - [17/Nov/2019:03:14:19 +0100] "GET / HTTP/1.0" 405 178 "-" "-" "-"
I piped the result to head -1, which means I want to see only the first result. To find out the number of lines with a code other than 200, just pipe the result to wc -l as shown below.
grep -v ' 200 ' access.log | wc -l
1191
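A quick sanity check on -v is that the matching and non-matching counts should add up to the total number of lines in the file. Here is a minimal sketch of that check, assuming a small made-up sample log (hypothetical entries) and using -c to count lines directly:

```shell
# Hypothetical sample log: two 200 responses and two other codes.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [17/Nov/2019:03:14:19 +0100] "GET / HTTP/1.1" 200 612 "-" "-" "-"
192.0.2.11 - - [17/Nov/2019:03:14:20 +0100] "GET / HTTP/1.0" 405 178 "-" "-" "-"
192.0.2.12 - - [17/Nov/2019:03:14:21 +0100] "GET /x HTTP/1.1" 404 153 "-" "-" "-"
192.0.2.13 - - [17/Nov/2019:03:14:22 +0100] "GET /y HTTP/1.1" 200 845 "-" "-" "-"
EOF

total=$(wc -l < "$log")              # every line in the file
ok=$(grep -c ' 200 ' "$log")         # lines that match the pattern
not_ok=$(grep -vc ' 200 ' "$log")    # -v inverts the match, -c counts

echo "total=$total ok=$ok not_ok=$not_ok"
rm -f "$log"
```

Every line either matches or does not, so ok plus not_ok always equals total.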
Get exact matched pattern using Grep
Let's say we want to get all the codes. We can use the -o switch along with a regexp to do that. Here is an example:
grep -o ' [0-9][0-9][0-9] ' access.log | head -2
405
404
In the above command, I am doing multiple things.
First, I know that the number I am looking for is a 3-digit code. That is why I used the regexp [0-9], which matches any single digit from 0 to 9, and repeated it 3 times.
Second, I know that the 3-digit code has a space in front of it and another after it.
Third, I am piping the output to head to print just the first 2 codes, for demonstration purposes only. If you want to scroll through all the codes, pipe it to less as shown below:
grep -o ' [0-9][0-9][0-9] ' access.log | less
Lastly, I am using a new switch, -o, which prints only the matched text rather than the whole line where the match was found, as we saw in the command above.
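The repeated [0-9][0-9][0-9] can also be written more compactly. The following is a minimal sketch, assuming a small made-up sample log (hypothetical entries). It uses the -E switch, which enables extended regular expressions, so that {3} means "exactly three of the preceding item". Since -o prints the surrounding spaces as part of each match, the sketch also strips them with tr -d ' ', which is an optional extra step.

```shell
# Hypothetical sample log with three different status codes.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [17/Nov/2019:03:14:19 +0100] "GET / HTTP/1.1" 200 612 "-" "-" "-"
192.0.2.11 - - [17/Nov/2019:03:14:20 +0100] "GET /x HTTP/1.1" 404 153 "-" "-" "-"
192.0.2.12 - - [17/Nov/2019:03:14:21 +0100] "GET / HTTP/1.0" 405 178 "-" "-" "-"
EOF

# Original form: the digit class written out three times.
long_form=$(grep -o ' [0-9][0-9][0-9] ' "$log")

# Extended-regex form: {3} repeats the digit class exactly three times.
short_form=$(grep -oE ' [0-9]{3} ' "$log")

# Optional: drop the leading and trailing spaces that -o keeps in each match.
codes=$(grep -oE ' [0-9]{3} ' "$log" | tr -d ' ')

echo "$codes"
rm -f "$log"
```

On these sample lines the 3-digit response size is not printed, because the space in front of it was already consumed by the status-code match and grep continues searching after the end of each match.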
If you run the command as shown above, you will see duplicate codes too.
To remove them, pipe the output to sort -u, which sorts the lines and keeps only one copy of each, as shown below:
grep -o ' [0-9][0-9][0-9] ' access.log | sort -u
200
400
404
405
As we see above, there are only 4 unique codes.
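If you also want to know how often each code occurs, one option is to replace sort -u with sort followed by uniq -c. This is a minimal sketch, assuming a small made-up sample log (hypothetical entries); the awk step is only an illustrative way to tidy the padded output of uniq -c into code=count pairs.

```shell
# Hypothetical sample log: three 200s, one 404 and one 405.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [17/Nov/2019:03:14:19 +0100] "GET / HTTP/1.1" 200 612 "-" "-" "-"
192.0.2.11 - - [17/Nov/2019:03:14:20 +0100] "GET /y HTTP/1.1" 200 845 "-" "-" "-"
192.0.2.12 - - [17/Nov/2019:03:14:21 +0100] "GET /x HTTP/1.1" 404 153 "-" "-" "-"
192.0.2.13 - - [17/Nov/2019:03:14:22 +0100] "GET / HTTP/1.0" 405 178 "-" "-" "-"
192.0.2.14 - - [17/Nov/2019:03:14:23 +0100] "GET /z HTTP/1.1" 200 301 "-" "-" "-"
EOF

# sort groups identical codes together; uniq -c prefixes each distinct
# line with the number of times it occurred; awk reorders the two columns.
counts=$(grep -o ' [0-9][0-9][0-9] ' "$log" | sort | uniq -c | awk '{print $2 "=" $1}')

echo "$counts"
rm -f "$log"
```

uniq only collapses adjacent duplicates, which is why the output has to go through sort first.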