GREP
Use of grep (Global Regular Expression Print): examples
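1. Check the network traffic (RX counters) of a Linux interface:
ifconfig eth0 | grep RX
2. Find which files reference a certain module, here pam_nologin in the PAM configuration:
grep pam_nologin /etc/pam.d/*
3. Count the number of CPUs in the server (on x86, /proc/cpuinfo has one "model name" line per logical CPU):
grep -c name /proc/cpuinfo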
Some flags:
-c Count the number of matching lines.
-v Invert the match (print the lines that do not match).
-i Ignore case when matching.
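A quick way to see -c, -i and -v side by side is a throwaway file. This is a minimal sketch assuming GNU grep; the file contents are made up for illustration.

```shell
#!/bin/bash
# Made-up sample input written to a temporary file
sample=$(mktemp)
printf 'Error: disk full\nwarning: low memory\nerror: timeout\ninfo: started\n' > "$sample"

# -c counts matching lines; matching is case-sensitive, so only "error: timeout" counts
count=$(grep -c 'error' "$sample")
# -i ignores case, so "Error" and "error" both count
count_i=$(grep -ci 'error' "$sample")
# -v inverts the match: print the lines that do NOT contain "error" in any case
others=$(grep -vi 'error' "$sample")

echo "case-sensitive count: $count"
echo "case-insensitive count: $count_i"
echo "lines without errors:"
echo "$others"
rm -f "$sample"
```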
Example: a CSV file with the columns product, price and quantity:
drill,99,5
hammer,10,50
brush,5,100
lamp,25,30
screwdriver,5,23
table-saw,1099,3
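#!/bin/bash
# Save the current field separator, then split input lines on commas
OLDIFS=$IFS
IFS=","
# Each CSV line is split into the three columns
while read -r product price quantity
do
echo -e "\e[1;33m$product \
========================\e[0m\n\
Price : \t $price \n\
Quantity : \t $quantity"
done < "$1"
# Restore the original field separator
IFS=$OLDIFS
echo "my IFS is: $IFS"
Output (the product lines appear in bold yellow):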
drill ========================
Price : 99
Quantity : 5
hammer ========================
Price : 10
Quantity : 50
brush ========================
Price : 5
Quantity : 100
lamp ========================
Price : 25
Quantity : 30
screwdriver ========================
Price : 5
Quantity : 23
table-saw ========================
Price : 1099
Quantity : 3
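Here IFS is the Internal Field Separator, a shell variable. By default it holds a space, a tab and a newline. The shell uses it for word splitting, and read uses it to split each input line into fields. For example, with IFS set to a comma, read splits each CSV line into separate variables: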
#!/bin/bash
OLDIFS=$IFS
IFS=','
while read -r product price quantity
do
echo "product is:" "$product"
echo "price is:" "$price"
echo "quantity is:" "$quantity"
done < "$1"
# Restore the saved value (the variable name must match the one saved above)
IFS=$OLDIFS
For each line of the CSV file, it displays the product, price and quantity separately.
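A common alternative to saving and restoring IFS is to set it only for the read command: an assignment placed before a command applies to that command alone, so the shell's IFS is never changed. A minimal sketch, using a heredoc with two rows from the CSV above instead of a file; the running stock value is only there to show the fields are usable as numbers.

```shell
#!/bin/bash
total=0
# IFS=, applies only to this read, so no OLDIFS save/restore is needed
while IFS=, read -r product price quantity
do
    echo "product is: $product, price is: $price, quantity is: $quantity"
    # Add price * quantity to the running stock value
    total=$(( total + price * quantity ))
done <<'EOF'
drill,99,5
hammer,10,50
EOF

echo "stock value: $total"
```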
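Piping the first script's output through grep -A2 shows the hammer line plus the two lines after it (run it with bash, since the script relies on echo -e):
bash parsecsv.sh.bak tools | grep -A2 hammer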
hammer ========================
Price : 10
Quantity : 50
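grep can also print context around each match with -A, -B and -C:
-A NUM After context: print NUM lines after each match.
-B NUM Before context: print NUM lines before each match.
-C NUM Context: print NUM lines before and after each match.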
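The three context options can be compared on a tiny input. A sketch assuming GNU grep; the five lines are made up.

```shell
#!/bin/bash
# Made-up input, one word per line
input=$(printf 'one\ntwo\nthree\nfour\nfive\n')

# -A1: the matching line plus 1 line after it
after=$(printf '%s\n' "$input" | grep -A1 'three')
# -B1: the matching line plus 1 line before it
before=$(printf '%s\n' "$input" | grep -B1 'three')
# -C1: 1 line of context on both sides
around=$(printf '%s\n' "$input" | grep -C1 'three')

printf '%s\n--\n%s\n--\n%s\n' "$after" "$before" "$around"
```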
Regular Expression:
A regular expression is a sequence of characters that defines a search pattern. Typical uses of regex include input validation, search and replace, string parsing, data scraping, syntax highlighting and data mapping.
Anchors, Ranges, Boundaries and Validating Data:
^ Anchors the match to the beginning of a line.
$ Anchors the match to the end of a line.
Example: print only the lines of /etc/ntp.conf that begin with "server":
grep '^server' /etc/ntp.conf
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
Python Regular Expression Quick Guide
Regex | Description |
---|---|
^ | Matches the beginning of a line |
$ | Matches the end of the line |
. | Matches any character |
\s | Matches whitespace |
\S | Matches any non-whitespace character |
* | Repeats a character zero or more times |
*? | Repeats a character zero or more times (non-greedy) |
+ | Repeats a character one or more times |
+? | Repeats a character one or more times (non-greedy) |
[aeiou] | Matches a single character in the listed set |
[^XYZ] | Matches a single character not in the listed set |
[a-z0-9] | The set of characters can include a range |
( | Indicates where string extraction is to start |
) | Indicates where string extraction is to end |
Find the lines in the files under /etc/logrotate.d/ that end with the character 4:
grep '4$' /etc/logrotate.d/*
cat -vet /etc/ntp.conf displays non-printing characters: each line end is shown as $ and each tab as ^I.
To print the blank lines (an empty line is the start anchor immediately followed by the end anchor):
grep '^$' /etc/ntp.conf
To print all lines except the blank ones, add the -v option (invert the match):
grep -v '^$' /etc/ntp.conf
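The anchor patterns can be tried on a small stand-in file instead of the real /etc/ntp.conf. A sketch assuming GNU grep; the file and its contents (including the ntp4 log path) are hypothetical.

```shell
#!/bin/bash
# Hypothetical config file standing in for /etc/ntp.conf
conf=$(mktemp)
printf '# NTP servers\nserver 0.pool.example iburst\n\nserver 1.pool.example iburst\nlogfile /var/log/ntp4\n' > "$conf"

starts=$(grep -c '^server' "$conf")   # lines that begin with "server"
ends4=$(grep -c '4$' "$conf")         # lines that end with the character 4
blank=$(grep -c '^$' "$conf")         # empty lines
nonblank=$(grep -vc '^$' "$conf")     # -v inverts: every line that is not empty

echo "server lines: $starts, ending in 4: $ends4, blank: $blank, non-blank: $nonblank"
rm -f "$conf"
```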
Boundaries in Regular Expression
\s Whitespace
\b Word boundary
\B Not a word boundary (the opposite of \b)
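The difference between \b and \B shows up when the same letters appear as a whole word and inside longer words. A sketch assuming GNU grep, which accepts \b and \B as extensions; the sample sentence is made up.

```shell
#!/bin/bash
text='cat concatenate category'

# -o prints each match on its own line; wc -l counts the matches
plain=$(echo "$text" | grep -o 'cat' | wc -l)      # every "cat", even inside words
word=$(echo "$text" | grep -o '\bcat\b' | wc -l)   # "cat" only as a whole word
inner=$(echo "$text" | grep -o '\Bcat\B' | wc -l)  # "cat" only in the middle of a word

echo "plain: $plain, whole word: $word, mid-word: $inner"
```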
Ranges in Regular Expression
Any letter [A-Za-z]
Any digit [0-9]
Any lowercase letter or underscore [a-z_]
The digit 3, 4 or 9 [349]
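Each range matches a single character, so grep -c counts the lines containing at least one such character. A sketch assuming GNU grep; LC_ALL=C is set so that ranges like [a-z] follow plain ASCII order, and the sample lines are made up.

```shell
#!/bin/bash
# ASCII ordering for bracket ranges
export LC_ALL=C
lines=$(printf 'room 349\nROOM_12\nabc\n')

digits=$(printf '%s\n' "$lines" | grep -c '[0-9]')   # any digit
set349=$(printf '%s\n' "$lines" | grep -c '[349]')   # a 3, 4 or 9
lower=$(printf '%s\n' "$lines" | grep -c '[a-z_]')   # a lowercase letter or underscore
upper=$(printf '%s\n' "$lines" | grep -c '[A-Z]')    # an uppercase letter

echo "[0-9]: $digits, [349]: $set349, [a-z_]: $lower, [A-Z]: $upper"
```

Note that "ROOM_12" counts for [a-z_] because of its underscore, even though it has no lowercase letters.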
Exercise:
Jones,Bob,232-78-3456
Jackeson,Jane,,
Federer,Jack,xxx-xx-xxxx
Maw,Michael,1879-0
Alexander,Sally,345-89-8095
Beder,Ioana,567-34-9802
Staines,Brad,,
Find the employee records with an invalid SSN (one not in the NNN-NN-NNNN format):
grep -vE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' employees.txt
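To check the exercise end to end, the records above can be written to employees.txt and the command run against them. A sketch assuming GNU grep (for \b with -E); it works in a scratch directory so an existing employees.txt is not overwritten.

```shell
#!/bin/bash
# Scratch directory so no real employees.txt is touched
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Jones,Bob,232-78-3456' 'Jackeson,Jane,,' 'Federer,Jack,xxx-xx-xxxx' \
    'Maw,Michael,1879-0' 'Alexander,Sally,345-89-8095' 'Beder,Ioana,567-34-9802' \
    'Staines,Brad,,' > employees.txt

# -E enables the {n} repetition counts; \b stops a match inside a longer digit run
# -v prints the records in which no NNN-NN-NNNN SSN was found
invalid=$(grep -vE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' employees.txt)
echo "$invalid"
```

This prints the Jackeson, Federer, Maw and Staines records.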
Quantifiers:
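? Matches the preceding item 0 or 1 time.
* Matches the preceding item 0 or more times.
+ Matches the preceding item 1 or more times.
With grep, ? and + need the -E option; in basic syntax they match themselves literally.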
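The three quantifiers differ only in how many times the preceding item may repeat. A sketch assuming GNU grep; -x requires the whole line to match, and the word list is made up.

```shell
#!/bin/bash
words=$(printf 'color\ncolour\ncolouur\n')

optional=$(printf '%s\n' "$words" | grep -cxE 'colou?r')  # u appears 0 or 1 time
any=$(printf '%s\n' "$words" | grep -cxE 'colou*r')       # u appears 0 or more times
oneplus=$(printf '%s\n' "$words" | grep -cxE 'colou+r')   # u appears 1 or more times

echo "?: $optional, *: $any, +: $oneplus"
```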
Summary:
Regex | Description |
---|---|
^ | Matches the beginning of the line. |
$ | Matches the end of the line. |
. | Matches any character (a wildcard). |
\s | Matches a whitespace character. |
\S | Matches a non-whitespace character (opposite of \s). |
* | Applies to the immediately preceding character and indicates to match zero or more of the preceding character. |
*? | Applies to the immediately preceding character and indicates to match zero or more of the preceding character in "non-greedy mode". |
+ | Applies to the immediately preceding character and indicates to match one or more of the preceding character. |
+? | Applies to the immediately preceding character and indicates to match one or more of the preceding character in "non-greedy mode". |
[aeiou] | Matches a single character as long as that character is in the specified set. In this example, it would match "a", "e", "i", "o" or "u" but no other characters. |
[a-z0-9] | You can specify ranges of characters using the minus sign. This example matches a single character that must be a lowercase letter or a digit. |
[^A-Za-z] | When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter. |
( ) | When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall(). |
\b | Matches the empty string, but only at the start or end of a word. |
\B | Matches the empty string, but not at the start or end of a word. |
\d | Matches any decimal digit; equivalent to the set [0-9]. |
\D | Matches any non-digit character; equivalent to the set [^0-9]. |
Quick References:
Regular Expression | Description |
---|---|
[abc] | Matches either an a, b or c character |
[^abc] | Matches any character except for an a, b or c |
[^a-z] | Matches any character except those in the range a-z. |
[a-zA-Z] | Matches any character in the range a-z or A-Z. You can combine as many ranges as you please. |
. | Matches any character other than newline (or including newline with the /s flag) |
\s | Matches any space, tab or newline character. |
\S | Matches anything other than a space, tab or newline. |
\d | Matches any decimal digit. Equivalent to [0-9]. |
\D | Matches anything other than a decimal digit. |
\w | Matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_]. |
\W | Matches anything other than a letter, digit or underscore. |
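Several of these escapes also work in GNU grep as extensions (\w, \W, \s, \S). A sketch assuming GNU grep in the C locale, with a made-up input line; -o prints each match on its own line. \d and \D are Perl/Python escapes that grep -E does not treat as digit classes, so use [0-9] there (or grep -P, where grep was built with PCRE support).

```shell
#!/bin/bash
export LC_ALL=C
line='id: user_1, shell: /bin/bash'

words=$(echo "$line" | grep -oE '\w+')   # runs of letters, digits and underscores
tokens=$(echo "$line" | grep -oE '\S+')  # runs of non-whitespace characters

echo "$words"
echo "$tokens"
```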
AWK
Print the first field (the user name) of each line in /etc/passwd, using : as the field separator:
awk -F ':' '{ print $1 }' /etc/passwd
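The same -F option can pull out any other field. A sketch with two made-up lines in /etc/passwd format (name:password:UID:GID:comment:home:shell); the user alice is hypothetical.

```shell
#!/bin/bash
passwd_sample='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# -F ':' splits on colons: $1 is the user name, $7 the login shell
users=$(printf '%s\n' "$passwd_sample" | awk -F ':' '{ print $1 }')
shells=$(printf '%s\n' "$passwd_sample" | awk -F ':' '{ print $1 " uses " $7 }')

echo "$users"
echo "$shells"
```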
Different Fields in Apache Access Log:
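Finding the most popular browser in the Apache/Nginx access.log (with awk's default whitespace splitting, $12 is the first word of the User-agent string):
awk '{print $12}' access.log | sort | uniq -c | sort -rn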
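Because $12 holds only the first word of the quoted User-agent (for example "Mozilla/5.0), splitting on the double quote instead puts the whole User-agent in field 6 of the combined format. A sketch with three made-up log lines; the IPs come from the 192.0.2.0/24 documentation range.

```shell
#!/bin/bash
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0.1"
192.0.2.11 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"
192.0.2.12 - - [10/Oct/2023:13:56:02 +0000] "GET / HTTP/1.1" 404 209 "-" "curl/8.0.1"
EOF

# With -F'"' the quoted parts become fields: $2 request, $4 Referer, $6 User-agent
# uniq -c needs sorted input; sort -rn then puts the most frequent agent first
top=$(awk -F'"' '{ print $6 }' "$log" | sort | uniq -c | sort -rn | head -n 1)
echo "$top"
rm -f "$log"
```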
LogFormat of Apache access.log
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
%h is the remote host (i.e. the client IP).
%l is the identity of the user determined by identd (not usually used since it is not reliable).
%u is the user name determined by HTTP authentication.
%t is the time the request was received.
%r is the request line from the client ("GET / HTTP/1.0").
%>s is the status code sent from the server to the client (200, 404 etc.).
%b is the size of the response to the client (in bytes).
Referer is the Referer header of the HTTP request (containing the URL of the page from which this request was initiated) if any is present, and "-" otherwise.
User-agent is the browser identification string.
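With awk's default whitespace splitting, these directives land in fixed field numbers as long as %r has its usual three parts (method, path, protocol): $1 is %h, $4 and $5 hold %t, $9 is %>s and $10 is %b. A sketch with two made-up combined-format lines.

```shell
#!/bin/bash
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0.1"
192.0.2.12 - - [10/Oct/2023:13:56:02 +0000] "GET /missing HTTP/1.1" 404 209 "-" "curl/8.0.1"
EOF

# Print client IP (%h) and status (%>s) per line, and total the response bytes (%b)
# (a "-" in the %b field would count as 0 in awk arithmetic)
summary=$(awk '{ print $1, $9 } { bytes += $10 } END { print "bytes:", bytes }' "$log")
echo "$summary"
rm -f "$log"
```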
Field | Description |
---|---|
Field 1 | Client IP |