Grep, sed and Awk in Linux

GREP

Uses of grep (Global Regular Expression Print), with examples:
1. To check the traffic on a network interface.
2. To find which files reference a certain module.
3. To count the number of CPUs in the server.

ifconfig eth0 |grep RX
grep pam_nologin /etc/pam.d/*
grep -c name /proc/cpuinfo

Some flags:
-c Count the number of matching lines.
-v Invert the match (print the lines that do not match).
-i Case insensitive.
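
To see the three flags side by side, here is a small sketch. The file contents are made up for the demonstration and the variable names are only illustrative.

```shell
# Hypothetical sample data, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'Error: disk full' 'error: disk full again' 'all good' > "$sample"

matches=$(grep -c 'error' "$sample")      # count lines containing "error" (case-sensitive)
matches_i=$(grep -ci 'error' "$sample")   # same count, ignoring case
others=$(grep -vi 'error' "$sample")      # lines that do NOT contain "error" in any case

echo "case-sensitive: $matches, case-insensitive: $matches_i"
echo "non-matching lines: $others"
rm -f "$sample"
```

With this sample, -c alone counts one line, -ci counts two, and -vi leaves only the "all good" line.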

Example: given this CSV file (product, price, quantity):

drill,99,5
hammer,10,50
brush,5,100
lamp,25,30
screwdriver,5,23
table-saw,1099,3


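#!/bin/bash
# Save the current IFS, then split input fields on commas.
OLDIFS=$IFS
IFS=","
while read product price quantity
do
echo -e "\e[1;33m$product \
========================\e[0m\n\
Price : \t $price \n\
Quantity : \t $quantity \n"
done < "$1"
# Restore the original IFS.
IFS=$OLDIFS
echo "my IFS is: $IFS"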

drill ========================
Price : 99
Quantity : 5

hammer ========================
Price : 10
Quantity : 50

brush ========================
Price : 5
Quantity : 100

lamp ========================
Price : 25
Quantity : 30

screwdriver ========================
Price : 5
Quantity : 23

table-saw ========================
Price : 1099
Quantity : 3

Here IFS is the Internal Field Separator used by the shell. By default it is space, tab and newline. The shell uses it to decide where to split words, for instance when read assigns the fields of a line to variables. For example:

#!/bin/bash
# Save the current IFS so it can be restored after the loop.
OLDIFS=$IFS
IFS=','
while read product price quantity
do
echo "product is: $product"
echo "price is: $price"
echo "quantity is: $quantity"
done < "$1"
IFS=$OLDIFS

It would display product, price and quantity separately.
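
As an alternative sketch, IFS can be set for the read command alone, so there is nothing to save and restore afterwards. The sample file and the summary variable below are assumptions made for the demonstration.

```shell
# Hypothetical sample file mirroring the CSV layout product,price,quantity.
csv=$(mktemp)
printf '%s\n' 'drill,99,5' 'hammer,10,50' > "$csv"

summary=""
# "IFS=," prefixes the read command, so it applies to that command only;
# the shell's own IFS keeps its default value.
while IFS=, read -r product price quantity
do
    summary="${summary}${product}:${price}:${quantity};"
done < "$csv"

echo "$summary"
rm -f "$csv"
```

Because the loop reads from a redirection rather than a pipe, the summary variable is still set after the loop ends.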

sh parsecsv.sh.bak tools |grep -A2 hammer
hammer ========================
Price : 10
Quantity : 50

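grep has options that print context around each match: -A, -B and -C, each followed by a number of lines.
-A After context (lines after the match)
-B Before context (lines before the match)
-C Context (lines both before and after the match)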
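
A short sketch of the three context options on a made-up five-line file. It assumes a grep that supports -A, -B and -C, as GNU grep and BSD grep do.

```shell
# Hypothetical log file, created only for this demonstration.
log=$(mktemp)
printf '%s\n' 'line 1' 'line 2' 'ERROR here' 'line 4' 'line 5' > "$log"

after=$(grep -A1 'ERROR' "$log")    # the match plus 1 line after it
before=$(grep -B1 'ERROR' "$log")   # the match plus 1 line before it
around=$(grep -C1 'ERROR' "$log")   # the match plus 1 line on each side

printf 'After:\n%s\n\nBefore:\n%s\n\nAround:\n%s\n' "$after" "$before" "$around"
rm -f "$log"
```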

Regular Expression:

A regular expression is a sequence of characters that defines a search pattern. Typical uses of regex include input validation, search and replace, string parsing, data scraping, syntax highlighting and data mapping.

Anchors, Ranges, Boundaries and Validating Data:
^ Anchors the match to the beginning of a line
$ Anchors the match to the end of a line

Example: lines in /etc/ntp.conf that begin with "server":

grep '^server' /etc/ntp.conf
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst

Python Regular Expression Quick Guide

^        Matches the beginning of a line
$        Matches the end of the line
.        Matches any character
\s       Matches whitespace
\S       Matches any non-whitespace character
*        Repeats a character zero or more times
*?       Repeats a character zero or more times
         (non-greedy)
+        Repeats a character one or more times
+?       Repeats a character one or more times
         (non-greedy)
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
(        Indicates where string extraction is to start
)        Indicates where string extraction is to end

grep '4$' /etc/logrotate.d/*  Finds the lines that end with 4 in the files under /etc/logrotate.d.

cat -vet /etc/ntp.conf displays the non-printing characters. The end of each line is shown as $.

To print the blank lines:
grep '^$' /etc/ntp.conf
To print all the lines except the blank ones, add the -v option (invert the match).
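
The anchor patterns above can be tried on a throwaway file instead of a real /etc/ntp.conf. The file contents here are invented for the demonstration.

```shell
# Hypothetical config file with two blank lines.
conf=$(mktemp)
printf '%s\n' '# comment' '' 'server 0.example.org iburst' '' 'restrict 127.0.0.1' > "$conf"

servers=$(grep '^server' "$conf")     # lines that start with "server"
ends_with_1=$(grep -c '1$' "$conf")   # number of lines that end with "1"
blank=$(grep -c '^$' "$conf")         # number of blank lines
nonblank=$(grep -vc '^$' "$conf")     # number of lines that are not blank

echo "server lines: $servers"
echo "ending in 1: $ends_with_1, blank: $blank, non-blank: $nonblank"
rm -f "$conf"
```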

Boundaries in Regular Expression

\s Whitespace
\b Word boundary
\B Not a word boundary (the opposite of \b)
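
A sketch of how the boundary escapes behave, assuming GNU grep (\b and \B are extensions and may not work in every grep). The -w option is shown as a more widely supported way to match whole words. The word list is made up.

```shell
# Hypothetical word list, created only for this demonstration.
words=$(mktemp)
printf '%s\n' 'cat' 'concatenate' 'the cat sat' > "$words"

whole=$(grep -c '\bcat\b' "$words")    # "cat" as a whole word: 'cat' and 'the cat sat'
inside=$(grep -c '\Bcat\B' "$words")   # "cat" buried inside a longer word: 'concatenate'
whole_w=$(grep -cw 'cat' "$words")     # -w: match whole words only

echo "whole word: $whole, inside a word: $inside, with -w: $whole_w"
rm -f "$words"
```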

Ranges in Regular Expression

Any letter [A-Za-z]
Any digit [0-9]
Any lowercase letter or underscore [a-z_]
Matches 3, 4 or 9 [349]
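
The ranges above, sketched on invented sample lines. LC_ALL=C is set here because the meaning of ranges such as [a-z] can depend on the locale; that setting is an addition for the demonstration, not something the ranges themselves require.

```shell
# Hypothetical sample lines, created only for this demonstration.
data=$(mktemp)
printf '%s\n' 'abc' 'ABC' '349' '578' 'snake_case' > "$data"

letters=$(LC_ALL=C grep -c '[A-Za-z]' "$data")    # lines containing any letter
digits=$(LC_ALL=C grep -c '[0-9]' "$data")        # lines containing any digit
lower=$(LC_ALL=C grep -c '^[a-z_]*$' "$data")     # lines made only of lowercase letters or _
picked=$(LC_ALL=C grep -c '[349]' "$data")        # lines containing a 3, 4 or 9

echo "letters: $letters, digits: $digits, lowercase-only: $lower, has 3/4/9: $picked"
rm -f "$data"
```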

Exercise:

Jones,Bob,232-78-3456
Jackeson,Jane,,
Federer,Jack,xxx-xx-xxxx
Maw,Michael,1879-0
Alexander,Sally,345-89-8095
Beder,Ioana,567-34-9802
Staines,Brad,,

Find the employee records with invalid SSNs.
grep -vE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' employees.txt
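
The exercise can be run end to end as follows. This sketch writes the records above to a temporary file in place of employees.txt and assumes GNU grep for the \b escape.

```shell
# The employee records from the exercise, written to a temporary file.
employees=$(mktemp)
cat > "$employees" <<'EOF'
Jones,Bob,232-78-3456
Jackeson,Jane,,
Federer,Jack,xxx-xx-xxxx
Maw,Michael,1879-0
Alexander,Sally,345-89-8095
Beder,Ioana,567-34-9802
Staines,Brad,,
EOF

# -E enables {n} repetition counts; -v keeps the lines WITHOUT a valid ddd-dd-dddd SSN.
invalid=$(grep -vE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' "$employees")
invalid_count=$(printf '%s\n' "$invalid" | grep -c '')

echo "$invalid"
echo "invalid records: $invalid_count"
rm -f "$employees"
```

The records for Jackeson, Federer, Maw and Staines are printed, since none of them contains three digits, two digits and four digits separated by hyphens.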

Quantifiers:

? Matches 0 or 1 time
* Matches  0 or more times.
+ Matches 1 or more times.
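
A quick sketch of the three quantifiers applied to the letter "u", using grep -E so that ? and + are treated as quantifiers. The spellings are made up for the demonstration.

```shell
# Hypothetical spellings, created only for this demonstration.
q=$(mktemp)
printf '%s\n' 'color' 'colour' 'colouur' 'colr' > "$q"

optional=$(grep -cE '^colou?r$' "$q")   # u zero or one time:   color, colour
star=$(grep -cE '^colou*r$' "$q")       # u zero or more times: color, colour, colouur
plus=$(grep -cE '^colou+r$' "$q")       # u one or more times:  colour, colouur

echo "?: $optional, *: $star, +: $plus"
rm -f "$q"
```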

Summary:

regex Description
^ Matches the beginning of the line.
$ Matches the end of the line.
. Matches any character (a wildcard).
\s Matches a whitespace character.
\S Matches a non-whitespace character (opposite of \s).
* Applies to the immediately preceding character and indicates to match zero or more of the preceding character.
*? Applies to the immediately preceding character and indicates to match zero or more of the preceding character in “non-greedy mode”.
+ Applies to the immediately preceding character and indicates to match one or more of the preceding character.
+? Applies to the immediately preceding character and indicates to match one or more of the preceding character in “non-greedy mode”.
[aeiou] Matches a single character as long as that character is in the specified set. In this example, it would match “a”, “e”, “i”, “o” or “u” but no other characters.

[a-z0-9] You can specify ranges of characters using the minus sign. This example is a single character that must be a lower case letter or a digit.
[^A-Za-z] When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an upper or lower case letter.
( ) When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
\d Matches any decimal digit; equivalent to the set [0-9].
\D Matches any non-digit character; equivalent to the set [^0-9].

Quick References:

Regular Expression Description
[abc] Matches either an a, b or c character
[^abc] Matches any character except for an a, b or c
[^a-z] Matches any character except those in the range a-z.
[a-zA-Z] Matches any character in the range a-z or A-Z. You can combine as many ranges as you please.
. Matches any character other than newline (or including newline with the s flag)
\s Matches any space, tab or newline character.
\S Matches anything other than a space, tab or newline.
\d Matches any decimal digit. Equivalent to [0-9].
\D Matches anything other than a decimal digit.

\w Matches any letter, digit or underscore. Equivalent to [a-zA-Z0-9_].
\W Matches anything other than a letter, digit or underscore.

AWK

awk -F ":" '{ print $1 }' /etc/passwd
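
To try the field separator without touching the real /etc/passwd, here is a sketch on a few made-up entries in the same colon-separated layout. The second command, which filters on the login shell in field 7, is an extra illustration and not part of the original example.

```shell
# Hypothetical passwd-style entries, created only for this demonstration.
passwd=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > "$passwd"

# -F ':' splits each line on colons; $1 is the user name.
users=$(awk -F ':' '{ print $1 }' "$passwd")
# A pattern before the action limits it to matching lines: here, field 7 is the login shell.
bash_users=$(awk -F ':' '$7 == "/bin/bash" { print $1 }' "$passwd")

echo "$users"
echo "bash users: $bash_users"
rm -f "$passwd"
```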

Different Fields in Apache Access Log:

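Finding the most popular browser in the Apache/Nginx access.log (in the combined format, whitespace-separated field 12 is the first word of the User-agent string):
awk '{print $12}' access.log |sort |uniq -c
LogFormat of Apache access.log: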

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

%h is the remote host (ie the client IP)
%l is the identity of the user determined by identd (not usually used since not reliable)
%u is the user name determined by HTTP authentication
%t is the time the request was received.
%r is the request line from the client. ("GET / HTTP/1.0")
%>s is the status code sent from the server to the client (200, 404 etc.)
%b is the size of the response to the client (in bytes)
Referer is the Referer header of the HTTP request (containing the URL of the page from which this request was initiated) if any is present, and "-" otherwise.
User-agent is the browser identification string.
Field Description
Field 1 Client IP
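
As a sketch of how these fields map onto awk, the following runs on three invented log lines in the combined format. The IP addresses, paths and agents are made up. With the default whitespace splitting, $1 is the client IP, $9 the status code and $10 the response size. Splitting on double quotes instead makes field 6 the complete User-agent string.

```shell
# Hypothetical access log in the combined format, created only for this demonstration.
access=$(mktemp)
cat > "$access" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.2 - - [10/Oct/2023:13:55:40 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0.1"
192.0.2.1 - - [10/Oct/2023:13:56:01 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"
EOF

# Count requests per status code (whitespace-separated field 9).
status_counts=$(awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$access" | sort)

# Most frequent full User-agent string (field 6 when splitting on double quotes);
# sed strips the leading count that uniq -c adds.
top_agent=$(awk -F '"' '{ print $6 }' "$access" | sort | uniq -c | sort -rn | head -n 1 | sed 's/^ *[0-9]* *//')

echo "$status_counts"
echo "top agent: $top_agent"
rm -f "$access"
```

Splitting on quotes keeps agents that contain spaces in one piece, which the whitespace-based $12 cannot do.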

 
