Python - Regular Expressions


Regular Expressions

Data is often stored in files, and we may need to pick out and change only the relevant portions. Python's string methods handle simple manipulation, but more complex requirements call for a more powerful tool.

Regular expressions describe text patterns. They are used to check whether a string contains a pattern, to extract the matching portions, and to modify them if required.

Python has a module named 're' for regular expressions.

Working with 're' module

Two commonly used functions in the 're' module are search() and sub(). search() finds the first location where a pattern matches, and sub() replaces matches of a pattern with a replacement string.

Syntax

	input_data = "Flight Savana Airlines a2134"
	re.search(r"pattern", input_data)
	re.sub(r"pattern", r"replacement", input_data)

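Here, the r in front of the pattern marks it as a raw string: Python does not interpret backslash escape sequences in it, so a pattern such as r"\d" reaches the regular expression engine exactly as written. The regular expression metacharacters themselves keep their special meaning.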
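To see what the r prefix actually changes, the sketch below compares raw and ordinary string literals using only built-in string behaviour and re.search(). The sample strings and the variable name match are illustrative choices, not part of the original tutorial.

```python
import re

# In an ordinary literal "\n" is one newline character;
# in a raw string r"\n" is two characters: a backslash and "n".
print(len("\n"))   # 1
print(len(r"\n"))  # 2

# r"\d" and "\\d" are the same two-character string,
# so both hand the pattern \d (any digit) to the regex engine.
print(r"\d" == "\\d")  # True

# Metacharacters keep their meaning in a raw string: \d still matches a digit.
match = re.search(r"a\d+", "Flight Savana Airlines a2134")
print(match.group())  # a2134

# "\b" in an ordinary literal is a backspace character, not the regex
# word boundary, so only the raw-string version finds "Air".
print(re.search("\bAir", "Savana Airlines"))              # None
print(re.search(r"\bAir", "Savana Airlines") is not None)  # True
```

The backspace case is the usual reason to write every pattern as a raw string, even when an ordinary literal would happen to work.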

re.search() returns None if the pattern is not found; otherwise it returns a match object.
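A match object carries more than a yes/no answer. As a minimal sketch, assuming an input_data string like the one in the syntax line, the code below reads the matched text and its position with the match object's group(), start() and span() methods; the pattern "Sea" is just an example that does not occur in the string.

```python
import re

input_data = "Flight Savana Airlines a2134"

# On success, re.search() returns a match object for the first match.
match = re.search(r"Air", input_data)
if match is not None:
    print(match.group())  # matched text: Air
    print(match.start())  # index where the match begins: 14
    print(match.span())   # (start, end) pair: (14, 17)

# On failure, re.search() returns None.
print(re.search(r"Sea", input_data))  # None
```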

The following examples search for the pattern "Air" in the strings "Airline" and "airline".

CODE/PROGRAM/EXAMPLE

//Example 1:
	import re
	# re.search() returns a match object if "Air" occurs in "Airline", otherwise None
	if re.search(r"Air", "Airline") is not None:
		print("Pattern found")
	else:
		print("Pattern not found")
		
Output:
	Pattern found

CODE/PROGRAM/EXAMPLE

//Example 2:
	import re
	# Matching is case-sensitive, so "Air" does not match "airline"
	if re.search(r"Air", "airline") is not None:
		print("Pattern found")
	else:
		print("Pattern not found")
		
Output:
	Pattern not found
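Example 2 fails only because of the lowercase a. If case should be ignored, re.search() accepts a flags argument; the sketch below passes re.IGNORECASE to the same search used in Example 2.

```python
import re

# Case-sensitive by default: "Air" does not match "airline".
print(re.search(r"Air", "airline") is not None)  # False

# With re.IGNORECASE the same search succeeds.
if re.search(r"Air", "airline", re.IGNORECASE) is not None:
    print("Pattern found")
else:
    print("Pattern not found")
```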

More on regular expressions

The table below lists common metacharacters, what each one matches, and an example of its use.

Each solution below assumes import re has already been run.

Metacharacter	Usage	Requirement	Solution	Remarks
.	Matches any single character (except a newline)	Search for a pattern with exactly two characters between A and l in the string "Aopline".	print("Pattern found" if re.search(r"A..l", "Aopline") else "Pattern not found")	"." stands for any character. "Aopline" has two characters ("op") between A and l, so the pattern is found.
\d	Matches one decimal digit, such as 0-9	Search for a digit between A and l in the string "A2line".	print("Pattern found" if re.search(r"A\dl", "A2line") else "Pattern not found")	\d checks for a digit. "A2line" has the digit 2 between A and l, so the pattern is found.
*	Matches zero or more occurrences of the preceding character	Check whether A is followed by zero or more digits in the string "A2234line".	print("Pattern found" if re.search(r"A\d*", "A2234line") else "Pattern not found")	Here \d* matches "2234". Because zero digits are also allowed, this pattern matches any string that contains A.
+	Matches one or more occurrences of the preceding character	Check whether A is followed by at least one digit in the string "Airline".	print("Pattern found" if re.search(r"A\d+", "Airline") else "Pattern not found")	\d+ needs at least one digit. "Airline" has no digit after A, so the pattern is not found.
?	Matches zero or one occurrence of the preceding character	Check whether A is followed by at most one digit and then i in the string "Airline".	print("Pattern found" if re.search(r"A\d?i", "Airline") else "Pattern not found")	\d? allows the digit to be absent, so "Ai" matches and the pattern is found.
{n}	Matches exactly n occurrences of the preceding character	Check whether exactly 3 digits appear between A and i in the string "A223irline".	print("Pattern found" if re.search(r"A\d{3}i", "A223irline") else "Pattern not found")	{n} requires the preceding character to appear exactly n times. "A223i" has 3 digits between A and i, so the pattern is found.
[]	Matches one character from the set listed within the square brackets	Search for a digit from 4 to 8 between A and l in the string "A2line".	print("Pattern found" if re.search(r"A[4-8]l", "A2line") else "Pattern not found")	[] matches a single character from the set, and a range such as 4-8 can be given. 2 is outside 4-8, so the pattern is not found.
^	Matches a pattern at the beginning of a string	Check whether the string "Airline" starts with A.	print("Pattern found" if re.search(r"^A", "Airline") else "Pattern not found")	^ anchors the pattern to the start of the string. "Airline" begins with "A", so the pattern is found.
$	Matches a pattern at the end of a string	Check whether the string "Airline" ends with e.	print("Pattern found" if re.search(r"e$", "Airline") else "Pattern not found")	$ anchors the pattern to the end of the string. "Airline" ends with "e", so the pattern is found.
\w	Matches one word character: a letter, a digit or the underscore _	Check whether the last character of "Airline%" is a word character.	print("Pattern found" if re.search(r"\w$", "Airline%") else "Pattern not found")	\w matches letters, digits and _. The last character, %, is not a word character, so the pattern is not found.
\s	Matches one whitespace character, such as a space, \t or \n	Check for whitespace immediately after "Air" in the string "Airline".	print("Pattern found" if re.search(r"Air\s", "Airline") else "Pattern not found")	\s matches a single whitespace character (use \s+ for a run of them). "Airline" has l after "Air", so the pattern is not found.
|	Matches either the pattern on its left or the pattern on its right	Search for "Hell" or "Fell" in the string "Fellow".	print("Pattern found" if re.search(r"Hell|Fell", "Fellow") else "Pattern not found")	| acts like an "or" operator. "Fellow" contains "Fell", so the pattern is found. Writing \| instead would match a literal | character.
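The table rows can also be checked in one go. The sketch below loops over the same pattern and string pairs and prints what each pattern matched, if anything. The list name examples is a hypothetical choice for illustration, and the last row uses the plain alternation operator |.

```python
import re

# (pattern, text) pairs from the table rows
examples = [
    (r"A..l", "Aopline"),
    (r"A\dl", "A2line"),
    (r"A\d*", "A2234line"),
    (r"A\d+", "Airline"),
    (r"A\d?i", "Airline"),
    (r"A\d{3}i", "A223irline"),
    (r"A[4-8]l", "A2line"),
    (r"^A", "Airline"),
    (r"e$", "Airline"),
    (r"\w$", "Airline%"),
    (r"Air\s", "Airline"),
    (r"Hell|Fell", "Fellow"),
]

for pattern, text in examples:
    match = re.search(pattern, text)
    # Show the matched substring, or report that nothing matched.
    result = repr(match.group()) if match else "no match"
    print(f"{pattern:<10} {text:<10} {result}")
```

Only the +, [], \w and \s rows report no match.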

re - Replacing data

The function re.sub() replaces every match of a pattern with a replacement string and returns the new string; the original string is left unchanged.

Consider the following example:

CODE/PROGRAM/EXAMPLE

	import re
	flight_details = "Flight Savana Airlines a2134"
	print(flight_details)
	# Replace every occurrence of "Flight" with "Plane"
	print(re.sub(r"Flight", r"Plane", flight_details))

Output:
	Flight Savana Airlines a2134
	Plane Savana Airlines a2134
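Because the first argument of re.sub() is a regular expression, the metacharacters from the table can be used in it too. The sketch below reuses the flight_details string, masks the digits of the flight number with \d+, and shows the optional count argument, which limits how many matches are replaced (the default of 0 replaces all). The replacement text "XXXX" is an arbitrary example.

```python
import re

flight_details = "Flight Savana Airlines a2134"

# \d+ matches a run of digits, so the flight number's digits are replaced.
print(re.sub(r"\d+", "XXXX", flight_details))
# Flight Savana Airlines aXXXX

# By default every match is replaced; count=1 replaces only the first.
print(re.sub(r"a", "A", flight_details))           # Flight SAvAnA Airlines A2134
print(re.sub(r"a", "A", flight_details, count=1))  # Flight SAvana Airlines a2134
```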
