Data is often stored in files, and we may need to pick out and change only the relevant portions of a file. Python's string methods handle simple text manipulation, but more complicated requirements call for more powerful tools.
Regular expressions describe a pattern. They are used to check whether a string contains that pattern, to extract the matching portions, and to modify them if required.
Python has a module named 're' for regular expressions.
Two commonly used functions in the 're' module are search and sub. re.search() looks for a pattern in a string, and re.sub() replaces the matches of a pattern with new text.
input_data = "Flight Savana Airlines a2134"
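As a minimal sketch, assuming we want the flight code (a lowercase "a" followed by digits) from input_data, re.search() can find it and re.sub() can rewrite the digits. The pattern r"a\d+" and the replacement "X" are illustrative choices, not part of the original example.

```python
import re

input_data = "Flight Savana Airlines a2134"

# re.search() scans the whole string and returns the first match, or None.
match = re.search(r"a\d+", input_data)
flight_code = match.group() if match else None
print(flight_code)  # a2134

# re.sub() returns a new string; input_data itself is left unchanged.
masked = re.sub(r"\d", "X", input_data)
print(masked)  # Flight Savana Airlines aXXXX
```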
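The r in front of a search pattern marks a raw string. In a raw string, Python does not treat the backslash as an escape character, so sequences such as \d reach the regular expression engine unchanged.
re.search() returns None if the pattern is not found; otherwise it returns a match object.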
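The difference matters most for escapes that Python itself understands. In this sketch, "\b" in a normal string becomes a backspace character, while r"\b" keeps the backslash so the regex engine sees its word-boundary token.

```python
import re

# A raw string keeps the backslash; a normal string turns "\b" into a backspace (\x08).
print(len(r"\b"), len("\b"))  # 2 1

# With the raw string, \b reaches the regex engine as a word boundary.
with_raw = re.search(r"\bAir", "Savana Airlines")
# Without it, the pattern looks for a literal backspace followed by "Air".
without_raw = re.search("\bAir", "Savana Airlines")

print(with_raw is not None)     # True
print(without_raw is not None)  # False
```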
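When the pattern is found, the match object records what matched and where. A short sketch (the helper name describe is hypothetical) that distinguishes the two cases:

```python
import re

def describe(pattern, text):
    """Report where pattern first matches in text, or that it does not match."""
    match = re.search(pattern, text)
    if match is None:
        return "Pattern not found"
    # group() is the matched text; start() and end() are its slice boundaries.
    return f"Found {match.group()!r} at {match.start()}-{match.end()}"

print(describe(r"Air", "Savana Airlines"))  # Found 'Air' at 7-10
print(describe(r"Air", "airline"))          # Pattern not found
```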
The examples below search for the pattern "Air" in "Airline" and in "airline". The search is case-sensitive, so only the first one finds the pattern.
Example 1:

import re

if re.search(r"Air", "Airline") is not None:
    print("Pattern found")
else:
    print("Pattern not found")

Output: Pattern found
Example 2:

import re

if re.search(r"Air", "airline") is not None:
    print("Pattern found")
else:
    print("Pattern not found")

Output: Pattern not found
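Example 2 fails only because matching is case-sensitive. If case should not matter, the re.IGNORECASE flag from the re module can be passed to re.search(), as in this sketch:

```python
import re

text = "airline"

# Case-sensitive by default: "Air" does not match "air".
print(re.search(r"Air", text) is not None)                 # False

# re.IGNORECASE makes letters match regardless of case.
print(re.search(r"Air", text, re.IGNORECASE) is not None)  # True
```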
The table below lists common metacharacters, each with an example.
Metacharacter | Usage | Requirement | Solution | Remarks |
---|---|---|---|---|
. | Matches any single character except a newline | Search for a pattern with exactly two characters between A and l in the string "Aopline". | import re; print("Pattern found" if re.search(r"A..l", "Aopline") else "Pattern not found") | "." stands for any character, so any two characters between A and l make the pattern match. Output: Pattern found |
\d | Matches one digit (0-9) | Search for a digit between A and l in the string "A2line". | import re; print("Pattern found" if re.search(r"A\dl", "A2line") else "Pattern not found") | \d matches a single digit, so any digit between A and l makes the pattern match. Output: Pattern found |
\* | Matches zero or more occurrences of the preceding character | Check whether A is followed by zero or more digits in the string "A2234line". | import re; print("Pattern found" if re.search(r"A\d*", "A2234line") else "Pattern not found") | Matches A followed by as many digits as are present (here "A2234"). Because zero digits are also allowed, any string containing A matches. Output: Pattern found |
\+ | Matches one or more occurrences of the preceding character | Check whether A is followed by at least one digit in the string "Airline". | import re; print("Pattern found" if re.search(r"A\d+", "Airline") else "Pattern not found") | At least one digit must follow A. In "Airline" the A is followed by i, so the pattern does not match. Output: Pattern not found |
? | Matches zero or one occurrence of the preceding character | Check whether A is followed by at most one digit and then i in the string "Airline". | import re; print("Pattern found" if re.search(r"A\d?i", "Airline") else "Pattern not found") | The digit between A and i is optional (0 or 1 times). "Airline" has no digit there, so "Ai" matches. Output: Pattern found |
{n} | Matches exactly n occurrences of the preceding character | Check whether exactly 3 digits appear between A and i in the string "A223irline". | import re; print("Pattern found" if re.search(r"A\d{3}i", "A223irline") else "Pattern not found") | {n} checks that the preceding character (here \d) appears exactly n times, so this looks for 3 digits after A, followed by i. Output: Pattern found |
[] | Matches one character from the set listed inside the square brackets | Search for a digit from 4 to 8 between A and l in the string "A2line". | import re; print("Pattern found" if re.search(r"A[4-8]l", "A2line") else "Pattern not found") | [] matches a single character from the set; a hyphen gives a range such as 4-8. The digit here is 2, which is outside the range, so the pattern does not match. Output: Pattern not found |
^ | Matches a pattern at the beginning of a string | Check whether the string "Airline" starts with A. | import re; print("Pattern found" if re.search(r"^A", "Airline") else "Pattern not found") | ^ anchors the pattern to the start of the string; here we check whether the string begins with "A". Output: Pattern found |
$ | Matches a pattern at the end of a string | Check whether the string "Airline" ends with e. | import re; print("Pattern found" if re.search(r"e$", "Airline") else "Pattern not found") | $ anchors the pattern to the end of the string; here we check whether the string ends with "e". Output: Pattern found |
\w | Matches one word character: letters (a-z, A-Z), digits (0-9) and underscore (_) | Check whether the last character of "Airline%" is a word character. | import re; print("Pattern found" if re.search(r"\w$", "Airline%") else "Pattern not found") | \w matches a-z, A-Z, 0-9 and _ (for str patterns, other Unicode letters and digits as well). The last character here is %, so the pattern does not match. Output: Pattern not found |
\s | Matches one whitespace character (a space, \t, \n and similar); use \s+ for a sequence of whitespace | Check for whitespace after "Air" in the string "Airline". | import re; print("Pattern found" if re.search(r"Air\s", "Airline") else "Pattern not found") | \s stands for a whitespace character. In "Airline" the letter l follows "Air", so the pattern does not match. Output: Pattern not found |
\| | Matches either the pattern on its left or the pattern on its right | Search for "Hell" or "Fell" in the string "Fellow". | import re; print("Pattern found" if re.search(r"Hell\|Fell", "Fellow") else "Pattern not found") | \| acts like an "or" operator. "Fellow" contains "Fell", so the pattern matches. Output: Pattern found |
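To check the table's examples in one go, this sketch runs each (pattern, string) pair from the Solution column through re.search() and prints what matched. The names table_examples and results are only for illustration.

```python
import re

# (pattern, string) pairs taken from the Solution column of the table above.
table_examples = [
    (r"A..l", "Aopline"),
    (r"A\dl", "A2line"),
    (r"A\d*", "A2234line"),
    (r"A\d+", "Airline"),
    (r"A\d?i", "Airline"),
    (r"A\d{3}i", "A223irline"),
    (r"A[4-8]l", "A2line"),
    (r"^A", "Airline"),
    (r"e$", "Airline"),
    (r"\w$", "Airline%"),
    (r"Air\s", "Airline"),
    (r"Hell|Fell", "Fellow"),
]

results = {}
for pattern, text in table_examples:
    match = re.search(pattern, text)
    # Store the matched substring, or None when the pattern is not found.
    results[pattern] = match.group() if match else None
    print(f"{pattern:<10} {text:<11} -> {results[pattern]!r}")
```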
The re.sub() function performs a substitution: it returns a new string in which every match of the pattern is replaced with the replacement text. For example:
import re

flight_details = "Flight Savana Airlines a2134"
print(flight_details)
print(re.sub(r"Flight", r"Plane", flight_details))

Output:
Flight Savana Airlines a2134
Plane Savana Airlines a2134
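By default re.sub() replaces every match. Its optional count argument limits the number of replacements, and a group captured with parentheses can be reused in the replacement as \1. A brief sketch on the same string (the replacement texts "#" and "A-" are arbitrary):

```python
import re

flight_details = "Flight Savana Airlines a2134"

# Every digit is replaced because count defaults to 0 (replace all).
all_digits = re.sub(r"\d", "#", flight_details)
# count=2 stops after the first two replacements.
first_two = re.sub(r"\d", "#", flight_details, count=2)
# (\d+) captures the digits; \1 inserts them back into the replacement.
renamed = re.sub(r"a(\d+)", r"A-\1", flight_details)

print(all_digits)  # Flight Savana Airlines a####
print(first_two)   # Flight Savana Airlines a##34
print(renamed)     # Flight Savana Airlines A-2134
```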
All Rights Reserved. © 2024 BookOfNetwork