R Regular Expressions
Statistical analysis in R often involves a great deal of text processing: adjusting exported variable names to the R variable name format, searching for and modifying a pattern in a large text, and changing the levels of categorical variables.
The functions most commonly used for string searches include grep(), sub() and gsub().
A regular expression is a pattern that describes a set of strings.
Meta Characters
The meta characters in extended regular expressions are
Syntax
. \ | ( ) [ { ^ $ * + ?
To identify complex patterns, we have meta characters to represent
- the beginning and end of a line (^ and $, respectively)
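As a small illustration of how some of these meta characters behave, here is a minimal sketch using grepl() and grep() from base R. The vector `x` and the patterns are made up for this example and are not part of the original tutorial.

```r
# Hypothetical sample data, for illustration only
x <- c("cat", "cot", "coat", "dog")

# "." matches any single character: "c.t" matches "cat" and "cot"
dot_match <- grepl("c.t", x)

# "|" is alternation: match either "cat" or "dog"
alt_match <- grepl("cat|dog", x)

# "[ao]" matches one character from the set; "+" means one or more repetitions
set_match <- grepl("c[ao]+t", x)

# "^" anchors the match to the beginning of the string
starts_with_co <- grep("^co", x, value = TRUE)

# "$" anchors the match to the end of the string
ends_with_at <- grep("at$", x, value = TRUE)

print(dot_match)
print(alt_match)
print(set_match)
print(starts_with_co)
print(ends_with_at)
```

grepl() returns one TRUE or FALSE per element, which makes it easy to see which strings each pattern accepts.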
Regular Expressions Examples
A few examples of regular expressions are shown below.
grep() searches for matches to the argument pattern within each element of a character vector and returns the positions of the matching elements.
Arguments:
pattern : the pattern to identify
x : the object with the text data
The remaining arguments are additional options for a more detailed search.
Example:
CODE/PROGRAM/EXAMPLE
txt1 <- c(".count", "..finan.ce", "cost", ".date", "..coin")
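# "." matches any single character, so ".c" means "any character followed by c"
# "cost" does not match because nothing precedes its "c"
grep(pattern = ".c", txt1)
[1] 1 2 5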
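The additional options mentioned above can change both what grep() returns and how the pattern is read. The sketch below uses the `value` and `fixed` arguments of base R's grep(), plus grepl(). The vector `txt2` is a hypothetical example chosen so that the regular-expression and literal readings of ".c" give different results.

```r
# Hypothetical sample data, for illustration only
txt2 <- c(".count", "account", "cost")

# Default: ".c" is a regular expression (any character followed by "c")
regex_idx <- grep(pattern = ".c", txt2)

# value = TRUE returns the matching elements instead of their positions
regex_values <- grep(pattern = ".c", txt2, value = TRUE)

# fixed = TRUE treats ".c" as a literal string: a real period followed by "c"
literal_idx <- grep(pattern = ".c", txt2, fixed = TRUE)

# grepl() returns a logical vector with one entry per element
has_match <- grepl(pattern = ".c", txt2)

print(regex_idx)
print(regex_values)
print(literal_idx)
print(has_match)
```

With `fixed = TRUE` only ".count" matches, because "account" contains no literal period.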
sub() and gsub() replace the first match and all matches, respectively.
Arguments:
pattern: the pattern to identify
replacement: the replacement for the matched pattern
x: the object with the text data
The remaining arguments are additional options for a more detailed search.
Example:
CODE/PROGRAM/EXAMPLE
txt1 <- c(".count", "..finan.ce", "cost", ".date", "..coin")
gsub(pattern = ".c", replacement = "&c", txt1)
[1] "&count" "..finan&ce" "cost" ".date" ".&coin"
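To make the difference between replacing the first match and replacing all matches concrete, here is a minimal sketch comparing sub() and gsub() on the same input. The string `fruit` is a made-up example.

```r
# Hypothetical sample string, for illustration only
fruit <- "banana"

# sub() replaces only the first match of the pattern
first_only <- sub(pattern = "a", replacement = "A", fruit)

# gsub() replaces every match of the pattern
all_matches <- gsub(pattern = "a", replacement = "A", fruit)

print(first_only)
print(all_matches)
```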
Regular Expressions
Try replacing the $ symbol with a period (.) in the example below.
Syntax
# Examples on regular expressions
s <- "gsub$uses$regular$expressions"
If the following command is used, the $ symbols are not replaced. Instead, a period is appended to the end of the string, because gsub() interprets "$" as a regular expression special character that matches the end of the line.
CODE/PROGRAM/EXAMPLE
gsub(pattern = "$", replacement = ".", s)
[1] "gsub$uses$regular$expressions."
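To get the correct result, gsub() must interpret "$" as a literal character. This can be done by preceding "$" with a double backslash. Two backslashes are needed because R itself treats a single backslash in a string literal as an escape character.
CODE/PROGRAM/EXAMPLE
gsub(pattern = "\\$", replacement = ".", s)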
[1] "gsub.uses.regular.expressions"
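The double backslash is not the only way to make gsub() treat "$" literally. As a sketch of two alternatives available in base R, the pattern can be wrapped in a character class, or the `fixed = TRUE` argument can be used to switch off regular expression interpretation altogether. The variable names below are chosen for this illustration only.

```r
s <- "gsub$uses$regular$expressions"

# Escaping with a double backslash, as in the example above
escaped <- gsub(pattern = "\\$", replacement = ".", s)

# Inside a character class, "$" loses its special meaning
bracketed <- gsub(pattern = "[$]", replacement = ".", s)

# fixed = TRUE treats the whole pattern as a literal string
literal <- gsub(pattern = "$", replacement = ".", s, fixed = TRUE)

print(escaped)
print(bracketed)
print(literal)
```

All three calls produce the same string. `fixed = TRUE` is often the simplest choice when the pattern contains no regular expression features at all.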