R Regular Expressions

Regular Expressions

Statistical analysis in R involves a great deal of text (character string) processing, such as adjusting exported variable names to the R variable name format, searching for and modifying a pattern in a large text, and changing the levels of a categorical variable.

The most commonly used functions for searching and replacing within strings are

  • grep()
  • gsub()

A regular expression is a pattern that describes a set of strings.

Meta Characters

The meta characters in extended regular expressions are

Syntax
. \ | ( ) [ { ^ $ * + ?

To identify complex patterns, we have meta characters to represent

  • spaces
  • sets of literals
  • beginning and end of a line
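
The three kinds of meta characters listed above can be sketched with grep(). This is only an illustration: the vector x and its contents are hypothetical sample data, not part of the original example set.

```r
# Hypothetical sample strings, chosen only for illustration
x <- c("cost", "total cost", "costly", "net_cost")

# "^" anchors the match at the beginning of the string
starts_with_cost <- grep("^cost", x)        # elements 1 and 3

# "$" anchors the match at the end of the string
ends_with_cost <- grep("cost$", x)          # elements 1, 2 and 4

# "[ _]" is a set of literals: a space or an underscore
has_separator <- grep("[ _]", x)            # elements 2 and 4

# "[[:space:]]" is a character class that matches any whitespace
has_space <- grep("[[:space:]]", x)         # element 2
```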

Regular Expressions Examples

A few examples of regular expressions are shown below

grep() :

grep() searches for matches to the argument pattern within each element of a character vector

Syntax
str(grep)

Output:

function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)

pattern : the pattern to identify
x : the character vector with the text data
the remaining arguments are additional options for a more detailed search

Example:

CODE/PROGRAM/EXAMPLE
txt1 <- c(".count", "..finan.ce", "cost", ".date", "..coin")
	grep(pattern = ".c", txt1)
	[1] 1 2 5
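
In the pattern ".c" the period is a meta character that matches any single character, which is why "cost" (where nothing precedes the "c") is not matched. The sketch below contrasts this with a literal period; the extra element "account" and the name txt2 are hypothetical additions made only to show the difference.

```r
txt1 <- c(".count", "..finan.ce", "cost", ".date", "..coin")
txt2 <- c(txt1, "account")   # "account" is a hypothetical extra element

# value = TRUE returns the matching elements instead of their positions
matched <- grep(pattern = ".c", txt1, value = TRUE)

# As a regular expression, ".c" means "any character followed by c",
# so the "ac" in "account" matches as well
idx_regex <- grep(".c", txt2)

# fixed = TRUE treats the pattern as a literal string: a period followed by c
idx_fixed <- grep(".c", txt2, fixed = TRUE)

# Escaping the period with a double backslash has the same effect
idx_escaped <- grep("\\.c", txt2)
```

Here idx_regex includes position 6, while idx_fixed and idx_escaped do not.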

gsub() :

gsub() replaces all matches of a pattern within each element of a character vector. Its companion function sub() replaces only the first match

Syntax
str(gsub)

Output:

function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

pattern: the pattern to identify
replacement: the replacement for matched pattern
x: the object with the text data
the remaining arguments are additional options for a more detailed search

Example:

CODE/PROGRAM/EXAMPLE
txt1 <- c(".count", "..finan.ce", "cost", ".date", "..coin")
	gsub(pattern = ".c", replacement = "&c", txt1)
	[1] "&count" "..finan&ce" "cost" ".date" ".&coin"
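
The difference between replacing the first match and replacing all matches can be seen by running sub() and gsub() on the same input. This is a minimal sketch: the variable name var_name and its value are hypothetical, standing in for an exported variable name that needs adjusting.

```r
# Hypothetical exported variable name, used only for illustration
var_name <- "net.sales.2020"

# "\\." matches a literal period (the double backslash escapes the meta character)
first_only <- sub("\\.", "_", var_name)     # replaces the first period only
all_matches <- gsub("\\.", "_", var_name)   # replaces every period

# gsub() is vectorised: strip leading periods from each element of txt1
txt1 <- c(".count", "..finan.ce", "cost", ".date", "..coin")
cleaned <- gsub("^\\.+", "", txt1)
```

first_only is "net_sales.2020", all_matches is "net_sales_2020", and cleaned keeps the inner period of "finan.ce" because "^" restricts the match to the start of each string.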

Escaping Meta Characters

Try to replace the $ symbol with period ‘.’ in the example below

Syntax
#Examples on Reg Ex
	s = "gsub$uses$regular$expressions"

If the following command is used, none of the "$" symbols is replaced; instead, a period is appended to the end of the string. This is because gsub() interprets the character "$" as a regular expression special character that matches the end of the string

CODE/PROGRAM/EXAMPLE
gsub(pattern = "$", replacement = ".", s)
	[1] "gsub$uses$regular$expressions."

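To get the correct result, we must ensure gsub() interprets "$" as a regular character. This can be done by preceding "$" with a double backslash: inside an R string, two backslashes stand for a single backslash, which the regular expression engine then reads as an escape for "$"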

CODE/PROGRAM/EXAMPLE
gsub(pattern = "\\$", replacement = ".", s)
	[1] "gsub.uses.regular.expressions"
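
Escaping with a double backslash is one way to make "$" literal. Two other options that base R provides are sketched here for comparison: the fixed = TRUE argument, and a bracket expression. The variable names are chosen only for illustration.

```r
s <- "gsub$uses$regular$expressions"

# Option 1: escape the meta character with a double backslash
escaped <- gsub("\\$", ".", s)

# Option 2: fixed = TRUE treats the whole pattern as a literal string
literal <- gsub("$", ".", s, fixed = TRUE)

# Option 3: inside a bracket expression "$" loses its special meaning
bracket <- gsub("[$]", ".", s)
```

All three calls return "gsub.uses.regular.expressions". fixed = TRUE is convenient when the pattern contains no regular expression features at all.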