Explicit control loops are the usual way to iterate in languages such as C and Java, but R also offers a compact alternative: the apply family of functions.
R has several apply functions, each suited to a different purpose.
They are a concise and often efficient way to perform iterations.
apply() returns a vector, an array or a list of values obtained by applying a function to the margins (rows or columns) of an array or matrix. Consider a matrix ‘score’ in which each column holds the scores of one student.
To get the total score of each individual student in the class without apply(), call the sum() function on each column separately:
list(sum(score[,1]), sum(score[,2]), sum(score[,3]))
[[1]]
[1] 414

[[2]]
[1] 422

[[3]]
[1] 421
Syntax:
apply(dataset/object, margin, function)
Where,
dataset: the object on which we perform the operations
margin: 1 performs the operation on rows and 2 performs it on columns (c(1, 2) applies it to every individual cell)
function: the operation to apply; both built-in and custom functions are valid options
Consider the matrix ‘score’ from the previous example.
To get the total score of each student this time, use the apply() function with margin 2 (columns):
apply(score, 2, sum)
Output :
To total each row instead, set the margin to 1:
apply(score, 1, sum)
Output :
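As a minimal sketch of both margins, the snippet below builds a small hypothetical matrix and totals it by column and by row. The name demo.score and all of its values are invented for illustration and are not the article's ‘score’ data. The last call assumes a custom anonymous function, which apply() accepts in the same way as a built-in one.

```r
# Hypothetical data: rows are tests, columns are students.
# matrix() fills column by column, so each line below is one student.
demo.score <- matrix(c(70, 80, 90,   # student s1
                       65, 75, 85,   # student s2
                       60, 70, 80),  # student s3
                     nrow = 3,
                     dimnames = list(c("test1", "test2", "test3"),
                                     c("s1", "s2", "s3")))

# Margin 2: one total per column (per student)
col.totals <- apply(demo.score, 2, sum)
print(col.totals)   # s1 = 240, s2 = 225, s3 = 210

# Margin 1: one total per row (per test)
row.totals <- apply(demo.score, 1, sum)
print(row.totals)   # test1 = 195, test2 = 225, test3 = 255

# A custom function: spread between best and worst score per student
col.spread <- apply(demo.score, 2, function(x) max(x) - min(x))
print(col.spread)   # 20 for every student in this made-up data
```

For plain sums, base R also provides colSums() and rowSums(), which give the same totals; apply() is the more general tool because it takes any function.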
lapply() is especially useful when dealing with lists and data frames. R treats a data frame as a list whose elements are the variables (columns) of the data frame, so lapply() can apply a function to every variable in a data frame.
lapply() always works element by element, which for a data frame means column by column. Hence, its syntax has no margin parameter.
Syntax:
lapply (dataset/object, function)
Convert the score matrix to a data frame and then apply the lapply() function to it:
score.df <- as.data.frame(score)
score.df
Output :
The result of lapply() is displayed as a list object:
lapply(score.df, sum)
Output :
Note: apply() works on both rows and columns, but lapply() works only on the elements of a list, which for a data frame are its columns.
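The difference described in the note can be sketched with a small hypothetical data frame. The name demo.df and its values are assumptions made for this illustration, not the article's score.df.

```r
# Hypothetical data frame: one column per student
demo.df <- data.frame(s1 = c(70, 80, 90),
                      s2 = c(65, 75, 85),
                      s3 = c(60, 70, 80))

# lapply() visits each column and always returns a list
col.totals <- lapply(demo.df, sum)
print(is.list(col.totals))   # TRUE
print(col.totals$s1)         # 240

# lapply() has no margin argument, so row totals need apply() with margin 1
row.totals <- apply(demo.df, 1, sum)
print(row.totals)            # 195 225 255
```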
sapply() works similarly to the lapply() function. With simplify = FALSE, sapply() returns its results in a list, just like lapply(). With simplify = TRUE, the default, sapply() returns the results in a simplified form (a vector or a matrix) whenever that is possible.
Syntax:
sapply (dataset/object, function, simplify)
Consider the score data frame from the previous example and then perform the sapply() function
sapply(score.df, sum)
Output :
If every result is a single value (a scalar), sapply() returns a vector.
If every result has the same length greater than one, sapply() returns a matrix with one column for each element of the list to which the function was applied.
If the results have different lengths, no simplification is possible and sapply() returns a list. The example below illustrates all three cases.
Consider the results of 4 students who wrote several preliminary tests before the main exam. The data is stored as a list because the vectors have different lengths.
marks.list
$a
[1] 78 75 76 76 80 63 61

$b
[1] 74 72 69 59 64 77 68 77 75 69 71 72

$c
[1] 75 84 90 76 74 63 54 76 73 81 82 80 82

$d
[1] 65 51 66 59 62 61 65 60
Scenario 1: To find the average marks of each student
avg <- sapply(marks.list, mean)
print(avg)
       a        b        c        d
72.71429 70.58333 76.15385 61.12500
is.vector(avg)
[1] TRUE
# Output is in the form of a vector
Scenario 2: To find the range (lowest and highest mark) of each student
marks.range <- sapply(marks.list, range)
marks.range
      a  b  c  d
[1,] 61 59 54 51
[2,] 80 77 90 66
is.matrix(marks.range)
[1] TRUE
# Output is in the form of a matrix
Scenario 3: To find, for each student, the marks that are below 65, using sapply()
Create a function that returns the values less than 65, and pass it to sapply():
lt65 <- function(x) {
  return(x[x < 65])
}
less65 <- sapply(marks.list, lt65)
less65
$a
[1] 63 61

$b
[1] 59 64

$c
[1] 63 54

$d
[1] 51 59 62 61 60

is.list(less65)
[1] TRUE
# Output is in the form of a list
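The effect of the simplify argument can be sketched on a shortened, hypothetical list. The name demo.marks and its values are assumptions for illustration, not the full marks.list above.

```r
# Hypothetical, shortened marks for two students
demo.marks <- list(a = c(78, 75, 61),
                   b = c(74, 59, 77, 68))

# Default (simplify = TRUE): scalar results collapse into a named vector
avg.vec <- sapply(demo.marks, mean)
print(is.list(avg.vec))    # FALSE

# simplify = FALSE: the results stay in a list, exactly as lapply() gives them
avg.list <- sapply(demo.marks, mean, simplify = FALSE)
print(is.list(avg.list))   # TRUE
print(identical(avg.list, lapply(demo.marks, mean)))   # TRUE
```

Asking for a list explicitly can be useful when later code should not depend on whether the results happened to have equal lengths.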
tapply() applies a function to each cell (group) of a vector, where the groups are defined by the categorical variables given in the index argument.
Syntax:
tapply(column A, column B, function)
Where,
column A: the column on which the operation has to be performed
column B: the column by which the values are grouped (“categorized”)
function: the operation to apply to each group
Consider a data frame ‘math’ with name, section and marks as columns
To know the aggregate marks in each section, tapply() can be used
tapply(math$marks, math$section, sum)
  a   b
290 289
Section ‘a’ has the higher total marks in math.
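A runnable sketch of the same grouping idea follows. The data frame demo.math, its names and its marks are all hypothetical and chosen only so the group totals are easy to check by hand.

```r
# Hypothetical data frame with name, section and marks columns
demo.math <- data.frame(
  name    = c("Ann", "Bob", "Cara", "Dev", "Eli", "Fay"),
  section = c("a", "b", "a", "b", "a", "b"),
  marks   = c(90, 85, 70, 95, 80, 75)
)

# Total marks per section: a = 90 + 70 + 80, b = 85 + 95 + 75
section.totals <- tapply(demo.math$marks, demo.math$section, sum)
print(section.totals)   # a = 240, b = 255

# Any summary function works the same way, for example the mean
section.means <- tapply(demo.math$marks, demo.math$section, mean)
print(section.means)    # a = 80, b = 85
```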
Consider the ‘iris’ dataset from the datasets package.
‘iris’ has data on 150 flowers: 50 from each of 3 different species of iris.
To get the mean of each measurement for each species, use the tapply() function:
tapply(iris$Sepal.Length, iris$Species, mean)
    setosa versicolor  virginica
     5.006      5.936      6.588
tapply(iris$Sepal.Width, iris$Species, mean)
    setosa versicolor  virginica
     3.428      2.770      2.974
tapply(iris$Petal.Length, iris$Species, mean)
    setosa versicolor  virginica
     1.462      4.260      5.552
tapply(iris$Petal.Width, iris$Species, mean)
    setosa versicolor  virginica
     0.246      1.326      2.026
Note: The by() function works similarly to the tapply() function.
by() is an object-oriented wrapper for tapply(), applied to data frames.
Consider the ‘iris’ dataset again. It gives the measurements, in centimeters, of sepal length, sepal width, petal length and petal width for 50 flowers from each of the 3 species of iris.
To get the mean of every measurement column, grouped by the species column, we can use the by() function:
by(iris[,1:4], iris$Species, colMeans)
Output :
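As a rough sketch of how the by() result can be inspected, the snippet below stores the result and cross-checks one value against tapply(). It uses only the built-in iris data; the variable names species.means and sepal.means are assumptions introduced for this illustration.

```r
# One call summarises all four measurement columns per species
species.means <- by(iris[, 1:4], iris$Species, colMeans)
print(class(species.means))        # "by"

# Each group holds a named vector with the four column means
setosa.means <- species.means[["setosa"]]
print(names(setosa.means))

# Cross-check against the one-column-at-a-time tapply() approach
sepal.means <- tapply(iris$Sepal.Length, iris$Species, mean)
print(all.equal(setosa.means[["Sepal.Length"]], sepal.means[["setosa"]]))  # TRUE
```

The practical difference is that tapply() summarises a single vector per call, while by() passes a whole sub-data-frame to the function for each group.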
mapply() is a multivariate version of sapply(). mapply() applies the function to the first elements of each argument, the second elements, the third elements and so on. Arguments are recycled if necessary
Syntax:
mapply(function, arg_1, arg_2, …)
Where,
function: the type of operation
args: the data that needs to be processed
If we want data in the format shown below, we could build the list by hand:
repVals <- list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))
repVals
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

The mapply() function produces the same list with a single call:
mapply(rep, 1:4, 4:1)
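The equivalence between the hand-built list and the mapply() call can be checked directly. This is a small sketch; the names via.mapply, by.hand and named.args are assumptions made for illustration, and the hand-built list uses integer literals (1L, 2L, ...) so that its types match what 1:4 produces.

```r
# mapply() pairs the arguments up: rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
via.mapply <- mapply(rep, 1:4, 4:1)

# The same list written out by hand, with integer values to match 1:4
by.hand <- list(rep(1L, 4), rep(2L, 3), rep(3L, 2), rep(4L, 1))
print(identical(via.mapply, by.hand))   # TRUE

# Arguments can also be matched by name instead of by position
named.args <- mapply(rep, times = 1:4, x = 4:1)
print(lengths(named.args))              # 1 2 3 4
```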
Another mapply() example is shown below
Consider a custom function ‘noise’, which generates n random numbers from a normal distribution with a given mean and standard deviation:
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}
noise(2, 3, 1)
[1] 0.950255 1.217040
If we call the noise function directly with vectors as its arguments, as shown below, the result is not the desired one:
noise(1:5, 1:5, 2)
[1] -0.2760307 1.3783007 3.0931290 5.7079372 5.1899422
Because rnorm() treats a vector n as a request for length(n) values, this call returns only five numbers, one for each mean from 1 to 5. The desired output comprises one random normal with mean 1, two random normals with mean 2 and so on.
To generate the desired output we can build the list by hand with list() or use the mapply() function.
# With list()
list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2))
Output :
# With mapply()
mapply(noise, 1:5, 1:5, 2)
Output :
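Since the values are random, the shape of the result is easier to verify than the numbers themselves. The sketch below redefines noise so it runs on its own and compares the direct call with the mapply() call. The seed value is arbitrary and only makes the run repeatable, and the MoreArgs variant is an optional way to pass the constant standard deviation.

```r
set.seed(42)   # arbitrary seed, assumed here only for repeatability

noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

# Direct call: rnorm() uses length(1:5) as the count, so 5 values in total
direct <- noise(1:5, 1:5, 2)
print(length(direct))    # 5

# mapply() calls noise(1, 1, 2), noise(2, 2, 2), ..., noise(5, 5, 2)
varied <- mapply(noise, 1:5, 1:5, 2)
print(lengths(varied))   # 1 2 3 4 5

# Constant arguments can be passed once through MoreArgs
varied.more <- mapply(noise, n = 1:5, mean = 1:5, MoreArgs = list(sd = 2))
print(lengths(varied.more))   # 1 2 3 4 5
```

Because the five results have different lengths, mapply() cannot simplify them and returns a list, just as sapply() does in the same situation.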