R Data Frame
Data Frame
- A data frame in R is two-dimensional, like a matrix, but its columns can hold different types of data (heterogeneous data)
- In a way, a data frame is like a list whose components are its columns
- The components of a data frame must be vectors (numeric, character or logical), factors, numeric matrices, lists or other data frames
- Vector structures appearing as variables of a data frame must all have the same length and matrix structures must all have the same row size
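The points above can be illustrated with a minimal sketch. The `students` data frame, its column names and its values are hypothetical and invented purely for illustration.

```r
# Hypothetical data, invented for illustration only
students <- data.frame(
  name   = c("A", "B", "C"),       # character column
  score  = c(91.5, 78, 84),        # numeric column
  passed = c(TRUE, TRUE, FALSE)    # logical column
)

# Each column keeps its own type
col_types <- sapply(students, class)
print(col_types)

# A data frame is a list whose components are the columns
print(is.list(students))
print(length(students))            # counts columns, not rows

# Columns whose lengths cannot be matched (3 vs 2) are rejected with an error
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
print(inherits(bad, "try-error"))
```

The `try()` call is used here only so that the script keeps running after the expected error.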
Syntax
data.frame(vectors, row.names = NULL, etc…)
as.data.frame(x)
where x can be a vector, list, factor or matrix
is.data.frame(x)
checks whether x is a data frame
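As a rough sketch of how as.data.frame() and is.data.frame() fit together, the example below converts a small matrix and checks the result. The matrix `m` and the name `df_from_m` are hypothetical and are not taken from the examples in this tutorial.

```r
# Hypothetical 2 x 3 matrix, invented for illustration
m <- matrix(1:6, nrow = 2)

print(is.data.frame(m))          # a matrix is not a data frame

df_from_m <- as.data.frame(m)
print(is.data.frame(df_from_m))
print(dim(df_from_m))            # same shape as the matrix
print(names(df_from_m))          # unnamed matrix columns get default names V1, V2, V3
```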
Create a data frame named myScore using the data.frame() function
CODE/PROGRAM/EXAMPLE
Subjects <- c("Math","Science","English","Social")
Marks <- c(99,67,74,62)
myScore <- data.frame(Subjects, Marks)
myScore
Output:
  Subjects Marks
1     Math    99
2  Science    67
3  English    74
4   Social    62
Consider a matrix ‘mat1’. Use the as.data.frame() function to convert it to a data frame named ‘NewDF’
CODE/PROGRAM/EXAMPLE
NewDF <- as.data.frame(mat1)
NewDF
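Consider the two heterogeneous vectors, Subjects and Marks, with character and numeric types respectively, and the myScore data frame created in the previous example
In R versions before 4.0.0, the ‘Subjects’ character vector was converted to a factor when the data frame was created, because stringsAsFactors defaulted to TRUE; since R 4.0.0 the default is FALSE and the column stays a character vector
To ensure that the ‘Subjects’ vector remains a character vector in any version, set the option stringsAsFactors = FALSE explicitly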
CODE/PROGRAM/EXAMPLE
data.frame(Subjects, Marks, stringsAsFactors = FALSE)
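A short check of what the stringsAsFactors option does. It sets the option explicitly both ways, so the result does not depend on the default of the installed R version. The names `as_chr` and `as_fct` are introduced here only for illustration.

```r
Subjects <- c("Math", "Science", "English", "Social")
Marks <- c(99, 67, 74, 62)

# Same data, built once with each setting
as_chr <- data.frame(Subjects, Marks, stringsAsFactors = FALSE)
as_fct <- data.frame(Subjects, Marks, stringsAsFactors = TRUE)

print(class(as_chr$Subjects))   # "character"
print(class(as_fct$Subjects))   # "factor"
print(levels(as_fct$Subjects))  # the distinct values stored as factor levels
```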
names() function
can be used to retrieve the column names
can be used to modify the column names
Consider the previous GMAT example. To change the column name ‘Tom’ to ‘John’, assign a new vector of names
CODE/PROGRAM/EXAMPLE
names(GMAT.df)
[1] "Jane" "Tom" "Katy" "James"
names(GMAT.df) <- c("Jane", "John", "Katy", "James")
GMAT.df
colnames() & rownames() functions
can be used to retrieve or modify the column and row names respectively
CODE/PROGRAM/EXAMPLE
colnames(GMAT.df)
[1] "Jane" "John" "Katy" "James"
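The following sketch shows both retrieving and modifying names with colnames() and rownames(). The data frame is a hypothetical stand-in for GMAT.df: only the column names and the Katy column come from the text above, while every other value and the row names are invented.

```r
# Hypothetical stand-in for GMAT.df: only the column names and the Katy
# column come from the text; all other values and row names are invented.
GMAT.df <- data.frame(
  Jane  = c(98.1, 97.5, 98.0),
  Tom   = c(96.2, 98.8, 97.4),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(95.0, 96.3, 95.9)
)

# Rename a single column by position instead of retyping every name
colnames(GMAT.df)[2] <- "John"
print(colnames(GMAT.df))

# Row names default to "1", "2", "3" and can be replaced the same way
print(rownames(GMAT.df))
rownames(GMAT.df) <- c("Verbal", "Math", "Total")   # invented labels
print(GMAT.df["Math", "Katy"])
```

Renaming by position avoids mistakes when a data frame has many columns and only one name needs to change.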
The ‘$’ operator can be used to access a specific column by name
CODE/PROGRAM/EXAMPLE
GMAT.df$Katy
[1] 99.4 99.7 98.9
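For comparison, a column can also be reached with double or single brackets. This is a small sketch on a hypothetical stand-in for GMAT.df, where only the Katy values come from the text and the rest is invented.

```r
# Hypothetical stand-in for GMAT.df; only the Katy column is from the text
GMAT.df <- data.frame(
  Jane  = c(98.1, 97.5, 98.0),
  John  = c(96.2, 98.8, 97.4),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(95.0, 96.3, 95.9)
)

by_dollar  <- GMAT.df$Katy         # column as a vector
by_bracket <- GMAT.df[["Katy"]]    # same vector, name given as a string
by_index   <- GMAT.df[, "Katy"]    # same vector, matrix-style indexing

print(identical(by_dollar, by_bracket))
print(identical(by_dollar, by_index))

# Single brackets with one index keep the data frame structure
print(class(GMAT.df["Katy"]))
```

The double-bracket form is handy when the column name is held in a variable, which ‘$’ does not support directly.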
An element can be retrieved by its position in the data frame, given as [row, column]
CODE/PROGRAM/EXAMPLE
GMAT.df[2,3]
[1] 99.7
# 99.7 is the math score of Katy
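Positional indexing extends from single elements to whole rows, columns and blocks. The sketch below again uses a hypothetical stand-in for GMAT.df, with invented values except for the Katy column.

```r
# Hypothetical stand-in for GMAT.df; only the Katy column is from the text
GMAT.df <- data.frame(
  Jane  = c(98.1, 97.5, 98.0),
  John  = c(96.2, 98.8, 97.4),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(95.0, 96.3, 95.9)
)

element   <- GMAT.df[2, 3]                      # row 2, column 3
one_row   <- GMAT.df[2, ]                       # whole row, still a data frame
one_col   <- GMAT.df[, 3]                       # whole column, as a vector
sub_block <- GMAT.df[1:2, c("Jane", "Katy")]    # rows 1-2 of two named columns

print(element)
print(one_row)
print(one_col)
print(sub_block)
```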
The dim() function returns the dimensions of the data frame as the number of rows followed by the number of columns. Unlike a matrix, a data frame generally cannot be reshaped by assigning new values to dim()
CODE/PROGRAM/EXAMPLE
dim(GMAT.df)
[1] 3 4
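The two values returned by dim() are also available separately through nrow() and ncol(). This is a brief sketch on a hypothetical stand-in for GMAT.df, with invented values except for the Katy column.

```r
# Hypothetical stand-in for GMAT.df; only the Katy column is from the text
GMAT.df <- data.frame(
  Jane  = c(98.1, 97.5, 98.0),
  John  = c(96.2, 98.8, 97.4),
  Katy  = c(99.4, 99.7, 98.9),
  James = c(95.0, 96.3, 95.9)
)

d <- dim(GMAT.df)
print(d)                 # rows first, then columns
print(nrow(GMAT.df))     # same as d[1]
print(ncol(GMAT.df))     # same as d[2]
```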
One way to subset a data frame is to use the ‘subset()’ function
Syntax
subset(x, condition, select, ...)
Where,
x: the data frame
condition: the subset condition
select: columns to be displayed in the output
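A minimal sketch of the three arguments in use. The `scores` data frame, its columns and its values are hypothetical and invented for illustration; it is not the ‘math’ data frame used in this tutorial.

```r
# Hypothetical data frame, invented for illustration
scores <- data.frame(
  name  = c("Asha", "Ben", "Chen", "Dia"),
  marks = c(99, 96, 97, 90),
  grade = c("A", "A", "A", "B")
)

# Rows where the condition holds; column names can be used directly
high <- subset(scores, marks >= 96)
print(high)

# Combine conditions with & and keep only the selected columns
middle <- subset(scores, marks > 96 & marks < 99, select = c(name, marks))
print(middle)
```

Inside subset(), column names are looked up in the data frame itself, so the `scores$` prefix is not needed in the condition.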
Consider the ‘math’ data frame
To get the details of students who scored 96 marks or more
CODE/PROGRAM/EXAMPLE
subset(math, marks >= 96)
To get the names and marks of students who scored more than 96 and less than 99
CODE/PROGRAM/EXAMPLE
subset(math, marks > 96 & marks < 99, select = c(name, marks))
What happens if there are missing values in a data frame?
Most arithmetic and summary operations performed on missing data (NA) result in NA, but there is an option to resolve this issue. Let us see what it is.
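The sketch below shows how NA propagates through a calculation, together with two common ways of handling it. That the option meant here is the `na.rm` argument is an assumption, and the `marks` values are hypothetical.

```r
# Hypothetical marks with one missing value
marks <- c(99, NA, 74, 62)

plain_mean <- mean(marks)                  # NA propagates into the result
clean_mean <- mean(marks, na.rm = TRUE)    # assumption: na.rm is the option meant

print(plain_mean)
print(clean_mean)
print(is.na(marks))                        # locate the missing values

# na.omit() drops rows of a data frame that contain any NA
df <- data.frame(marks)
complete <- na.omit(df)
print(nrow(complete))
```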