R Data Frame

Data Frame

  • A data frame in R is two-dimensional like a matrix, but unlike a matrix its columns can hold different types of data (heterogeneous data)
  • A data frame is in fact a list whose components are its columns
  • The components of a data frame must be vectors (numeric, character or logical), factors, numeric matrices, lists or other data frames
  • Vector structures appearing as variables of a data frame must all have the same length, and matrix structures must all have the same number of rows
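The rules above can be checked directly in R. This is a minimal sketch using a small hypothetical data frame `df` (the column names `id` and `grade` are made up for illustration): a data frame reports itself as a list, its length is the number of columns, and columns of incompatible lengths are rejected.

```r
# Hypothetical example data: one numeric and one character column
df <- data.frame(id = 1:3, grade = c("A", "B", "C"))

is.list(df)    # TRUE: a data frame is a list of columns
length(df)     # 2: the length of that list is the number of columns
nrow(df)       # 3: every column has the same number of elements

# Columns of lengths 3 and 2 cannot form a data frame, so data.frame()
# signals an error; tryCatch() captures the error message as a string
msg <- tryCatch(
  data.frame(id = 1:3, grade = c("A", "B")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

One nuance: data.frame() recycles a shorter column whose length divides the number of rows evenly (a single value, for example), so the error appears only when the lengths are incompatible.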
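Syntax
data.frame(vectors, row.names = NULL, ...)
	where vectors are the columns and ... stands for further optional arguments such as stringsAsFactors

as.data.frame(x)
	where x can be a vector, list, factor or matrix

is.data.frame(x)
	checks whether x is a data frame and returns TRUE or FALSE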
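A short sketch of is.data.frame() and as.data.frame() working together, using a hypothetical vector `v` and list `l`: neither is a data frame until it is converted, and the names of the list become the column names.

```r
v <- c(10, 20, 30)
is.data.frame(v)      # FALSE: a plain vector is not a data frame

# A hypothetical list of two equal-length vectors
l <- list(x = c(1.5, 2.5), y = c(TRUE, FALSE))
is.data.frame(l)      # FALSE: a list is not a data frame either

l_df <- as.data.frame(l)
is.data.frame(l_df)   # TRUE after conversion
names(l_df)           # "x" "y": the list names become column names
```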
	

Create a data frame named myScore using the data.frame() function:

CODE/PROGRAM/EXAMPLE
Subjects <- c("Math", "Science", "English", "Social")
Marks <- c(99, 67, 74, 62)
myScore <- data.frame(Subjects, Marks)
myScore

Output:

[Image: output of data.frame(), the printed myScore data frame]

Consider the matrix ‘mat1’ shown below. Use the as.data.frame() function to convert it to a data frame named ‘NewDF’.

[Image: the mat1 matrix]
CODE/PROGRAM/EXAMPLE
NewDF <- as.data.frame(mat1)
NewDF

Output:

[Image: NewDF printed as a data frame]
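Because mat1 appears only as an image, here is a self-contained sketch with a hypothetical stand-in matrix (its values are made up). It shows that as.data.frame() keeps the matrix's shape and, when the matrix has no column names, names the columns V1, V2 and so on.

```r
# Hypothetical stand-in for mat1: a 3 x 2 matrix without column names
mat1 <- matrix(c(5, 8, 2, 7, 1, 9), nrow = 3)

NewDF <- as.data.frame(mat1)
names(NewDF)   # "V1" "V2": default names for unnamed matrix columns
dim(NewDF)     # 3 2: rows and columns carry over from the matrix

# If the matrix already has column names, they are kept
colnames(mat1) <- c("first", "second")
names(as.data.frame(mat1))   # "first" "second"
```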

Now consider the two heterogeneous vectors Subjects (character) and Marks (numeric), together with the myScore data frame created from them in the first example.

[Image: the Subjects and Marks vectors and the myScore data frame]

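In R versions before 4.0.0, the ‘Subjects’ character vector was converted to a factor when the data frame was created, because stringsAsFactors defaulted to TRUE. Since R 4.0.0 the default is FALSE, so character columns stay character.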

[Image: the Subjects column stored as a factor]

To make sure the ‘Subjects’ vector remains a character vector in any R version, set stringsAsFactors = FALSE explicitly:

CODE/PROGRAM/EXAMPLE
data.frame(Subjects, Marks, stringsAsFactors = FALSE)
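The effect of the option is easiest to see by checking the column class both ways. This sketch reuses the Subjects and Marks vectors from the example and sets stringsAsFactors explicitly each time, so the result does not depend on the R version's default.

```r
Subjects <- c("Math", "Science", "English", "Social")
Marks <- c(99, 67, 74, 62)

with_factors <- data.frame(Subjects, Marks, stringsAsFactors = TRUE)
without_factors <- data.frame(Subjects, Marks, stringsAsFactors = FALSE)

class(with_factors$Subjects)     # "factor"
class(without_factors$Subjects)  # "character"
levels(with_factors$Subjects)    # "English" "Math" "Science" "Social" (sorted)
```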

names() function

The names() function can be used both to retrieve and to modify the column names of a data frame.

Consider the GMAT.df data frame from the previous example, whose columns are named after students. To rename the column ‘Tom’ to ‘John’:

CODE/PROGRAM/EXAMPLE
names(GMAT.df)
# [1] "Jane" "Tom" "Katy" "James"
names(GMAT.df) <- c("Jane", "John", "Katy", "James")
GMAT.df

Output:

[Image: GMAT.df with the column ‘Tom’ renamed to ‘John’]
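Retyping every name works, but it is easy to misspell one of the untouched columns. A common alternative, sketched here on a hypothetical `scores` data frame with made-up values (not the real GMAT.df scores), replaces only the name that matches:

```r
# Placeholder data with the same column names as GMAT.df; values are made up
scores <- data.frame(Jane = c(1, 2, 3), Tom = c(4, 5, 6),
                     Katy = c(7, 8, 9), James = c(10, 11, 12))

# Change only the element of names() that equals "Tom"
names(scores)[names(scores) == "Tom"] <- "John"
names(scores)   # "Jane" "John" "Katy" "James"
```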

colnames() and rownames() functions

These functions retrieve or modify the column names and the row names of a data frame, respectively.

CODE/PROGRAM/EXAMPLE
colnames(GMAT.df)
# [1] "Jane" "John" "Katy" "James"
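The example above only reads column names. This sketch, on a hypothetical `scores` data frame with made-up row labels `r1` to `r3`, also shows the default row names and how both kinds of names can be assigned and then used for lookup.

```r
scores <- data.frame(Jane = c(1, 2, 3), Katy = c(4, 5, 6))

default_rows <- rownames(scores)   # "1" "2" "3": default row names are strings
rownames(scores) <- c("r1", "r2", "r3")   # hypothetical row labels
colnames(scores)[2] <- "Kate"             # rename only the second column
scores["r2", "Kate"]                      # 5: look up a cell by its names
```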

The ‘$’ operator accesses a specific column by name and returns it as a vector:

CODE/PROGRAM/EXAMPLE
GMAT.df$Katy
# [1] 99.4 99.7 98.9
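‘$’ is not the only way to reach a column. This sketch compares it with double and single brackets on a hypothetical `scores` data frame; only the Katy values come from the example above, and the Jane values are made up.

```r
scores <- data.frame(Jane = c(90, 91, 92), Katy = c(99.4, 99.7, 98.9))

scores$Katy          # numeric vector: 99.4 99.7 98.9
scores[["Katy"]]     # the same vector
col_name <- "Katy"
scores[[col_name]]   # [[ ]] also accepts a name stored in a variable
scores["Katy"]       # single brackets return a one-column data frame
```

Double brackets are the usual choice when the column name is computed at run time, since `scores$col_name` would look for a column literally called col_name.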

dataFrameName[row, column]

An element can be retrieved by its row and column position in the data frame.

CODE/PROGRAM/EXAMPLE
GMAT.df[2, 3]
# [1] 99.7
# 99.7 is Katy's math score (row 2, column 3)
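Leaving one index empty selects a whole row or column. A minimal sketch on a hypothetical 3 x 4 data frame with made-up values, laid out like GMAT.df:

```r
scores <- data.frame(Jane = c(1, 2, 3), John = c(4, 5, 6),
                     Katy = c(7, 8, 9), James = c(10, 11, 12))

scores[2, 3]        # single element at row 2, column 3: 8
scores[2, ]         # all of row 2, returned as a one-row data frame
scores[, 3]         # all of column 3, returned as a vector: 7 8 9
scores[2, "Katy"]   # a column name can replace its position: 8
```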

The dim() function returns the dimensions of a data frame: the number of rows followed by the number of columns.

CODE/PROGRAM/EXAMPLE
dim(GMAT.df)
# [1] 3 4
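When only one dimension is needed, nrow() and ncol() return it directly. A quick sketch on a hypothetical data frame of the same 3 x 4 shape:

```r
scores <- data.frame(Jane = c(1, 2, 3), John = c(4, 5, 6),
                     Katy = c(7, 8, 9), James = c(10, 11, 12))

dim(scores)    # 3 4: rows first, then columns
nrow(scores)   # 3
ncol(scores)   # 4
```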

One way to subset a data frame is the subset() function.

Syntax
	subset(x, subset, select, ...)

		where
			x: the data frame
			subset: a logical expression; only rows where it is TRUE are kept (rows where it is NA are dropped)
			select: the columns to keep in the output

Consider the ‘math’ data frame

[Image: the math data frame]

To get the details of students who scored 96 marks or more:

CODE/PROGRAM/EXAMPLE
subset(math, marks >= 96)

Output:

[Image: rows of math with marks of 96 or more]
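The same rows can be selected with bracket indexing, but the two approaches treat missing values differently. This sketch uses a hypothetical stand-in for the math data frame (names and marks are made up, including one NA):

```r
math <- data.frame(name = c("Asha", "Ben", "Chen", "Dara"),
                   marks = c(97, 88, 99, NA))

subset(math, marks >= 96)          # Asha and Chen; the NA row is dropped
math[math$marks >= 96, ]           # same rows plus a row of NAs from Dara's mark
math[which(math$marks >= 96), ]    # which() discards the NA, matching subset()
```

R's own documentation describes subset() as a convenience for interactive use and recommends standard indexing such as `[` when writing functions.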

To get the names and marks of students who scored more than 96 and less than 99:

CODE/PROGRAM/EXAMPLE
subset(math, marks > 96 & marks < 99, select = c(name, marks))

Output:

[Image: names and marks of students who scored more than 96 and less than 99]
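Besides a vector of names, the select argument of subset() accepts a minus sign to drop columns and a colon for a range of adjacent columns. A sketch on a hypothetical math data frame with a made-up extra column `section`:

```r
math <- data.frame(name = c("Asha", "Ben", "Chen"),
                   section = c("A", "B", "A"),
                   marks = c(97, 88, 99))

# Exclude a column by prefixing its name with a minus sign
subset(math, marks > 96, select = -section)
# Keep a contiguous range of columns, from name through section
subset(math, marks > 96, select = name:section)
```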

What happens if a data frame contains missing values?

Most arithmetic and summary operations on missing data (NA) return NA, but R provides an option to resolve this issue. Let us see what it is.
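A minimal sketch of the problem, with hypothetical marks containing one NA. It also shows two widely used base R tools, the na.rm argument of summary functions and na.omit(); these are offered as illustrations and may not be the exact option the tutorial goes on to describe.

```r
math <- data.frame(name = c("Asha", "Ben", "Chen"),
                   marks = c(97, NA, 99))

mean(math$marks)                 # NA: the missing value propagates
mean(math$marks, na.rm = TRUE)   # 98: the NA is removed before averaging
sum(is.na(math$marks))           # 1: count the missing values
na.omit(math)                    # drop every row that contains an NA
```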

