Join Data in R [dplyr Package]
This page shows how to merge data with the join functions of the dplyr package in the R programming language.
Example Data & Setup of the dplyr Package
First example data frame:
my_data_1 <- data.frame(ID = 1:4,                    # Create first example data frame
                        X = letters[1:4],
                        stringsAsFactors = FALSE)
my_data_1
# ID X
#  1 a
#  2 b
#  3 c
#  4 d
Second example data frame with different IDs:
my_data_2 <- data.frame(ID = 3:6,                    # Create second example data frame
                        Y = LETTERS[1:4],
                        stringsAsFactors = FALSE)
my_data_2
# ID Y
#  3 A
#  4 B
#  5 C
#  6 D
Install and load the dplyr package in R:
install.packages("dplyr")                            # Install dplyr package
library("dplyr")                                     # Load dplyr package
The dplyr package contains six different functions for merging data frames in R. Each function performs a different type of join, so the number of rows and columns in the merged result differs from one function to the next.
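To make the difference in rows and columns concrete, the following sketch applies all six join functions to the two example data frames and collects the dimensions of each result. The names joins and join_dims are introduced here for illustration only, and the key column is passed explicitly with by = "ID".

```r
library(dplyr)

# Example data frames from this page
my_data_1 <- data.frame(ID = 1:4, X = letters[1:4], stringsAsFactors = FALSE)
my_data_2 <- data.frame(ID = 3:6, Y = LETTERS[1:4], stringsAsFactors = FALSE)

# The six dplyr join functions, stored in a named list (illustrative helper)
joins <- list(inner = inner_join, left = left_join, right = right_join,
              full = full_join, semi = semi_join, anti = anti_join)

# Apply each join on the ID column and record rows and columns of the result
join_dims <- t(sapply(joins, function(f) dim(f(my_data_1, my_data_2, by = "ID"))))
colnames(join_dims) <- c("rows", "cols")
join_dims
#       rows cols
# inner    2    3
# left     4    3
# right    4    3
# full     6    3
# semi     2    2
# anti     2    2
```

The semi and anti joins return only two columns because they filter the first data frame and never add columns from the second.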
Inner Join
inner_join(my_data_1, my_data_2)                     # Apply inner join
# ID X Y
#  3 c A
#  4 d B
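When no key is specified, dplyr joins on all columns that share a name in both data frames and prints a message naming the column it used. The sketch below passes the key explicitly through the by argument instead. The data frame my_data_2_key and its column name key are hypothetical and only illustrate joining on differently named columns.

```r
library(dplyr)

# Example data frames from this page
my_data_1 <- data.frame(ID = 1:4, X = letters[1:4], stringsAsFactors = FALSE)
my_data_2 <- data.frame(ID = 3:6, Y = LETTERS[1:4], stringsAsFactors = FALSE)

# Name the join column explicitly; the result matches the call without by
inner_by_id <- inner_join(my_data_1, my_data_2, by = "ID")

# Hypothetical variant of the second data frame with a differently named key
my_data_2_key <- data.frame(key = 3:6, Y = LETTERS[1:4], stringsAsFactors = FALSE)

# Match ID in the first data frame against key in the second
inner_by_key <- inner_join(my_data_1, my_data_2_key, by = c("ID" = "key"))
inner_by_key
```

The same by argument is accepted by the other five join functions shown on this page.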
Left Join
left_join(my_data_1, my_data_2)                      # Apply left join
# ID X    Y
#  1 a <NA>
#  2 b <NA>
#  3 c    A
#  4 d    B
Right Join
right_join(my_data_1, my_data_2)                     # Apply right join
# ID    X Y
#  3    c A
#  4    d B
#  5 <NA> C
#  6 <NA> D
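For these example data, a right join can also be read as a left join with the two data frames swapped. The sketch below compares both calls; the object names right_res and mirrored are illustrative, and the only difference that needs handling here is the column order.

```r
library(dplyr)

# Example data frames from this page
my_data_1 <- data.frame(ID = 1:4, X = letters[1:4], stringsAsFactors = FALSE)
my_data_2 <- data.frame(ID = 3:6, Y = LETTERS[1:4], stringsAsFactors = FALSE)

# Right join: keep every row of the second data frame
right_res <- right_join(my_data_1, my_data_2, by = "ID")

# Left join with swapped inputs: keeps the same rows, but the columns are ID, Y, X
mirrored <- left_join(my_data_2, my_data_1, by = "ID")

# Reorder the columns so both results can be compared side by side
mirrored <- mirrored[, c("ID", "X", "Y")]
mirrored
```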
Full Join
full_join(my_data_1, my_data_2)                      # Apply full join
# ID    X    Y
#  1    a <NA>
#  2    b <NA>
#  3    c    A
#  4    d    B
#  5 <NA>    C
#  6 <NA>    D
Semi Join
semi_join(my_data_1, my_data_2)                      # Apply semi join
# ID X
#  3 c
#  4 d
Anti Join
anti_join(my_data_1, my_data_2)                      # Apply anti join
# ID X
#  1 a
#  2 b
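Semi and anti joins only filter the first data frame. With a single key column, as in this example, the same rows can be selected with filter() and %in%. This is a minimal sketch for comparison, not a general replacement; the names semi_manual and anti_manual are illustrative.

```r
library(dplyr)

# Example data frames from this page
my_data_1 <- data.frame(ID = 1:4, X = letters[1:4], stringsAsFactors = FALSE)
my_data_2 <- data.frame(ID = 3:6, Y = LETTERS[1:4], stringsAsFactors = FALSE)

# Rows of my_data_1 whose ID also occurs in my_data_2 (same rows as the semi join)
semi_manual <- filter(my_data_1, ID %in% my_data_2$ID)

# Rows of my_data_1 whose ID does not occur in my_data_2 (same rows as the anti join)
anti_manual <- filter(my_data_1, !(ID %in% my_data_2$ID))

semi_manual
anti_manual
```

The join functions are usually the more convenient choice when the match involves several key columns.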