Joining data.tables in R (6 Examples)
In this article, you’ll learn how to join multiple data.tables in R in the R programming language.
Preparing the Examples
Install and load the data.table package.
install.packages("data.table") # Install & load data.table package library("data.table") |
install.packages("data.table") # Install & load data.table package library("data.table")
Load the iris dataset. We will use it for illustration.
data(iris) # Load iris data set head(iris) # Sepal.Length Sepal.Width Petal.Length Petal.Width Species # 1 5.1 3.5 1.4 0.2 setosa # 2 4.9 3.0 1.4 0.2 setosa # 3 4.7 3.2 1.3 0.2 setosa # 4 4.6 3.1 1.5 0.2 setosa # 5 5.0 3.6 1.4 0.2 setosa # 6 5.4 3.9 1.7 0.4 setosa |
data(iris) # Load iris data set head(iris) # Sepal.Length Sepal.Width Petal.Length Petal.Width Species # 1 5.1 3.5 1.4 0.2 setosa # 2 4.9 3.0 1.4 0.2 setosa # 3 4.7 3.2 1.3 0.2 setosa # 4 4.6 3.1 1.5 0.2 setosa # 5 5.0 3.6 1.4 0.2 setosa # 6 5.4 3.9 1.7 0.4 setosa
As the iris data is a data.frame, we transform it into a data.table object.
class(iris) # Object class of the data # [1] "data.frame" <pre lang="php">iris_dt <- data.table::copy(iris) # Replicating the iris data set setDT(iris_dt) # Converting the iris data to a data.table class(iris_dt) # Printing object class of the data # [1] "data.table" "data.frame" |
class(iris) # Object class of the data # [1] "data.frame" <pre lang="php">iris_dt <- data.table::copy(iris) # Replicating the iris data set setDT(iris_dt) # Converting the iris data to a data.table class(iris_dt) # Printing object class of the data # [1] "data.table" "data.frame"
We add a new column called ID to the data as an identifier of the data rows.
iris_dt[, ID := 1:nrow(iris_dt)] |
iris_dt[, ID := 1:nrow(iris_dt)]
From the iris data we construct three subsets iris_dt_1, iris_dt_2, and iris_dt_3, which we use for the merging examples in the following.
iris_dt_1 <- iris_dt[1:4, c(1,3,5,6)] # Creating a subset of the data iris_dt_1 # Printing the data # Sepal.Length Petal.Length Species ID # 1: 5.1 1.4 setosa 1 # 2: 4.9 1.4 setosa 2 # 3: 4.7 1.3 setosa 3 # 4: 4.6 1.5 setosa 4 |
iris_dt_1 <- iris_dt[1:4, c(1,3,5,6)] # Creating a subset of the data iris_dt_1 # Printing the data # Sepal.Length Petal.Length Species ID # 1: 5.1 1.4 setosa 1 # 2: 4.9 1.4 setosa 2 # 3: 4.7 1.3 setosa 3 # 4: 4.6 1.5 setosa 4
iris_dt_2 <- iris_dt[2:5, c(2,6)] # Creating a subset of the data iris_dt_2 # Printing the data # Sepal.Width ID # 1: 3.0 2 # 2: 3.2 3 # 3: 3.1 4 # 4: 3.6 5 |
iris_dt_2 <- iris_dt[2:5, c(2,6)] # Creating a subset of the data iris_dt_2 # Printing the data # Sepal.Width ID # 1: 3.0 2 # 2: 3.2 3 # 3: 3.1 4 # 4: 3.6 5
iris_dt_3 <- iris_dt[c(2,6,7), c(4,6)] # Creating a subset of the data iris_dt_3 # Printing the data # Petal.Width ID # 1: 0.2 2 # 2: 0.4 6 # 3: 0.3 7 |
iris_dt_3 <- iris_dt[c(2,6,7), c(4,6)] # Creating a subset of the data iris_dt_3 # Printing the data # Petal.Width ID # 1: 0.2 2 # 2: 0.4 6 # 3: 0.3 7
Example 1: Merging two data.tables with a Full Join
We begin with a full merge. That is, we want to combine all information of two data.tables. For that, it is necessary that both data.tables share at least one column, in this case ID.
iris_merge_full <- merge(iris_dt_1, iris_dt_2, all = TRUE) # Full join iris_merge_full # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 # 4: 5.0 5 NA <NA> 3.6 0.2 # 5: 5.1 1 1.4 setosa NA NA |
iris_merge_full <- merge(iris_dt_1, iris_dt_2, all = TRUE) # Full join iris_merge_full # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 # 4: 5.0 5 NA <NA> 3.6 0.2 # 5: 5.1 1 1.4 setosa NA NA
You see that all missing information is set to NA.
Example 2: Merging two data.tables with a Right Outer Join
Now, we want to make an outer right join. That, is, we take the dataset in the second argument of the merge() function and join to it the information of the data.table in the first argument.
iris_merge_out_right <- merge(iris_dt_1, iris_dt_2, all.y = TRUE) # Outer join: Right iris_merge_out_right # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 # 4: 5.0 5 NA <NA> 3.6 0.2 |
iris_merge_out_right <- merge(iris_dt_1, iris_dt_2, all.y = TRUE) # Outer join: Right iris_merge_out_right # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 # 4: 5.0 5 NA <NA> 3.6 0.2
You see that the joined data only contain the data rows of iris_dt_2, with additional information from the columns of iris_dt_1.
Example 3: Merging two data.tables with a Left Outer Join
In a left outer join, we do it the other way around. We take the data.table in the first argument of the merge() function and add to it the information of the data.table in the second argument.
iris_merge_out_left <- merge(iris_dt_1, iris_dt_2, all.x = TRUE) # Outer join: Left iris_merge_out_left # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 # 4: 5.1 1 1.4 setosa NA NA |
iris_merge_out_left <- merge(iris_dt_1, iris_dt_2, all.x = TRUE) # Outer join: Left iris_merge_out_left # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 # 4: 5.1 1 1.4 setosa NA NA
Example 4: Merging two data.tables with an Inner Join
In an inner join of two data.tables, we focus only on the data rows common to both data.tables and join the information for them.
iris_merge_inner <- merge(iris_dt_1, iris_dt_2, all = FALSE) # Inner join iris_merge_inner # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2 |
iris_merge_inner <- merge(iris_dt_1, iris_dt_2, all = FALSE) # Inner join iris_merge_inner # Print data.table # Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width # 1: 4.6 4 1.5 setosa 3.1 0.2 # 2: 4.7 3 1.3 setosa 3.2 0.2 # 3: 4.9 2 1.4 setosa 3.0 0.2
Example 5: Merging three data.tables with an Inner Join
We can extend the code above to more than two data.tables. Let us do an inner merge of three data.tables.
iris_merge3_inner <- Reduce(function (...) { merge(..., all = FALSE) }, # Inner join list(iris_dt_1, iris_dt_2, iris_dt_3)) iris_merge3_inner # Print data.table # ID Sepal.Length Petal.Length Species Sepal.Width Petal.Width # 1: 2 4.9 1.4 setosa 3 0.2 |
iris_merge3_inner <- Reduce(function (...) { merge(..., all = FALSE) }, # Inner join list(iris_dt_1, iris_dt_2, iris_dt_3)) iris_merge3_inner # Print data.table # ID Sepal.Length Petal.Length Species Sepal.Width Petal.Width # 1: 2 4.9 1.4 setosa 3 0.2
Example 6: Merging three data.tables with a Full Join
Finally, we make a full join of three data.tables.
iiris_merge3_full <- Reduce(function (...) { merge(..., all = TRUE) }, # Full join list(iris_dt_1, iris_dt_2, iris_dt_3)) iris_merge3_full # Print data.table # ID Sepal.Length Petal.Length Species Sepal.Width Petal.Width # 1: 1 5.1 1.4 setosa NA NA # 2: 2 4.9 1.4 setosa 3.0 0.2 # 3: 3 4.7 1.3 setosa 3.2 NA # 4: 4 4.6 1.5 setosa 3.1 NA # 5: 5 NA NA <NA> 3.6 NA # 6: 6 NA NA <NA> NA 0.4 # 7: 7 NA NA <NA> NA 0.3 |
iiris_merge3_full <- Reduce(function (...) { merge(..., all = TRUE) }, # Full join list(iris_dt_1, iris_dt_2, iris_dt_3)) iris_merge3_full # Print data.table # ID Sepal.Length Petal.Length Species Sepal.Width Petal.Width # 1: 1 5.1 1.4 setosa NA NA # 2: 2 4.9 1.4 setosa 3.0 0.2 # 3: 3 4.7 1.3 setosa 3.2 NA # 4: 4 4.6 1.5 setosa 3.1 NA # 5: 5 NA NA <NA> 3.6 NA # 6: 6 NA NA <NA> NA 0.4 # 7: 7 NA NA <NA> NA 0.3
Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.