Joining data.tables in R (6 Examples)

In this article, you’ll learn how to join multiple data.tables in R in the R programming language.

Preparing the Examples

Install and load the data.table package.

install.packages("data.table")                                     # Install & load data.table package
library("data.table")

Load the iris dataset. We will use it for illustration.

data(iris)                                                         # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

As the iris data is a data.frame, we transform it into a data.table object.

class(iris)                                                        # Object class of the data
# [1] "data.frame"
<pre lang="php">iris_dt <- data.table::copy(iris)                                  # Replicating the iris data set
setDT(iris_dt)                                                     # Converting the iris data to a data.table
class(iris_dt)                                                     # Printing object class of the data
# [1] "data.table" "data.frame"

We add a new column called ID to the data as an identifier of the data rows.

iris_dt[, ID := 1:nrow(iris_dt)]

From the iris data we construct three subsets iris_dt_1, iris_dt_2, and iris_dt_3, which we use for the merging examples in the following.

iris_dt_1 <- iris_dt[1:4, c(1,3,5,6)]                              # Creating a subset of the data
iris_dt_1                                                          # Printing the data
#    Sepal.Length Petal.Length Species ID
# 1:          5.1          1.4  setosa  1
# 2:          4.9          1.4  setosa  2
# 3:          4.7          1.3  setosa  3
# 4:          4.6          1.5  setosa  4
iris_dt_2 <- iris_dt[2:5, c(2,6)]                                  # Creating a subset of the data
iris_dt_2                                                          # Printing the data
#    Sepal.Width ID
# 1:         3.0  2
# 2:         3.2  3
# 3:         3.1  4
# 4:         3.6  5
iris_dt_3 <- iris_dt[c(2,6,7), c(4,6)]                             # Creating a subset of the data
iris_dt_3                                                          # Printing the data
#    Petal.Width ID
# 1:         0.2  2
# 2:         0.4  6
# 3:         0.3  7

Example 1: Merging two data.tables with a Full Join

We begin with a full merge. That is, we want to combine all information of two data.tables. For that, it is necessary that both data.tables share at least one column, in this case ID.

iris_merge_full <- merge(iris_dt_1, iris_dt_2, all = TRUE)         # Full join
iris_merge_full                                                    # Print data.table
#    Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width
# 1:          4.6  4          1.5  setosa         3.1         0.2
# 2:          4.7  3          1.3  setosa         3.2         0.2
# 3:          4.9  2          1.4  setosa         3.0         0.2
# 4:          5.0  5           NA    <NA>         3.6         0.2
# 5:          5.1  1          1.4  setosa          NA          NA

You see that all missing information is set to NA.

Example 2: Merging two data.tables with a Right Outer Join

Now, we want to make an outer right join. That, is, we take the dataset in the second argument of the merge() function and join to it the information of the data.table in the first argument.

iris_merge_out_right <- merge(iris_dt_1, iris_dt_2, all.y = TRUE)  # Outer join: Right
iris_merge_out_right                                               # Print data.table
#    Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width
# 1:          4.6  4          1.5  setosa         3.1         0.2
# 2:          4.7  3          1.3  setosa         3.2         0.2
# 3:          4.9  2          1.4  setosa         3.0         0.2
# 4:          5.0  5           NA    <NA>         3.6         0.2

You see that the joined data only contain the data rows of iris_dt_2, with additional information from the columns of iris_dt_1.

Example 3: Merging two data.tables with a Left Outer Join

In a left outer join, we do it the other way around. We take the data.table in the first argument of the merge() function and add to it the information of the data.table in the second argument.

iris_merge_out_left <- merge(iris_dt_1, iris_dt_2, all.x = TRUE)   # Outer join: Left
iris_merge_out_left                                                # Print data.table
#    Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width
# 1:          4.6  4          1.5  setosa         3.1         0.2
# 2:          4.7  3          1.3  setosa         3.2         0.2
# 3:          4.9  2          1.4  setosa         3.0         0.2
# 4:          5.1  1          1.4  setosa          NA          NA

Example 4: Merging two data.tables with an Inner Join

In an inner join of two data.tables, we focus only on the data rows common to both data.tables and join the information for them.

iris_merge_inner <- merge(iris_dt_1, iris_dt_2, all = FALSE)       # Inner join
iris_merge_inner                                                   # Print data.table
#    Sepal.Length ID Petal.Length Species Sepal.Width Petal.Width
# 1:          4.6  4          1.5  setosa         3.1         0.2
# 2:          4.7  3          1.3  setosa         3.2         0.2
# 3:          4.9  2          1.4  setosa         3.0         0.2

Example 5: Merging three data.tables with an Inner Join

We can extend the code above to more than two data.tables. Let us do an inner merge of three data.tables.

iris_merge3_inner <- Reduce(function (...) { merge(..., all = FALSE) }, # Inner join
                            list(iris_dt_1, iris_dt_2, iris_dt_3))
iris_merge3_inner                                                       # Print data.table
#    ID Sepal.Length Petal.Length Species Sepal.Width Petal.Width
# 1:  2          4.9          1.4  setosa           3         0.2

Example 6: Merging three data.tables with a Full Join

Finally, we make a full join of three data.tables.

iiris_merge3_full <- Reduce(function (...) { merge(..., all = TRUE) },  # Full join
                           list(iris_dt_1, iris_dt_2, iris_dt_3))
iris_merge3_full                                                        # Print data.table
#    ID Sepal.Length Petal.Length Species Sepal.Width Petal.Width
# 1:  1          5.1          1.4  setosa          NA          NA
# 2:  2          4.9          1.4  setosa         3.0         0.2
# 3:  3          4.7          1.3  setosa         3.2          NA
# 4:  4          4.6          1.5  setosa         3.1          NA
# 5:  5           NA           NA    <NA>         3.6          NA
# 6:  6           NA           NA    <NA>          NA         0.4
# 7:  7           NA           NA    <NA>          NA         0.3

 

Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.

Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.
You need to agree with the terms to proceed

Menu
Top