Applying Function to Multiple Columns of data.table in R (4 Examples)

In this R tutorial you’ll learn how to use lapply to apply a function to multiple columns of a data.table object.

Setting up the Examples

Start by installing and loading the data.table package. We also have an overview post of data.table here. You can find the GitLab repository of data.table here.

install.packages("data.table")  # Install & load data.table
library("data.table")

Take the iris data as an example dataset.

data(iris)  # Loading iris data set
head(iris)  # Printing head of data
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

Copy the data and put it in the data.table format.

iris_DT <- data.table(data.table::copy(iris))  # Copying data as data.table

Example 1: Calculating the Sum Values of Multiple Variables

In this example, we use lapply to apply the sum() function to multiple data.table columns. We do this with .SD and .SDcols: .SD (“Subset of Data”) is a data.table that contains the columns listed in .SDcols, and lapply runs the function on each of those columns in turn.

iris_DT[ , lapply(.SD, sum), .SDcols = c("Sepal.Length", "Petal.Length")]  # Calculating sum values
#    Sepal.Length Petal.Length
# 1:        876.5        563.7

The previous line returns the sum of Sepal.Length and Petal.Length.
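As a minimal sketch of how .SD relates to .SDcols, the following code selects the columns by a name pattern instead of typing them out, checks which columns end up in .SD, and compares the result with base R's colSums(). The object names len_cols, sd_names, sums_dt, and sums_base are chosen for illustration only and are not part of the example above.

```r
library(data.table)

iris_DT <- as.data.table(iris)  # Fresh data.table copy of iris

# Illustrative helper: all column names that end in "Length"
len_cols <- grep("Length$", names(iris_DT), value = TRUE)

# .SD contains exactly the columns passed to .SDcols
sd_names <- iris_DT[ , names(.SD), .SDcols = len_cols]
sd_names
# [1] "Sepal.Length" "Petal.Length"

# Column sums via lapply and .SD
sums_dt <- iris_DT[ , lapply(.SD, sum), .SDcols = len_cols]

# Same sums computed with base R for comparison
sums_base <- colSums(iris[ , len_cols])
all.equal(unlist(sums_dt), sums_base)
```

Building the column vector programmatically can be handy when the number of columns is large or their names follow a pattern.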

Example 2: Calculating the Sum Values of Multiple Variables by Groups

Now, we go one step further and calculate the sum of both variables for each category of the column Species. For that, we simply add the by argument to the previous code as follows.

iris_DT[ , lapply(.SD, sum), by = .(Species), .SDcols = c("Sepal.Length", "Petal.Length")]  # Calculating group sums
#       Species Sepal.Length Petal.Length
# 1:     setosa        250.3         73.1
# 2: versicolor        296.8        213.0
# 3:  virginica        329.4        277.6
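One way to convince yourself that the grouped call does what is described is to recompute a single cell by hand. The sketch below repeats the grouped sum and then filters the rows of one species before summing one column; group_sums and setosa_sum are illustrative names, not part of the original code.

```r
library(data.table)

iris_DT <- as.data.table(iris)  # Fresh data.table copy of iris

# Grouped sums, as in the example above
group_sums <- iris_DT[ , lapply(.SD, sum), by = .(Species),
                       .SDcols = c("Sepal.Length", "Petal.Length")]

# Manual cross-check: filter one species, then sum one column
setosa_sum <- iris_DT[Species == "setosa", sum(Sepal.Length)]

# Both routes should give the same value for setosa
all.equal(group_sums[Species == "setosa", Sepal.Length], setosa_sum)
```

With by, .SD holds only the rows of the current group, so the function is evaluated once per group and per column.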

Example 3: Using Self-Defined Functions

Often, we want to pass further arguments to a function or use a self-defined function. We can add a self-defined function as follows.

iris_DT[ , lapply(.SD, function(x) { sum(sqrt(x) / 2) }), .SDcols = c("Sepal.Length", "Petal.Length")]  # Applying arbitrary function
#    Sepal.Length Petal.Length
# 1:     180.8488     140.5313

For each variable x, we calculated the sum of half the square roots of its values.
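Further function arguments can also be handed over without writing an anonymous function, because lapply forwards any additional arguments to the function it applies. The following sketch assumes a modified copy of the data (called iris_NA here purely for illustration) with one missing value, and passes na.rm = TRUE through to mean().

```r
library(data.table)

iris_DT <- as.data.table(iris)  # Fresh data.table copy of iris

# Illustrative copy with one missing value in Sepal.Length
iris_NA <- copy(iris_DT)
iris_NA[1, Sepal.Length := NA_real_]

# Extra arguments after the function name are forwarded by lapply
means_na_rm <- iris_NA[ , lapply(.SD, mean, na.rm = TRUE),
                        .SDcols = c("Sepal.Length", "Petal.Length")]
means_na_rm
```

Without na.rm = TRUE, the mean of Sepal.Length in this modified copy would be NA.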

Example 4: Defining New Columns

Furthermore, we can use lapply together with the definition of new columns, as shown in the following example.

iris_DT[ , c("Sepal.Length_new", "Petal.Length_new") := lapply(.SD, function(x) { 4 * x + 2 }),
         .SDcols = c("Sepal.Length", "Petal.Length")]  # Defining new variables
head(iris_DT)  # Printing data head
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_new
# 1:          5.1         3.5          1.4         0.2  setosa             22.4
# 2:          4.9         3.0          1.4         0.2  setosa             21.6
# 3:          4.7         3.2          1.3         0.2  setosa             20.8
# 4:          4.6         3.1          1.5         0.2  setosa             20.4
# 5:          5.0         3.6          1.4         0.2  setosa             22.0
# 6:          5.4         3.9          1.7         0.4  setosa             23.6
#    Petal.Length_new
# 1:              7.6
# 2:              7.6
# 3:              7.2
# 4:              8.0
# 5:              7.6
# 6:              8.8

For the new columns, we specify the column names on the left-hand side of “:=”. The “:=” operator then adds the new columns, defined by the function on the right-hand side, to the data.table by reference.
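When many columns are transformed, the new names can be derived from the old ones instead of being typed out. The sketch below is one possible variant under that assumption: cols and new_cols are illustrative helper vectors, and wrapping new_cols in parentheses tells data.table to use the names stored in the vector rather than creating a column literally called new_cols.

```r
library(data.table)

iris_DT <- as.data.table(iris)  # Fresh data.table copy of iris

cols     <- c("Sepal.Length", "Petal.Length")  # Input columns
new_cols <- paste0(cols, "_new")               # Derived output names

# := modifies iris_DT by reference, so no reassignment is needed
iris_DT[ , (new_cols) := lapply(.SD, function(x) { 4 * x + 2 }), .SDcols = cols]

head(iris_DT[ , new_cols, with = FALSE])  # Printing only the new columns
```

Because as.data.table() creates a new object here, the original iris data frame is left without the new columns.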

 

Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
