Summarize data.table by Group in R Programming (Example Code)

This tutorial shows how to summarize a data.table in R by computing the mean of a column for each group.

Setting up the Example

Install and load the data.table package. The installation step is only needed once.

install.packages("data.table")                                                    # Install data.table package
library("data.table")                                                             # Load data.table package

Take the iris dataset as an example and transform it to a data.table, stored as iris_dt.

data(iris)                                                                        # Loading example data
iris_dt <- data.table(iris)                                                       # Convert data frame to data.table
iris_dt
#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#   1:          5.1         3.5          1.4         0.2    setosa
#   2:          4.9         3.0          1.4         0.2    setosa
#   3:          4.7         3.2          1.3         0.2    setosa
#   4:          4.6         3.1          1.5         0.2    setosa
#   5:          5.0         3.6          1.4         0.2    setosa
#  ---                                                            
# 146:          6.7         3.0          5.2         2.3 virginica
# 147:          6.3         2.5          5.0         1.9 virginica
# 148:          6.5         3.0          5.2         2.0 virginica
# 149:          6.2         3.4          5.4         2.3 virginica
# 150:          5.9         3.0          5.1         1.8 virginica
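As a side note, the conversion can also be written with as.data.table(), and it is easy to check what kind of object we are working with. The following is only a minimal sketch; the object name iris_dt2 is illustrative and not part of the original example.

```r
library(data.table)

# Alternative conversion: as.data.table() turns the data frame into a data.table
iris_dt2 <- as.data.table(iris)      # iris_dt2 is a hypothetical name

is.data.table(iris_dt2)              # TRUE
dim(iris_dt2)                        # 150 rows, 5 columns
class(iris_dt2)                      # a data.table is also a data.frame
```

Since a data.table inherits from data.frame, functions that expect a data frame generally accept it as well.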

Example: Computing the Mean by Groups in a data.table

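We aggregate the data so that it contains only the mean of Sepal.Length for each unique value of the Species column. In the data.table syntax dt[i, j, by], the j expression .(...) builds the output columns and by = Species defines the groups. The new column containing the group means is called Species_average.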

iris_dt_new <- iris_dt[ , .(Species_average = mean(Sepal.Length)), by = Species]  # Calculating mean by group
iris_dt_new
#       Species Species_average
# 1:     setosa           5.006
# 2: versicolor           5.936
# 3:  virginica           6.588
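If you want to double-check the result, one option is to add the group size next to the mean and compare the means with base R. This is a sketch under the assumption that the built-in iris data is used as above; the names iris_dt_n and n are illustrative.

```r
library(data.table)

iris_dt <- data.table(iris)

# Mean and number of rows per group; .N is data.table's built-in group-size symbol
iris_dt_n <- iris_dt[ , .(Species_average = mean(Sepal.Length), n = .N), by = Species]
iris_dt_n

# Cross-check the group means with base R
tapply(iris$Sepal.Length, iris$Species, mean)
```

Both approaches should report the same three means, and n shows how many rows went into each of them.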

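Because the j expression returns a single value per group, the result is automatically reduced to one row per species: three rows and two columns instead of the original 150 rows and five columns.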
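To make the effect on the dimensions more concrete, the sketch below contrasts two variations: aggregating all measurement columns at once with .SD, and adding the group mean with := so that no rows are dropped. The object names iris_dt_all and iris_dt_wide are illustrative and do not appear in the original example.

```r
library(data.table)

iris_dt <- data.table(iris)

# Aggregating: one row per group, means of all non-grouping columns
iris_dt_all <- iris_dt[ , lapply(.SD, mean), by = Species]
dim(iris_dt_all)                     # 3 rows, 5 columns

# Not aggregating: := adds the group mean as a column and keeps all 150 rows
iris_dt_wide <- copy(iris_dt)        # copy() so that iris_dt itself is not modified
iris_dt_wide[ , Species_average := mean(Sepal.Length), by = Species]
dim(iris_dt_wide)                    # 150 rows, 6 columns
```

The copy() call is a deliberate choice here: := modifies a data.table by reference, so working on a copy leaves the original iris_dt untouched.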


Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
