Summarize data.table by Group in R Programming (Example Code)

This tutorial shows how to summarize a data.table by group, using the mean of a column within each group as the example.

Setting up the Example

Install (if necessary) and load the data.table package.

install.packages("data.table")                                                    # Install data.table package
library("data.table")                                                             # Load data.table package

We use the built-in iris dataset as example data and convert it to a data.table called iris_dt.

data(iris)                                                                        # Loading example data
iris_dt <- data.table(iris)                                                       # Convert data.frame to data.table
iris_dt                                                                           # Print data.table
#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#   1:          5.1         3.5          1.4         0.2    setosa
#   2:          4.9         3.0          1.4         0.2    setosa
#   3:          4.7         3.2          1.3         0.2    setosa
#   4:          4.6         3.1          1.5         0.2    setosa
#   5:          5.0         3.6          1.4         0.2    setosa
#  ---                                                            
# 146:          6.7         3.0          5.2         2.3 virginica
# 147:          6.3         2.5          5.0         1.9 virginica
# 148:          6.5         3.0          5.2         2.0 virginica
# 149:          6.2         3.4          5.4         2.3 virginica
# 150:          5.9         3.0          5.1         1.8 virginica
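Before aggregating, it can help to check how many rows fall into each group. The following is a minimal sketch using data.table's special symbol .N, which holds the number of rows in the current group; the object name group_sizes is only an illustrative choice. In iris, each of the three species has 50 rows.

```r
library("data.table")

iris_dt <- data.table(iris)

# .N is the number of rows in each group defined by `by`
group_sizes <- iris_dt[ , .N, by = Species]
group_sizes
```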

Example: Computing the Mean by Groups in a data.table

We aggregate the data so that it contains only the mean of Sepal.Length for each unique value of the column Species. The new column holding these group means is named Species_average.

iris_dt_new <- iris_dt[ , .(Species_average = mean(Sepal.Length)), by = Species]  # Calculating mean by group
iris_dt_new
#       Species Species_average
# 1:     setosa           5.006
# 2: versicolor           5.936
# 3:  virginica           6.588

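The by argument splits the rows by Species, and the expression in j is evaluated separately for each group. The result therefore has one row per species (three rows) and two columns: the grouping column Species and the new column Species_average.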
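The j argument is not limited to a single summary. As a sketch, several statistics can be computed per group in one call by listing them inside .( ); the column names Sepal_mean, Sepal_sd and n are illustrative choices, not part of the original example.

```r
library("data.table")

iris_dt <- data.table(iris)

# One row per species; each named element of .( ) becomes a result column
iris_summary <- iris_dt[ , .(Sepal_mean = mean(Sepal.Length),
                             Sepal_sd   = sd(Sepal.Length),
                             n          = .N),
                         by = Species]
iris_summary
```

Groups appear in the order in which they first occur in the data, so setosa comes first here.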

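To get the group mean of every measurement column at once, one common data.table idiom applies mean() to each column of .SD (the subset of data for the current group), with .SDcols restricting .SD to the chosen columns. Using keyby instead of by additionally sorts the result by Species and sets Species as the key. This is a minimal sketch; the object name iris_means is hypothetical.

```r
library("data.table")

iris_dt <- data.table(iris)

# Apply mean() to each selected column within each species;
# keyby sorts the result by Species and sets it as the key
iris_means <- iris_dt[ , lapply(.SD, mean),
                       keyby = Species,
                       .SDcols = c("Sepal.Length", "Sepal.Width",
                                   "Petal.Length", "Petal.Width")]
iris_means
```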

Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
