# Using Summary Statistics in a data.table in R (4 Examples)

In this tutorial, I’ll illustrate how to apply summary statistics like the mean or median inside a data.table object in R.

## Preparing the Examples

Install and load the package data.table.

```r
install.packages("data.table")   # Install data.table package
library("data.table")            # Load data.table package
```

For the examples, use the iris dataset.

```r
data(iris)                       # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
```

Convert it into a data.table object, called iris_dt.

```r
iris_dt <- data.table::copy(iris)   # Replicate iris data set
setDT(iris_dt)                      # Convert the copy to a data.table
head(iris_dt)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          5.1         3.5          1.4         0.2  setosa
# 2:          4.9         3.0          1.4         0.2  setosa
# 3:          4.7         3.2          1.3         0.2  setosa
# 4:          4.6         3.1          1.5         0.2  setosa
# 5:          5.0         3.6          1.4         0.2  setosa
# 6:          5.4         3.9          1.7         0.4  setosa
```
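As an alternative to copying and then calling setDT(), the conversion can be sketched in a single step with as.data.table(), which returns a new object and leaves the original data frame unchanged. The name iris_dt_alt is only an illustrative choice and is not used elsewhere in this tutorial.

```r
library(data.table)

# as.data.table() returns a new data.table; the data.frame iris is not modified
iris_dt_alt <- as.data.table(iris)

class(iris)          # Still a plain data.frame
class(iris_dt_alt)   # Now carries the data.table class as well
```

setDT() converts in place without copying, which is why the tutorial copies iris first; as.data.table() trades that efficiency for a shorter call.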

## Example 1: Absolute Frequencies

We want to create frequency tables inside a data.table. To illustrate this, we first define an additional categorical column Petal.L.Class, which bins Petal.Length into classes bounded by its quartiles. Then, we display the cross-tabulation of the two categorical variables Petal.L.Class and Species.

```r
iris_dt_1 <- iris_dt[ , "Petal.L.Class" := cut(Petal.Length, quantile(Petal.Length))]  # Add quartile classes
iris_dt_2 <- iris_dt_1[ , table(.SD), .SDcols = c("Petal.L.Class", "Species")]        # Cross-tabulate
iris_dt_2
#              Species
# Petal.L.Class setosa versicolor virginica
#    (1,1.6]        43          0         0
#    (1.6,4.35]      6         25         0
#    (4.35,5.1]      0         25        16
#    (5.1,6.9]       0          0        34
```

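We see that virginica tends to have the longest petals: all of its flowers fall into the two upper classes, whereas most setosa flowers fall into the lowest class. Note that the setosa counts sum to 49 rather than 50. By default, cut() leaves the lower bound of the first interval open, so the flower with the minimum Petal.Length of 1 is assigned NA and dropped from the table.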
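If the observation at the minimum should be counted as well, one option is the include.lowest argument of cut(). The sketch below also counts rows with data.table's .N instead of table(), which yields the frequencies in long format. It works on a separate object, dt, so it does not touch iris_dt; the names dt and freq_long are illustrative assumptions.

```r
library(data.table)

dt <- as.data.table(iris)

# include.lowest = TRUE closes the first interval on the left,
# so the minimum Petal.Length is assigned to the first class instead of NA
dt[, Petal.L.Class := cut(Petal.Length, quantile(Petal.Length), include.lowest = TRUE)]

# .N counts the rows in each combination of class and species (long format)
freq_long <- dt[, .N, by = .(Petal.L.Class, Species)]
freq_long[order(Petal.L.Class, Species)]
```

Unlike the table() output, combinations that do not occur are simply absent here rather than listed with a count of 0.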

## Example 2: Summary Statistics for Chosen Columns

In R, the function summary() returns a set of summary statistics for a variable: the minimum, the first quartile, the median, the mean, the third quartile, and the maximum. With the following code, we can apply the function to chosen columns, here Petal.Length and Petal.Width.

```r
iris_dt[ , summary(.SD), .SDcols = c("Petal.Length", "Petal.Width")]
#   Petal.Length    Petal.Width
#  Min.   :1.000   Min.   :0.100
#  1st Qu.:1.600   1st Qu.:0.300
#  Median :4.350   Median :1.300
#  Mean   :3.758   Mean   :1.199
#  3rd Qu.:5.100   3rd Qu.:1.800
#  Max.   :6.900   Max.   :2.500
```
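summary() returns a formatted table of strings, which is convenient to read but awkward to compute with. When a single statistic is needed as numbers, a common pattern is to apply a function to each selected column with lapply(.SD, ...). The following is a minimal sketch of that pattern; the names dt, cols and col_means are illustrative.

```r
library(data.table)

dt <- as.data.table(iris)
cols <- c("Petal.Length", "Petal.Width")

# Apply mean() to every column listed in .SDcols;
# the result is a one-row data.table with one numeric column per variable
col_means <- dt[, lapply(.SD, mean), .SDcols = cols]
col_means
```

Swapping mean for median, sd or any other function that returns a single value works the same way.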

## Example 3: Display Different Statistics

We can also choose our own set of summary statistics for a variable, as demonstrated by the code below.

```r
iris_dt_4 <- iris_dt[, c("Mean"       = mean(Petal.Length),   # Calculate chosen statistics
                         "Variance"   = var(Petal.Length),
                         "Median"     = median(Petal.Length),
                         "Minimum"    = min(Petal.Length),
                         "Maximum"    = max(Petal.Length),
                         "quantile_7" = quantile(Petal.Length, 0.75))]
iris_dt_4
#           Mean       Variance         Median        Minimum        Maximum
#       3.758000       3.116278       4.350000       1.000000       6.900000
# quantile_7.75%
#       5.100000
```
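Because the statistics above are combined with c(), the result is a named numeric vector rather than a data.table. If a data.table is preferred, for example to compute the same statistics per group, the values can be wrapped in .() instead. The following sketch assumes the illustrative names dt, stats_dt, stats_by_species and Q75.

```r
library(data.table)

dt <- as.data.table(iris)

# .() is data.table's alias for list(); each element becomes a column
stats_dt <- dt[, .(Mean     = mean(Petal.Length),
                   Variance = var(Petal.Length),
                   Median   = median(Petal.Length),
                   Minimum  = min(Petal.Length),
                   Maximum  = max(Petal.Length),
                   Q75      = quantile(Petal.Length, 0.75, names = FALSE))]
stats_dt

# Adding by = Species gives one row of statistics per species
stats_by_species <- dt[, .(Mean   = mean(Petal.Length),
                           Median = median(Petal.Length)),
                       by = Species]
stats_by_species
```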

## Example 4: Use Summary Statistics to Create a New Column with Median Values

The following line adds a new column holding the median of Petal.Width within each of the three Species groups. Because := assigns by reference, iris_dt itself gains the column, and iris_dt2 is just another name for the same object.

```r
iris_dt2 <- iris_dt[ , "Petal.W.Median" := median(Petal.Width), by = Species]  # Group-wise median
head(iris_dt2)
# (If Example 1 was run on iris_dt before, its Petal.L.Class column appears as well.)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.W.Median
# 1:          5.1         3.5          1.4         0.2  setosa            0.2
# 2:          4.9         3.0          1.4         0.2  setosa            0.2
# 3:          4.7         3.2          1.3         0.2  setosa            0.2
# 4:          4.6         3.1          1.5         0.2  setosa            0.2
# 5:          5.0         3.6          1.4         0.2  setosa            0.2
# 6:          5.4         3.9          1.7         0.4  setosa            0.2
```
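The code above repeats the group median on every row. If only one row per group is wanted, the same median can be computed as an aggregation by dropping := and wrapping the expression in .(). A minimal sketch, using the illustrative names dt and median_by_species:

```r
library(data.table)

dt <- as.data.table(iris)

# Without :=, the expression is evaluated once per group and
# returns a new data.table with one row per Species
median_by_species <- dt[, .(Petal.W.Median = median(Petal.Width)), by = Species]
median_by_species
```

This leaves dt unchanged, so it is a reasonable choice when the per-row column is not needed afterwards.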

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.