How to Get Descriptive Statistics by Group in R (Example Code)
In this article you’ll learn how to get summary statistics for each group of a data frame in the R programming language.
Creation of Example Data
data(iris)                                    # Iris flower data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
Example: Using tapply() Function to Compute Summary Statistics by Group
tapply(iris$Sepal.Length,                     # Descriptive statistics by group
       iris$Species,
       summary)
# $setosa
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   4.300   4.800   5.000   5.006   5.200   5.800
#
# $versicolor
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   4.900   5.600   5.900   5.936   6.300   7.000
#
# $virginica
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   4.900   6.225   6.500   6.588   6.900   7.900
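The tapply() call above returns a list with one summary per group. The following sketch shows one possible way to stack that list into a single matrix, and how the base R aggregate() function might be used to summarize all numeric columns at once. The object names stats_list, stats_mat, and group_means are made up for illustration and are not part of the original example.

```r
data(iris)                                    # Iris flower data set

# One summary per species, stored as a list (same call as in the example)
stats_list <- tapply(iris$Sepal.Length,
                     iris$Species,
                     summary)

# Stack the per-group summaries row-wise into one matrix
# (rows = species, columns = summary statistics)
stats_mat <- do.call(rbind, stats_list)
stats_mat

# Alternative sketch: mean of every numeric column by species
group_means <- aggregate(. ~ Species,
                         data = iris,
                         FUN = mean)
group_means
```

The matrix form can be easier to compare across groups or to export, while aggregate() returns a data frame with one row per species.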