Getting the Ranks per Group in R (Example Code)

In this tutorial, you’ll learn how to generate a variable containing the rank per group in the R programming language.

Preparing the Example

We use the data.table package.

install.packages("data.table")                                                         # Install & load data.table
library(data.table)

For the examples, we use the iris dataset.

data(iris)                                                                               # Load iris data set
head(iris)                                                                               # Print head of data
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

For the examples, we randomly select 20 rows of the iris dataset, keep only the columns Sepal.Width and Species, and store the result as a data.table object.

set.seed(642)
iris_dt <- as.data.table(iris[sample(1:nrow(iris), 20), c(2, 5)])                        # Generate new data.table object
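The call above selects rows by random index and columns by position. The base R sketch below makes both parts explicit. It shows which columns positions 2 and 5 refer to and selects them by name instead, which keeps working if the column order ever changes. For a single positive integer n, sample(n, k) samples from 1:n, so sample(nrow(iris), 20) is a shorter form of sample(1:nrow(iris), 20). The names row_idx and iris_sub are illustrative only. Because R 3.6.0 changed the default sampling algorithm, the rows drawn for a given seed may differ between R versions.

```r
# Columns 2 and 5 of iris are the ones kept in iris_dt
names(iris)[c(2, 5)]                                   # "Sepal.Width" "Species"

set.seed(642)
row_idx  <- sample(nrow(iris), 20)                     # 20 distinct row indices (sampling without replacement)
iris_sub <- iris[row_idx, c("Sepal.Width", "Species")] # same columns, selected by name
str(iris_sub)
```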

Example: Creating Ranking Variables

In this first example, we create a ranking variable rank1 containing the ranks of the observations according to variable Sepal.Width.

iris_dt[, rank1 := frank(Sepal.Width, ties.method = "max")]                              # Creating a ranking variable for Sepal.Width
iris_dt[order(Sepal.Width), ]                                                            # Print data ordered by Sepal.Width
#     Sepal.Width    Species rank1
#  1:         2.0 versicolor     1
#  2:         2.3 versicolor     2
#  3:         2.5 versicolor     3
#  4:         2.6  virginica     4
#  5:         2.7 versicolor     6
#  6:         2.7  virginica     6
#  7:         2.8  virginica     7
#  8:         3.0 versicolor    13
#  9:         3.0     setosa    13
# 10:         3.0 versicolor    13
# 11:         3.0  virginica    13
# 12:         3.0  virginica    13
# 13:         3.0 versicolor    13
# 14:         3.1  virginica    15
# 15:         3.1 versicolor    15
# 16:         3.2 versicolor    16
# 17:         3.3  virginica    17
# 18:         3.8     setosa    20
# 19:         3.8  virginica    20
# 20:         3.8  virginica    20

In the previous code, we used the argument ties.method to specify how tied values, i.e. observations with the same value of Sepal.Width, are ranked. With ties.method = "max", all tied observations receive the highest of the ranks they jointly occupy. In the reordered data you can, for example, see that there is no rank 5; instead, the two observations with Sepal.Width 2.7 both have rank 6.
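To compare tie-handling rules side by side, here is a base R sketch using rank() on the 20 Sepal.Width values copied from the sorted output above (so no data.table is needed). With ties.method = "max" it reproduces rank1; "min" gives tied values the lowest shared rank instead. frank() in data.table additionally accepts ties.method = "dense" (consecutive ranks without gaps); base rank() has no such option, so the sketch uses the common match(x, sort(unique(x))) idiom as a stand-in. The variable names are illustrative.

```r
# Sepal.Width of the 20 sampled rows, copied from the sorted output above
width <- c(2.0, 2.3, 2.5, 2.6, 2.7, 2.7, 2.8,
           3.0, 3.0, 3.0, 3.0, 3.0, 3.0,
           3.1, 3.1, 3.2, 3.3, 3.8, 3.8, 3.8)

rank_max   <- rank(width, ties.method = "max")  # ties share the highest rank (same as rank1)
rank_min   <- rank(width, ties.method = "min")  # ties share the lowest rank
rank_dense <- match(width, sort(unique(width))) # consecutive ranks, no gaps after ties

# rank_max:   1 2 3 4 6 6 7 13 13 13 13 13 13 15 15 16 17 20 20 20
# rank_min:   1 2 3 4 5 5 7  8  8  8  8  8  8 14 14 16 17 18 18 18
# rank_dense: 1 2 3 4 5 5 6  7  7  7  7  7  7  8  8  9 10 11 11 11
```

Note how "max" and "min" both leave gaps after a tie (there is no rank 5 with "max", no rank 6 with "min"), while dense ranking does not.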

Now, we generate two ranking variables that take the grouping variable Species into account.

iris_dt[, rank_by_Species := frank(iris_dt, Species, Sepal.Width, ties.method = "min")]  # Ranking considering Species
iris_dt[, rank_by_Species_2 := frank(Sepal.Width, ties.method = "min"), by = Species]    # Ranking per Species
iris_dt[order(Species, Sepal.Width), ]                                                   # Print data ordered by Species and Sepal.Width
#     Sepal.Width    Species rank1 rank_by_Species rank_by_Species_2
#  1:         3.0     setosa    13               1                 1
#  2:         3.8     setosa    20               2                 2
#  3:         2.0 versicolor     1               3                 1
#  4:         2.3 versicolor     2               4                 2
#  5:         2.5 versicolor     3               5                 3
#  6:         2.7 versicolor     6               6                 4
#  7:         3.0 versicolor    13               7                 5
#  8:         3.0 versicolor    13               7                 5
#  9:         3.0 versicolor    13               7                 5
# 10:         3.1 versicolor    15              10                 8
# 11:         3.2 versicolor    16              11                 9
# 12:         2.6  virginica     4              12                 1
# 13:         2.7  virginica     6              13                 2
# 14:         2.8  virginica     7              14                 3
# 15:         3.0  virginica    13              15                 4
# 16:         3.0  virginica    13              15                 4
# 17:         3.1  virginica    15              17                 6
# 18:         3.3  virginica    17              18                 7
# 19:         3.8  virginica    20              19                 8
# 20:         3.8  virginica    20              19                 8

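Variable rank_by_Species contains the ranks according to Species first and Sepal.Width second, so the ranks run from 1 to 20 across the whole data set. Variable rank_by_Species_2, on the other hand, contains the ranks according to Sepal.Width within each single class of Species, so the ranking restarts at 1 for every species. In both cases, ties.method = "min" assigns tied observations the lowest of their ranks: the three versicolor observations with Sepal.Width 3.0, for example, all have rank 7 in rank_by_Species and rank 5 in rank_by_Species_2.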
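As a cross-check of the two grouped variables, the following base R sketch reproduces both columns from the values printed above, without data.table. It uses ave() to apply rank() separately within each Species. The combined ranking is then the within-species rank plus the number of rows belonging to species that sort earlier; this identity holds for ties.method = "min" (and "max") when Species is the first sorting key. The input vectors are copied from the output above, and the names rank_within, offsets and rank_combined are illustrative.

```r
# Rows as printed above, ordered by Species and Sepal.Width
species <- factor(rep(c("setosa", "versicolor", "virginica"), times = c(2, 9, 9)),
                  levels = c("setosa", "versicolor", "virginica"))
width <- c(3.0, 3.8,
           2.0, 2.3, 2.5, 2.7, 3.0, 3.0, 3.0, 3.1, 3.2,
           2.6, 2.7, 2.8, 3.0, 3.0, 3.1, 3.3, 3.8, 3.8)

# Rank within each species; ties share the lowest rank (like rank_by_Species_2)
rank_within <- ave(width, species, FUN = function(v) rank(v, ties.method = "min"))

# Rank by Species first, then Sepal.Width (like rank_by_Species):
# shift each within-species rank by the number of rows in earlier factor levels
offsets       <- c(0, cumsum(as.vector(table(species))))  # 0, 2, 11, 20
rank_combined <- offsets[as.integer(species)] + rank_within
```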

Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
