Getting the Ranks per Group in R (Example Code)

In this tutorial, you’ll learn how to generate a variable containing the rank per group in the R programming language.

Preparing the Example

We use the data.table package.

install.packages("data.table")                                                         # Install & load data.table
library(data.table)

For the examples, we use the iris dataset.

data(iris)                                                                               # Load iris data set
head(iris)                                                                               # Print head of data
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

For the examples, we extract 20 randomly chosen rows of the iris dataset, keep only the columns Sepal.Width and Species, and store the result as a data.table object.

set.seed(642)
iris_dt <- as.data.table(iris[sample(1:nrow(iris), 20), c(2, 5)])                        # Generate new data.table object

Example: Creating Ranking Variables

In this first example, we create a ranking variable rank1 containing the ranks of the observations according to variable Sepal.Width.

iris_dt[, rank1 := frank(Sepal.Width, ties.method = "max")]                              # Creating a ranking variable for Sepal.Width
iris_dt[order(Sepal.Width), ]                                                            # Reordering the data
#     Sepal.Width    Species rank1
#  1:         2.0 versicolor     1
#  2:         2.3 versicolor     2
#  3:         2.5 versicolor     3
#  4:         2.6  virginica     4
#  5:         2.7 versicolor     6
#  6:         2.7  virginica     6
#  7:         2.8  virginica     7
#  8:         3.0 versicolor    13
#  9:         3.0     setosa    13
# 10:         3.0 versicolor    13
# 11:         3.0  virginica    13
# 12:         3.0  virginica    13
# 13:         3.0 versicolor    13
# 14:         3.1  virginica    15
# 15:         3.1 versicolor    15
# 16:         3.2 versicolor    16
# 17:         3.3  virginica    17
# 18:         3.8     setosa    20
# 19:         3.8  virginica    20
# 20:         3.8  virginica    20

In the previous code, we used the ties.method argument to specify how tied values are ranked. With ties.method = "max", all observations that share the same value receive the highest rank within their tie. In the reordered data, you can see, for example, that there is no rank 5; instead, both observations with a Sepal.Width of 2.7 have rank 6.
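
To make the effect of ties.method more tangible, the following sketch compares several of the options that frank() accepts on a small made-up vector. The vector x and the column names rank_average, rank_min, rank_max, rank_dense and rank_first are illustrative only and are not part of the iris example.

```r
library(data.table)

# Hypothetical toy vector with ties: 2.7 occurs twice, 3.0 three times
x <- c(2.7, 3.0, 2.7, 3.8, 3.0, 3.0)

# One column per ties.method so the results can be compared side by side
ties_dt <- data.table(
  value        = x,
  rank_average = frank(x, ties.method = "average"),  # mean of the tied positions
  rank_min     = frank(x, ties.method = "min"),      # lowest tied position
  rank_max     = frank(x, ties.method = "max"),      # highest tied position
  rank_dense   = frank(x, ties.method = "dense"),    # like "min", but without gaps
  rank_first   = frank(x, ties.method = "first")     # ties broken by order of appearance
)

ties_dt[order(value)]                                # Sort by value for easier reading
```

The three values of 3.0 occupy the sorted positions 3 to 5, so they get rank 3 with "min", rank 5 with "max", rank 4 with "average" and rank 2 with "dense".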

Next, we generate two ranking variables that take the grouping variable Species into account.

iris_dt[, rank_by_Species := frank(iris_dt, Species, Sepal.Width, ties.method = "min")]  # Ranking considering Species
iris_dt[, rank_by_Species_2 := frank(Sepal.Width, ties.method = "min"), by = Species]    # Ranking per Species
iris_dt[order(Species, Sepal.Width), ]                                                   # Reordering the data by group and value
#     Sepal.Width    Species rank1 rank_by_Species rank_by_Species_2
#  1:         3.0     setosa    13               1                 1
#  2:         3.8     setosa    20               2                 2
#  3:         2.0 versicolor     1               3                 1
#  4:         2.3 versicolor     2               4                 2
#  5:         2.5 versicolor     3               5                 3
#  6:         2.7 versicolor     6               6                 4
#  7:         3.0 versicolor    13               7                 5
#  8:         3.0 versicolor    13               7                 5
#  9:         3.0 versicolor    13               7                 5
# 10:         3.1 versicolor    15              10                 8
# 11:         3.2 versicolor    16              11                 9
# 12:         2.6  virginica     4              12                 1
# 13:         2.7  virginica     6              13                 2
# 14:         2.8  virginica     7              14                 3
# 15:         3.0  virginica    13              15                 4
# 16:         3.0  virginica    13              15                 4
# 17:         3.1  virginica    15              17                 6
# 18:         3.3  virginica    17              18                 7
# 19:         3.8  virginica    20              19                 8
# 20:         3.8  virginica    20              19                 8

Variable rank_by_Species ranks all rows jointly, sorting first by Species and then by Sepal.Width, so the ranks keep counting upward from one species to the next. Variable rank_by_Species_2, on the other hand, ranks Sepal.Width separately within each species, so the ranks restart at 1 in every group.
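
The difference between the two approaches is easier to verify by hand on a tiny data set. The following is a minimal sketch on made-up data; the object toy_dt and the columns grp, val, rank_overall, rank_within and rank_within_desc are hypothetical names chosen for illustration. It also shows one possible way to rank in descending order per group, namely by negating the numeric variable.

```r
library(data.table)

# Hypothetical toy data: two groups, one tie inside group "b"
toy_dt <- data.table(grp = c("a", "a", "b", "b", "b"),
                     val = c(3, 1, 2, 2, 5))

# Joint ranking: sort by grp first, then by val; ranks continue across groups
toy_dt[, rank_overall := frank(toy_dt, grp, val, ties.method = "min")]

# Ranking per group: ranks restart at 1 within each value of grp
toy_dt[, rank_within := frank(val, ties.method = "min"), by = grp]

# Descending ranking per group: the largest val in each group gets rank 1
toy_dt[, rank_within_desc := frank(-val, ties.method = "min"), by = grp]

toy_dt[order(grp, val), ]                            # Sort for easier reading
```

In group "b", the two tied values of 2 share rank 3 in rank_overall but rank 1 in rank_within, because the per-group ranking ignores the rows of group "a".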

Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
