Getting the Ranks per Group in R (Example Code)
In this tutorial, you’ll learn how to generate a variable containing the rank per group in the R programming language.
Preparing the Example
We use the data.table package.
install.packages("data.table")  # Install data.table package
library("data.table")           # Load data.table package
For the examples, we use the iris dataset.
data(iris)   # Load iris data set
head(iris)   # Print head of data
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
For the examples, we only use 20 randomly chosen rows of the iris data set, keep the two variables Sepal.Width and Species, and store the result as a data.table object.
set.seed(642)  # Set seed for reproducible sampling
iris_dt <- as.data.table(iris[sample(1:nrow(iris), 20), c(2, 5)])  # Generate new data.table object
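A quick check of what the sampling step returns can be helpful before ranking. The minimal sketch below assumes data.table is already installed; it confirms that columns 2 and 5 of iris are Sepal.Width and Species and that iris_dt has 20 rows and 2 columns. Only the shape is checked, because the specific rows drawn by sample() can differ between R versions.

```r
library(data.table)  # assumes data.table is already installed

# Columns 2 and 5 of iris are the two variables kept in iris_dt
names(iris)[c(2, 5)]  # [1] "Sepal.Width" "Species"

set.seed(642)
iris_dt <- as.data.table(iris[sample(1:nrow(iris), 20), c(2, 5)])

dim(iris_dt)          # [1] 20  2
```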
Example: Creating Ranking Variables
In this first example, we create a ranking variable rank1 containing the ranks of the observations according to variable Sepal.Width.
iris_dt[, rank1 := frank(Sepal.Width, ties.method = "max")]  # Creating a ranking variable for Sepal.Width
iris_dt[order(Sepal.Width), ]                                # Reordering the data
#     Sepal.Width    Species rank1
#  1:         2.0 versicolor     1
#  2:         2.3 versicolor     2
#  3:         2.5 versicolor     3
#  4:         2.6  virginica     4
#  5:         2.7 versicolor     6
#  6:         2.7  virginica     6
#  7:         2.8  virginica     7
#  8:         3.0 versicolor    13
#  9:         3.0     setosa    13
# 10:         3.0 versicolor    13
# 11:         3.0  virginica    13
# 12:         3.0  virginica    13
# 13:         3.0 versicolor    13
# 14:         3.1  virginica    15
# 15:         3.1 versicolor    15
# 16:         3.2 versicolor    16
# 17:         3.3  virginica    17
# 18:         3.8     setosa    20
# 19:         3.8  virginica    20
# 20:         3.8  virginica    20
In the previous code, we used the argument ties.method to specify how tied observations (observations with the same value of Sepal.Width) are ranked. With ties.method = "max", all tied observations receive the highest of the ranks they jointly occupy. In the reordered data, for example, the two observations with Sepal.Width 2.7 occupy ranks 5 and 6, so both receive rank 6 and rank 5 does not appear.
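To compare the options of ties.method side by side, here is a minimal sketch on a small hand-made vector x (a hypothetical example, not taken from iris_dt) that contains one tie. It assumes data.table is installed; the comments list the ranks each call returns.

```r
library(data.table)  # assumes data.table is already installed

# Hypothetical vector: the two 2.7 values jointly occupy ranks 2 and 3
x <- c(2.0, 2.7, 2.7, 3.0)

frank(x, ties.method = "max")      # 1 3 3 4  (highest shared rank)
frank(x, ties.method = "min")      # 1 2 2 4  (lowest shared rank)
frank(x, ties.method = "average")  # 1.0 2.5 2.5 4.0  (mean of shared ranks)
frank(x, ties.method = "dense")    # 1 2 2 3  (no gap after the tie)
frank(x, ties.method = "first")    # 1 2 3 4  (ties broken by position)
```

Base R's rank() accepts the same names for "average", "first", "max" and "min"; "dense" is only available in frank().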
Now, we generate two ranking variables containing grouping information.
iris_dt[, rank_by_Species := frank(iris_dt, Species, Sepal.Width, ties.method = "min")]  # Ranking considering Species
iris_dt[, rank_by_Species_2 := frank(Sepal.Width, ties.method = "min"), by = Species]   # Ranking per Species
iris_dt[order(Species, Sepal.Width), ]
#     Sepal.Width    Species rank1 rank_by_Species rank_by_Species_2
#  1:         3.0     setosa    13               1                 1
#  2:         3.8     setosa    20               2                 2
#  3:         2.0 versicolor     1               3                 1
#  4:         2.3 versicolor     2               4                 2
#  5:         2.5 versicolor     3               5                 3
#  6:         2.7 versicolor     6               6                 4
#  7:         3.0 versicolor    13               7                 5
#  8:         3.0 versicolor    13               7                 5
#  9:         3.0 versicolor    13               7                 5
# 10:         3.1 versicolor    15              10                 8
# 11:         3.2 versicolor    16              11                 9
# 12:         2.6  virginica     4              12                 1
# 13:         2.7  virginica     6              13                 2
# 14:         2.8  virginica     7              14                 3
# 15:         3.0  virginica    13              15                 4
# 16:         3.0  virginica    13              15                 4
# 17:         3.1  virginica    15              17                 6
# 18:         3.3  virginica    17              18                 7
# 19:         3.8  virginica    20              19                 8
# 20:         3.8  virginica    20              19                 8
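Variable rank_by_Species ranks all 20 rows jointly, ordering them first by Species and then by Sepal.Width, so the ranks continue across groups (the first versicolor row gets rank 3 because the two setosa rows come first). Variable rank_by_Species_2, on the other hand, is computed with by = Species and ranks Sepal.Width separately within each class of Species, so the ranking restarts at 1 in every group.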
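The difference between a joint ranking and a ranking per group is easiest to see on a tiny data set. The minimal sketch below uses a hypothetical data.table toy with a group column grp and a value column val (names chosen for illustration only) and computes both kinds of ranks in the same way as the code above. As a cross-check, the per-group ranks are also computed with base R's ave() and rank(), which is an alternative technique rather than part of the data.table approach.

```r
library(data.table)  # assumes data.table is already installed

# Hypothetical toy data: group grp and value val (one tie in group "b")
toy <- data.table(grp = c("a", "a", "b", "b", "b"),
                  val = c(2, 1, 5, 3, 5))

# Joint ranking: sort by grp first, then by val; ranks continue across groups
toy[, joint := frank(toy, grp, val, ties.method = "min")]

# Ranking per group: ranks restart at 1 in every group
toy[, within := frank(val, ties.method = "min"), by = grp]

toy$joint   # 2 1 4 3 4
toy$within  # 2 1 2 1 2

# Base R cross-check for the per-group ranks
base_within <- ave(toy$val, toy$grp, FUN = function(v) rank(v, ties.method = "min"))
base_within  # 2 1 2 1 2
```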
Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.