Display Unique Rows & Values in a data.table in R (2 Examples)
This tutorial illustrates how to get the unique values of certain column combinations and how to remove duplicate rows from a data.table object in R.
Setting up the Examples
Install and load data.table.
install.packages("data.table")   # Install data.table package
library("data.table")            # Load data.table
Load the iris dataset for the examples.
data(iris)                       # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
iris_dt <- data.table::copy(iris)   # Replicate iris data set
setDT(iris_dt)                      # Convert iris to a data.table
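As a hedged alternative to the copy-then-setDT() steps above, as.data.table() returns a new data.table and leaves the original data.frame untouched. The object name iris_dt_alt is hypothetical and used only for this sketch.

```r
library(data.table)

# as.data.table() builds a new data.table; the built-in iris stays a data.frame
iris_dt_alt <- as.data.table(iris)   # iris_dt_alt is an illustrative name

is.data.table(iris_dt_alt)           # TRUE
is.data.table(iris)                  # FALSE
```

setDT() converts in place without copying, which is why the tutorial copies iris first; as.data.table() folds both steps into one call.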
Example 1: Unique Values of a Column
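For this example, we add a column called Sepal.Length.class to the iris data.table. Sepal.Length.class is a factor variable that uses cut() to divide Sepal.Length into four classes. Because := modifies a data.table by reference, iris_dt receives the new column as well, and iris_dt_2 refers to the same data.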
iris_dt_2 <- iris_dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]   # Create new column Sepal.Length.class
table(iris_dt_2$Sepal.Length.class)   # Table new column Sepal.Length.class
# (4,4.5] (4.5,5] (5,5.5] (5.5,8]
#       5      27      27      91
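To make the class boundaries concrete, here is a small sketch of how cut() assigns values to intervals with the same breaks. The vector x and its values are made up for illustration; intervals are right-closed by default, so 5.0 falls into (4.5,5] and not into (5,5.5].

```r
# cut() assigns each value to an interval; intervals are right-closed by default
x <- c(4.3, 5.0, 5.1, 7.9)   # illustrative values, not taken from the tutorial
x_class <- cut(x, breaks = c(4, 4.5, 5, 5.5, 8))

as.character(x_class)
# [1] "(4,4.5]" "(4.5,5]" "(5,5.5]" "(5.5,8]"
```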
The following line returns the unique values of Sepal.Length.class within each value of Species, using the by argument. Because the expression is unnamed, data.table labels the result column V1.
iris_dt_2[, unique(Sepal.Length.class), by = Species]   # Show unique values of Sepal.Length.class by Species
#       Species      V1
# 1:     setosa (5,5.5]
# 2:     setosa (4.5,5]
# 3:     setosa (4,4.5]
# 4:     setosa (5.5,8]
# 5: versicolor (5.5,8]
# 6: versicolor (5,5.5]
# 7: versicolor (4.5,5]
# 8:  virginica (5.5,8]
# 9:  virginica (4.5,5]
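Two possible variations on the grouped call, sketched under the assumption that a named result column or a per-group count is wanted. The object names dt, classes_by_species and n_by_species are hypothetical; the data is rebuilt inside the block so that it runs on its own.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained (dt is an illustrative name)
dt <- as.data.table(iris)
dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]

# Name the result column explicitly instead of relying on the default V1
classes_by_species <- dt[, .(Sepal.Length.class = unique(Sepal.Length.class)), by = Species]

# Count the distinct classes per species with uniqueN()
n_by_species <- dt[, .(n_classes = uniqueN(Sepal.Length.class)), by = Species]
n_by_species$n_classes
# [1] 4 3 2
```

The counts match the nine rows of the output above: four classes for setosa, three for versicolor and two for virginica.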
Example 2: Unique Rows
In this example, we remove duplicate rows from the iris data.table. We first select the columns Sepal.Length.class and Species and then reduce the data to the unique rows of these two variables.
iris_dt_3 <- unique(iris_dt_2[, list(Sepal.Length.class, Species)])   # Unique rows for columns Sepal.Length.class and Species
iris_dt_3
#    Sepal.Length.class    Species
# 1:            (5,5.5]     setosa
# 2:            (4.5,5]     setosa
# 3:            (4,4.5]     setosa
# 4:            (5.5,8]     setosa
# 5:            (5.5,8] versicolor
# 6:            (5,5.5] versicolor
# 7:            (4.5,5] versicolor
# 8:            (5.5,8]  virginica
# 9:            (4.5,5]  virginica
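If the remaining columns should be kept, one option is the by argument of unique() for data.tables, which deduplicates on the chosen columns and retains the first row of each combination. This is a sketch, not the approach used in the tutorial; the names dt and first_rows are hypothetical.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained (dt is an illustrative name)
dt <- as.data.table(iris)
dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]

# Keep the first row of every Sepal.Length.class / Species combination, with all columns
first_rows <- unique(dt, by = c("Sepal.Length.class", "Species"))
dim(first_rows)
# [1] 9 6
```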
Applied to the complete data.table iris_dt_2, unique() compares all columns. Comparing the dimensions of the original data with those of the reduced data shows how many rows were duplicates.
dim(iris_dt_2)           # Dimension of original data
# [1] 150   6
dim(unique(iris_dt_2))   # Dimension of data with unique rows
# [1] 149   6
The complete data contain only one duplicate row, so unique() removes a single row (150 versus 149).
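To see which row is dropped, duplicated() can flag rows that repeat an earlier row. The following is a minimal sketch with hypothetical object names (dt, dup_flag); the data is rebuilt so that the block runs on its own.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained (dt is an illustrative name)
dt <- as.data.table(iris)
dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]

# duplicated() marks rows that repeat an earlier row across all columns
dup_flag <- duplicated(dt)
sum(dup_flag)   # Number of duplicate rows: 1

dt[dup_flag]    # The row that unique() removes
```

Subsetting with !dup_flag gives the same 149 rows as unique(dt).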
Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.