Display Unique Rows & Values in a data.table in R (2 Examples)

This tutorial illustrates how to get the unique values of certain column combinations and how to remove duplicate rows from a data.table object in R.

Setting up the Examples

Install and load data.table.

install.packages("data.table")                                                    # Install data.table package
library("data.table")                                                             # Load data.table

Load the iris dataset for the examples.

data(iris)                                                                        # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

iris_dt <- data.table::copy(iris)                                                 # Copy iris so the original data.frame stays unchanged
setDT(iris_dt)                                                                    # Convert the copy to a data.table by reference
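
The copy-then-convert step can also be written as a single call. The following is a minimal sketch, assuming that as.data.table() is acceptable here; it returns a new data.table and leaves iris untouched. The name iris_dt_alt is hypothetical and is not used in the rest of the tutorial.

```r
library(data.table)

# Hypothetical name, used only for this illustration
iris_dt_alt <- as.data.table(iris)   # New data.table; iris itself is not modified

is.data.table(iris_dt_alt)           # TRUE
is.data.table(iris)                  # FALSE, iris is still a plain data.frame
```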

Example 1: Unique Values of a Column

For this example, we add a column called Sepal.Length.class to the iris data.table. It is a factor variable that cut() creates by binning Sepal.Length into four intervals. Note that := adds the column by reference, so iris_dt itself is modified and iris_dt_2 refers to the same data.

iris_dt_2 <- iris_dt[, Sepal.Length.class := cut(Sepal.Length,
                                                 breaks = c(4, 4.5, 5, 5.5, 8))]  # Create new column Sepal.Length.class

table(iris_dt_2$Sepal.Length.class)                                               # Table new column Sepal.Length.class
# (4,4.5] (4.5,5] (5,5.5] (5.5,8] 
#       5      27      27      91
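
Because := works by reference, it may help to see the difference between modifying a data.table in place and modifying a copy. The following is a minimal sketch; the names dt_ref, dt_orig, and dt_new are hypothetical and only serve this illustration.

```r
library(data.table)

breaks <- c(4, 4.5, 5, 5.5, 8)                      # Same breaks as in the tutorial

# Case 1: := modifies the data.table in place
dt_ref <- as.data.table(iris)
dt_ref[, Sepal.Length.class := cut(Sepal.Length, breaks = breaks)]
ncol(dt_ref)                                        # 6, the column was added to dt_ref itself

# Case 2: copy() first, so the original keeps its five columns
dt_orig <- as.data.table(iris)
dt_new  <- copy(dt_orig)[, Sepal.Length.class := cut(Sepal.Length, breaks = breaks)]
ncol(dt_orig)                                       # 5
ncol(dt_new)                                        # 6
```

If the original table has to stay unchanged, the copy() variant is the safer choice.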

The following line displays the unique values of Sepal.Length.class within each Species group. The by argument defines the groups. Because the expression is unnamed, data.table labels the result column V1.

iris_dt_2[, unique(Sepal.Length.class), by = Species]                             # Show unique values of Sepal.Length.class by Species
#       Species      V1
# 1:     setosa (5,5.5]
# 2:     setosa (4.5,5]
# 3:     setosa (4,4.5]
# 4:     setosa (5.5,8]
# 5: versicolor (5.5,8]
# 6: versicolor (5,5.5]
# 7: versicolor (4.5,5]
# 8:  virginica (5.5,8]
# 9:  virginica (4.5,5]
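
Two small variations on the grouped call may be useful: naming the result column instead of accepting the default name, and counting the unique values per group with uniqueN(). The following is a minimal sketch that rebuilds the data so it runs on its own; the names dt, unique_named, and class_counts are hypothetical.

```r
library(data.table)

# Rebuild the example data (hypothetical name dt)
dt <- as.data.table(iris)
dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]

# Same unique values as above, but with a named result column
unique_named <- dt[, .(Sepal.Length.class = unique(Sepal.Length.class)), by = Species]
names(unique_named)                                 # "Species" "Sepal.Length.class"

# Number of distinct classes per Species
class_counts <- dt[, .(n_classes = uniqueN(Sepal.Length.class)), by = Species]
class_counts
#       Species n_classes
# 1:     setosa         4
# 2: versicolor         3
# 3:  virginica         2
```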

Example 2: Unique Rows

In this example, we remove duplicate rows from the iris data.table. As shown below, we first select the columns Sepal.Length.class and Species and then reduce the result to the unique combinations of these two variables.

iris_dt_3 <- unique(iris_dt_2[, list(Sepal.Length.class, Species)])               # Unique rows for columns Sepal.Length.class and Species
iris_dt_3
#    Sepal.Length.class    Species
# 1:            (5,5.5]     setosa
# 2:            (4.5,5]     setosa
# 3:            (4,4.5]     setosa
# 4:            (5.5,8]     setosa
# 5:            (5.5,8] versicolor
# 6:            (5,5.5] versicolor
# 7:            (4.5,5] versicolor
# 8:            (5.5,8]  virginica
# 9:            (4.5,5]  virginica
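
When the other columns should be kept, the by argument of unique() offers an alternative: it checks uniqueness only on the named columns and returns the first row of each combination with all columns intact. The following is a minimal sketch that rebuilds the data so it runs on its own; the names dt and first_rows are hypothetical.

```r
library(data.table)

# Rebuild the example data (hypothetical name dt)
dt <- as.data.table(iris)
dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]

# First row of every Sepal.Length.class / Species combination, all columns kept
first_rows <- unique(dt, by = c("Sepal.Length.class", "Species"))

dim(first_rows)
# [1] 9 6
```

Which variant fits better depends on whether only the combinations themselves or a representative full row per combination is needed.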

We can also apply unique() to the complete data.table iris_dt_2, so that rows count as duplicates only if they match in all columns. Comparing the dimensions before and after shows how many rows are removed.

dim(iris_dt_2)                                                                    # Dimension of original data
# [1] 150   6
dim(unique(iris_dt_2))                                                            # Dimension of data with unique rows
# [1] 149   6

The row count drops from 150 to 149, so the data contains exactly one duplicate row.
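
To see which row is the duplicate, duplicated() can be used in place of unique(). The following is a minimal sketch that rebuilds the data so it runs on its own; the names dt, dup_idx, dup_rows, and first_idx are hypothetical.

```r
library(data.table)

# Rebuild the example data (hypothetical name dt)
dt <- as.data.table(iris)
dt[, Sepal.Length.class := cut(Sepal.Length, breaks = c(4, 4.5, 5, 5.5, 8))]

# Position of the row that repeats an earlier row
dup_idx <- which(duplicated(dt))
dup_idx
# [1] 143

# The duplicate row itself
dup_rows <- dt[dup_idx]

# Position of the earlier row that it repeats
first_idx <- which(duplicated(dt, fromLast = TRUE))
first_idx
# [1] 102
```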
