Filtering data.table Rows in R (3 Examples)
This tutorial illustrates how to select specific rows of a data.table in the R programming language. Rows can be addressed either by their index or by the values of one or more columns.
Preparing the Examples
Install the data.table package and load it.
install.packages("data.table")   # Install data.table
library("data.table")            # Load data.table
The example data is the iris dataset. It is a built-in dataset in R.
data(iris)                       # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
The iris data is a data.frame. We convert it into a data.table object called iris_dt.
iris_dt <- data.table::copy(iris)   # Replicate iris data set
setDT(iris_dt)                      # Convert iris to a data.table
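As an alternative to copy() followed by setDT(), the conversion can be sketched in a single step with as.data.table(), which returns a new data.table and leaves the original data.frame unchanged. The object name iris_dt_alt is only illustrative and is not used elsewhere in this tutorial.

```r
library(data.table)

# as.data.table() returns a new data.table; the input iris stays a data.frame
iris_dt_alt <- as.data.table(iris)   # iris_dt_alt is an illustrative name

class(iris)                          # Class of the original object
class(iris_dt_alt)                   # Class of the converted object
is.data.table(iris_dt_alt)           # Check the conversion
```

Either route gives a data.table that can be filtered as in the examples below.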
Example 1: Remove Certain Rows by Index
In this example, we want to remove certain rows from the iris data by indexing them.
iris_dt_rem <- iris_dt[ -c(1,3,4,5), ]
head(iris_dt_rem)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          4.9         3.0          1.4         0.2  setosa
# 2:          5.4         3.9          1.7         0.4  setosa
# 3:          4.6         3.4          1.4         0.3  setosa
# 4:          5.0         3.4          1.5         0.2  setosa
# 5:          4.4         2.9          1.4         0.2  setosa
# 6:          4.9         3.1          1.5         0.1  setosa
We deleted the rows with indices 1, 3, 4, and 5 from the data.
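The same index-based removal can be written in more than one way. The following is a minimal sketch, assuming the iris data is converted to a data.table as above; the names drop_idx, rem_minus, and rem_not are hypothetical and only serve this illustration.

```r
library(data.table)

iris_dt <- as.data.table(iris)      # Same data as in the tutorial

drop_idx <- c(1, 3, 4, 5)           # Row indices to remove (illustrative name)

rem_minus <- iris_dt[-drop_idx]     # Negative indices; the trailing comma is optional
rem_not   <- iris_dt[!drop_idx]     # data.table also accepts ! in i to exclude rows

nrow(iris_dt) - nrow(rem_minus)     # Number of removed rows
all.equal(rem_minus, rem_not)       # Compare the two notations
```

Note that neither notation modifies iris_dt itself; the reduced data has to be assigned to an object.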
Example 2: Filter by Values of Species
Rows can also be addressed by the values of a column. Below, we keep only those rows of the data for which the variable Species is equal to setosa.
iris_dt2 <- iris_dt[ Species == "setosa", ]
head(iris_dt2)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          5.1         3.5          1.4         0.2  setosa
# 2:          4.9         3.0          1.4         0.2  setosa
# 3:          4.7         3.2          1.3         0.2  setosa
# 4:          4.6         3.1          1.5         0.2  setosa
# 5:          5.0         3.6          1.4         0.2  setosa
# 6:          5.4         3.9          1.7         0.4  setosa
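A column filter like this extends naturally to several values at once. As a short sketch, assuming the same iris data, we can use %in% to keep two species and the data.table symbol .N to count matching rows; the names iris_two and n_setosa are illustrative only.

```r
library(data.table)

iris_dt <- as.data.table(iris)      # Same data as in the tutorial

# Keep rows whose Species matches any of several values
iris_two <- iris_dt[Species %in% c("setosa", "virginica")]

# Count the matching rows directly; .N holds the number of rows selected in i
n_setosa <- iris_dt[Species == "setosa", .N]

unique(iris_two$Species)            # Species remaining after the filter
n_setosa                            # Number of setosa rows
```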
Example 3: Filter by Values of Species and Petal.Length
In this example, we use multiple columns for selecting the rows of the dataset. We display all those rows for which the variable Species is equal to setosa and the variable Petal.Length is at least 1.5.
iris_dt3 <- iris_dt[ Species == "setosa" & Petal.Length >= 1.5, ]
head(iris_dt3)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          4.6         3.1          1.5         0.2  setosa
# 2:          5.4         3.9          1.7         0.4  setosa
# 3:          5.0         3.4          1.5         0.2  setosa
# 4:          4.9         3.1          1.5         0.1  setosa
# 5:          5.4         3.7          1.5         0.2  setosa
# 6:          4.8         3.4          1.6         0.2  setosa
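The choice of comparison and logical operators changes which rows are kept. The sketch below, assuming the same iris data, contrasts >= with > and the AND operator & with the OR operator |; the names ge_15, gt_15, and either are hypothetical.

```r
library(data.table)

iris_dt <- as.data.table(iris)      # Same data as in the tutorial

# Both conditions must hold; >= keeps rows with Petal.Length exactly 1.5
ge_15 <- iris_dt[Species == "setosa" & Petal.Length >= 1.5]

# Strict comparison; rows with Petal.Length exactly 1.5 are dropped
gt_15 <- iris_dt[Species == "setosa" & Petal.Length > 1.5]

# At least one of the two conditions must hold
either <- iris_dt[Species == "setosa" | Petal.Length >= 1.5]

# Compare the number of rows kept by each filter
c(ge = nrow(ge_15), gt = nrow(gt_15), either = nrow(either))
```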
Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.