Filtering data.table Rows in R (3 Examples)

This tutorial illustrates how to select specific rows of a data.table in the R programming language. Rows can be addressed either by their index (position) or by conditions on column values.

Preparing the Examples

Install the data.table package and load it.

install.packages("data.table")       # Install & load data.table
library("data.table")

As example data, we use the iris data set, which is built into R.

data(iris)                           # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

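The iris data is a data.frame. We convert it into a data.table object called iris_dt. Because setDT() converts its argument by reference, we first create a deep copy with data.table::copy() so that the original iris data.frame stays unchanged.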

iris_dt <- data.table::copy(iris)    # Create an independent copy of iris
setDT(iris_dt)                       # Convert the copy to a data.table by reference
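As a quick check of the conversion step, the sketch below confirms that iris itself is still a data.frame after setDT() has been applied to the copy. It also shows as.data.table() as an alternative that returns a new data.table instead of converting an object in place; the name iris_dt_alt is only illustrative.

```r
library(data.table)

data(iris)
iris_dt <- data.table::copy(iris)   # deep copy, so iris itself is not modified
setDT(iris_dt)                      # convert the copy to a data.table in place

# Alternative: as.data.table() returns a new data.table and leaves iris untouched
iris_dt_alt <- as.data.table(iris)

class(iris)      # "data.frame"
class(iris_dt)   # "data.table" "data.frame"
```

Both routes lead to the same data.table; the copy-plus-setDT() route is mainly useful when you want to avoid holding two converted versions of a large object.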

Example 1: Remove Certain Rows by Index

In this example, we remove certain rows from iris_dt by their index. A minus sign in front of the row numbers drops these rows and keeps all others.

iris_dt_rem <- iris_dt[ -c(1,3,4,5), ]
head(iris_dt_rem)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          4.9         3.0          1.4         0.2  setosa
# 2:          5.4         3.9          1.7         0.4  setosa
# 3:          4.6         3.4          1.4         0.3  setosa
# 4:          5.0         3.4          1.5         0.2  setosa
# 5:          4.4         2.9          1.4         0.2  setosa
# 6:          4.9         3.1          1.5         0.1  setosa

We deleted the rows with index 1, 3, 4, and 5 from the data, so the output starts with the original rows 2 and 6.
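The following sketch, assuming only the built-in iris data and the data.table package, shows a few equivalent ways to address rows by index. In data.table the trailing comma inside the brackets is optional, and a leading ! excludes the listed rows just like the minus sign. Positive indices select rows instead of removing them. The result names are only illustrative.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Negative indices drop rows 1, 3, 4 and 5 (as in Example 1)
with_comma    <- iris_dt[-c(1, 3, 4, 5), ]

# In data.table, the trailing comma can be omitted
without_comma <- iris_dt[-c(1, 3, 4, 5)]

# A leading ! also excludes the listed rows
with_not      <- iris_dt[!c(1, 3, 4, 5)]

# Positive indices select rows instead of removing them
kept_rows     <- iris_dt[c(2, 6:10)]

nrow(with_comma)   # 146
```

The last selection mirrors the head() output above: rows 2 and 6 to 10 are exactly the first six rows left after the deletion.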

Example 2: Filter by Values of Species

Rows can also be selected by the values of a column. Below, we keep all rows for which the variable Species is equal to setosa. Inside the square brackets, column names can be used directly, without an iris_dt$ prefix.

iris_dt2 <- iris_dt[ Species == "setosa", ]
head(iris_dt2)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          5.1         3.5          1.4         0.2  setosa
# 2:          4.9         3.0          1.4         0.2  setosa
# 3:          4.7         3.2          1.3         0.2  setosa
# 4:          4.6         3.1          1.5         0.2  setosa
# 5:          5.0         3.6          1.4         0.2  setosa
# 6:          5.4         3.9          1.7         0.4  setosa
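To illustrate that column names are evaluated inside the brackets, the sketch below compares the data.table filter with base R subsetting of the original data.frame, which needs the iris$ prefix. It also uses %in% to keep rows that match any of several species. The object names are only illustrative.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# data.table: Species is looked up as a column of iris_dt
setosa_dt <- iris_dt[Species == "setosa"]

# Base R on the data.frame: the column must be referenced explicitly
setosa_df <- iris[iris$Species == "setosa", ]

# %in% keeps rows whose Species matches any value in the vector
two_species <- iris_dt[Species %in% c("setosa", "virginica")]

nrow(setosa_dt)     # 50
nrow(two_species)   # 100
```

Each species occurs 50 times in iris, so the two-species filter returns 100 rows.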

Example 3: Filter by Values of Species and Petal.Length

In this example, we combine conditions on multiple columns to select rows. We display all rows for which the variable Species is equal to setosa and the variable Petal.Length is greater than or equal to 1.5. Both conditions are joined with the logical AND operator &.

iris_dt3 <- iris_dt[ Species == "setosa" & Petal.Length >= 1.5, ]
head(iris_dt3)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          4.6         3.1          1.5         0.2  setosa
# 2:          5.4         3.9          1.7         0.4  setosa
# 3:          5.0         3.4          1.5         0.2  setosa
# 4:          4.9         3.1          1.5         0.1  setosa
# 5:          5.4         3.7          1.5         0.2  setosa
# 6:          4.8         3.4          1.6         0.2  setosa
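A short sketch of other ways to combine conditions, assuming the same iris_dt object: | keeps rows for which at least one condition holds, and data.table's between() checks an inclusive range. The thresholds 6 and 1.7 are arbitrary values chosen for illustration, not taken from the tutorial.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# & : both conditions must hold (the filter from Example 3)
both <- iris_dt[Species == "setosa" & Petal.Length >= 1.5]

# | : at least one of the conditions must hold
either <- iris_dt[Species == "setosa" | Petal.Length >= 6]

# between() includes both bounds by default (incbounds = TRUE)
in_range <- iris_dt[Species == "setosa" & between(Petal.Length, 1.5, 1.7)]
```

Because in_range adds an upper bound to the Example 3 filter, it can only return a subset of the rows in both.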


Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
