Fill NA Values with 0 in a data.table in R (Example Code)

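This tutorial demonstrates how to replace the NA values in a data.table with 0 in the R programming language. The replacement works for columns of different types, such as numeric, character, and logical columns. Factor columns are an exception, because 0 is usually not one of their levels.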

Setting up the Example

Install and load the data.table package.

install.packages("data.table")                          # Install & load data.table package
library("data.table")

As an example, we take the iris dataset.

data(iris)                                              # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

With class(), we see that the iris data is a data.frame.

class(iris)                                             # Object class of the data
# [1] "data.frame"

Copy the data to a new object called iris_dt and transform iris_dt into a data.table.

iris_dt <- data.table::copy(iris)                       # Replicate iris data set
setDT(iris_dt)                                          # Convert iris_dt to a data.table by reference
class(iris_dt)                                          # Object class of the data
# [1] "data.table" "data.frame"
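As an alternative to copy() followed by setDT(), as.data.table() returns a new data.table and leaves its input untouched. The following is a minimal sketch; the object name iris_dt_alt is chosen here only for illustration.

```r
library(data.table)

iris_dt_alt <- as.data.table(iris)   # New data.table; iris itself is not modified
class(iris_dt_alt)                   # "data.table" "data.frame"
class(iris)                          # "data.frame"
```

setDT() converts an object in place without copying, which is why the tutorial copies iris first. as.data.table() does both steps in one call.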

Example: Replace NA values in all columns of the iris dataset

The data.table iris_dt contains no NAs, as the following command shows.

iris_dt_no_na <- copy(iris_dt)                          # Replicate iris data.table
anyNA(iris_dt_no_na)                                    # Test if data.table contains NA
# [1] FALSE
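A note on replicating a data.table: plain assignment with <- only gives the same data.table a second name, so a later by-reference update with := shows up under both names. copy() creates an independent object. The sketch below illustrates the difference; the names dt_a, dt_b, and dt_c are hypothetical.

```r
library(data.table)

dt_a <- as.data.table(iris)
dt_b <- dt_a                           # Second name for the same data.table, no copy
dt_c <- copy(dt_a)                     # Independent copy

dt_b[1:3, Sepal.Length := NA_real_]    # Update by reference

anyNA(dt_a)                            # TRUE: dt_a changed together with dt_b
anyNA(dt_c)                            # FALSE: the copy is unaffected
```

Base-style assignments such as dt$col[1:10] <- NA copy the table before modifying it, so they do not show this effect, but using copy() makes the intent explicit either way.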

We therefore create a new data.table object iris_dt_with_na, which contains the information of iris_dt, but where we set the first 10 values of column Sepal.Length to NA.

iris_dt_with_na <- copy(iris_dt)                        # Replicate iris data.table
iris_dt_with_na$Sepal.Length[1:10] <- NA                # Set first 10 values of Sepal.Length as NA
anyNA(iris_dt_with_na)                                  # Test if data.table contains NA
# [1] TRUE
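anyNA() only tells us whether any NA exists. To see where the NAs are, one option is to count them per column with colSums(is.na(...)). This is a small sketch that rebuilds the same NA pattern in a hypothetical object dt_na.

```r
library(data.table)

dt_na <- as.data.table(iris)
dt_na[1:10, Sepal.Length := NA_real_]   # Same NA pattern as above, set by reference

na_counts <- colSums(is.na(dt_na))      # Number of NAs in each column
na_counts                               # Sepal.Length has 10 NAs, all other columns 0
```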

We can replace the NA values in iris_dt_with_na with 0 using the following lines of code.

iris_dt_with_na_new <- copy(iris_dt_with_na)            # Replicate data.table with NA
iris_dt_with_na_new[is.na(iris_dt_with_na_new)] <- 0    # Fill all NA entries with 0
head(iris_dt_with_na_new)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:            0         3.5          1.4         0.2  setosa
# 2:            0         3.0          1.4         0.2  setosa
# 3:            0         3.2          1.3         0.2  setosa
# 4:            0         3.1          1.5         0.2  setosa
# 5:            0         3.6          1.4         0.2  setosa
# 6:            0         3.9          1.7         0.4  setosa
anyNA(iris_dt_with_na_new)                              # Test if data.table contains NA
# [1] FALSE

Calling anyNA() confirms that there are no NAs left in the data.
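For numeric columns, data.table also provides setnafill(), which fills NAs by reference without copying the table. The sketch below assumes a data.table version that includes setnafill() (1.12.4 or later). The names dt_fill and num_cols are hypothetical.

```r
library(data.table)

dt_fill <- as.data.table(iris)
dt_fill[1:10, Sepal.Length := NA_real_]                         # Introduce NAs

num_cols <- names(dt_fill)[sapply(dt_fill, is.numeric)]         # Numeric columns only
setnafill(dt_fill, type = "const", fill = 0, cols = num_cols)   # Fill NAs with 0 by reference

anyNA(dt_fill)                                                  # FALSE
```

The cols argument restricts the fill to numeric columns, which skips the factor column Species here. Columns of other types would need a different approach, such as the matrix-style replacement shown above.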

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.