Mode Imputation in R (Example)
This tutorial explains how to impute missing values by the mode in the R programming language.
Create Function for Computation of Mode in R
R does not provide a built-in function for the calculation of the statistical mode (the base function mode() returns the storage mode of an object instead). For that reason, we need to create our own function:
my_mode <- function(x) {                          # Create mode function
  unique_x <- unique(x)
  mode <- unique_x[which.max(tabulate(match(x, unique_x)))]
  mode
}
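To get a feel for how this function behaves, the following minimal sketch applies it to two small vectors. The inputs x_num and x_chr are hypothetical and not part of the original example. Because which.max() returns the first maximum, a tie should be resolved in favor of the value that appears first in the input.

```r
# Mode helper as defined in this tutorial
my_mode <- function(x) {
  unique_x <- unique(x)
  unique_x[which.max(tabulate(match(x, unique_x)))]
}

x_num <- c(2, 9, 9, 4, 2, 9)      # Hypothetical numeric input
x_chr <- c("b", "a", "b", "a")    # Hypothetical character input with a tie

res_num <- my_mode(x_num)         # 9 is the most frequent value
res_chr <- my_mode(x_chr)         # Tie between "b" and "a": first seen value is kept

res_num
res_chr
```

If ties matter for your analysis, it may be worth inspecting table(x) before imputing rather than relying on this first-seen rule.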
Example Data
Our data with missing values looks as follows:
vec <- factor(c(4, NA, 7, 5, 7, 1, 6, 3, NA, 5, 5))    # Create example vector
Mode Imputation in R
Now we can apply mode substitution as follows:
vec[is.na(vec)] <- my_mode(vec[!is.na(vec)])      # Mode imputation
vec                                               # Print imputed vector
# [1] 4 5 7 5 7 1 6 3 5 5 5
# Levels: 1 3 4 5 6 7
Note that we imputed a simple categorical vector in this example. However, we could apply the same R code to the column of a more complex data frame as well.
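As a rough illustration of the data frame case, the sketch below imputes a single categorical column. The data frame df and its columns id and color are hypothetical and only serve as an example. The mode is computed on the observed values only, so that NA itself cannot be picked as the most frequent value.

```r
# Mode helper as defined in this tutorial
my_mode <- function(x) {
  unique_x <- unique(x)
  unique_x[which.max(tabulate(match(x, unique_x)))]
}

# Hypothetical data frame with a categorical column containing NAs
df <- data.frame(
  id    = 1:6,
  color = factor(c("red", NA, "blue", "red", NA, "green"))
)

missing <- is.na(df$color)                          # Flag missing entries
df$color[missing] <- my_mode(df$color[!missing])    # Mode imputation of one column

df                                                  # Print imputed data frame
```

The same two lines could be repeated, or wrapped in a loop, for every categorical column that should be imputed.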