In R: Package microbenchmark (Example Code)

In this article, I’ll demonstrate how to use the microbenchmark package in the R programming language.

Setting up the Example

We install and load four packages: microbenchmark, ggplot2, data.table, and dplyr.

install.packages("microbenchmark")                # Install & load microbenchmark package
library("microbenchmark")
 
install.packages("ggplot2")                       # Install & load ggplot2 package
library("ggplot2")                              
 
install.packages("data.table")                    # Install & load data.table
library("data.table")
 
install.packages("dplyr")                         # Install & load dplyr package
library("dplyr")

For illustration, we use the iris dataset.

data(iris)                                        # Load iris data set
head(iris)                                        # Print head of data
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

We transform the dataset into a data.table object.

iris_dt <- as.data.table(iris)                    # Generate new data.table object

Example: Calculating Group Medians

Now, we try different ways of calculating group medians. With the microbenchmark package, we can compare the performance of the different approaches. The package executes each expression a predefined number of times. In the example below, we use 200 repetitions of each approach to get a more stable picture than a few iterations would give.

bench_res <- microbenchmark("E1" = iris_dt %>%    # Benchmarking the performance of different ways of calculating group medians
                              group_by(Species) %>%
                              filter(Sepal.Length <= 5) %>%
                              summarize(median(Petal.Width)),
                            "E2" = sapply(unique(iris_dt$Species),
                                          function (x){
                                            median(iris_dt[iris_dt$Sepal.Length <= 5 & iris_dt$Species == x, Petal.Width])
                                          }),
                            "E3" = iris_dt[Sepal.Length <= 5, median(Petal.Width), Species],
                            times = 200)
bench_res                                         # See results
# Unit: milliseconds
#  expr       min       lq      mean    median        uq       max neval
#    E1 12.120701 14.02365 17.164829 15.797850 18.789552 36.391402   200
#    E2  1.600201  1.86085  2.733440  2.183451  3.181651 12.679801   200
#    E3  1.023701  1.16070  1.685625  1.268501  1.792151  7.435401   200

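You can see that the third approach (E3), which uses data.table syntax, is the fastest way of calculating the group medians in this benchmark. Its median run time of about 1.27 milliseconds is roughly 12 times shorter than the 15.80 milliseconds of the piping approach (E1). The sapply approach (E2) lies in between, with a median of about 2.18 milliseconds.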
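A speed comparison is only meaningful if the compared approaches return the same values. The following is a minimal sanity check, not part of the original benchmark, that reruns the three approaches once and compares their results. The object names res_e1, res_e2, and res_e3 as well as the column name med are introduced here purely for illustration.

```r
library("data.table")
library("dplyr")

iris_dt <- as.data.table(iris)                    # Same data.table as in the example

res_e1 <- iris_dt %>%                             # Approach E1: dplyr pipeline
  group_by(Species) %>%
  filter(Sepal.Length <= 5) %>%
  summarize(med = median(Petal.Width))

res_e2 <- sapply(unique(iris_dt$Species),         # Approach E2: sapply over the species
                 function(x) {
                   median(iris_dt[iris_dt$Sepal.Length <= 5 & iris_dt$Species == x, Petal.Width])
                 })

res_e3 <- iris_dt[Sepal.Length <= 5,              # Approach E3: data.table syntax
                  .(med = median(Petal.Width)),
                  by = Species]

all.equal(res_e1$med, as.numeric(res_e2))         # Compare E1 with E2
all.equal(res_e1$med, res_e3$med)                 # Compare E1 with E3
```

If both calls to all.equal return TRUE, the three approaches agree on the group medians, and the timing differences reflect the implementation only.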

We can visualize the performance using ggplot2.

ggplot(bench_res,                                 # Plot performance comparison
       aes(x = time/1000, y = expr, fill = expr)) +  # time is stored in nanoseconds; /1000 gives microseconds
  geom_boxplot() +
  scale_x_continuous(trans = 'log2')              # Log2-scaled time axis
# Alternative: ggplot2::autoplot(bench_res)

[Figure 1: Boxplots of the microbenchmark results created by the code above.]

The visualization makes the distribution of the computation times of the different approaches over the 200 evaluations even clearer and can also reveal heavy outliers.
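The individual timings behind such a plot can also be inspected numerically. The sketch below is an illustration under stated assumptions: it uses a small toy benchmark instead of the iris example, and the names toy_res, toy_df, outlier_ratio, and toy_rel are hypothetical. It relies on the fact that a microbenchmark object stores one row per evaluation, with the columns expr and time (in nanoseconds).

```r
library("microbenchmark")

x <- runif(1000)                                  # Toy data, used only for this sketch

toy_res <- microbenchmark("sort" = sort(x),       # Two illustrative expressions
                          "order" = x[order(x)],
                          times = 50)

toy_df <- as.data.frame(toy_res)                  # One row per evaluation: expr and time

outlier_ratio <- aggregate(time ~ expr,           # Slowest run divided by median run
                           data = toy_df,
                           FUN = function(t) max(t) / median(t))
outlier_ratio                                     # Large ratios hint at heavy outliers

toy_rel <- summary(toy_res, unit = "relative")    # Timings relative to the expression with the lowest median
toy_rel
```

A ratio close to 1 means that the slowest run was not far from the typical run. The relative summary is a compact alternative to the absolute table when only the ranking of the approaches matters.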


Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.
