Reshaping a data.table in R (3 Examples)

In this article you’ll learn how to use the reshape function on a data.table in R.

Setting up the Examples

Install and load the data.table package.

install.packages("data.table")                    # Install data.table package
library("data.table")                             # Load data.table package

As an example dataset, we take the sunspots data, which is built into base R.

data(sunspots)                                    # Load sunspots data set
sunspots_DT <- data.table(year = 1749:1983,       # Convert data from time series into data.table
                          Y    = matrix(sunspots, ncol = 12, byrow = TRUE))
colnames(sunspots_DT)[2:ncol(sunspots_DT)] <- month.abb[1:12]
 
head(sunspots_DT)                                 # Print the head of the data
#    year  Jan  Feb  Mar  Apr  May   Jun  Jul   Aug  Sep  Oct   Nov  Dec
# 1: 1749 58.0 62.6 70.0 55.7 85.0  83.5 94.8  66.3 75.9 75.5 158.6 85.2
# 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7  63.3 75.4
# 3: 1751 70.0 43.5 45.3 56.4 60.7  50.7 66.3  59.8 23.5 23.2  28.5 44.0
# 4: 1752 35.0 50.0 71.0 59.3 59.7  39.6 78.4  29.3 27.1 46.6  37.6 40.0
# 5: 1753 44.0 32.0 45.7 38.0 36.0  31.7 22.2  39.0 28.0 25.0  20.0  6.7
# 6: 1754  0.0  3.0  1.7 13.7 20.7  26.7 18.8  12.3  8.2 24.1  13.2  4.2

The data is a time series dataset consisting of the mean relative sunspot numbers per month and year. We transformed it into a data.table with the above lines of code. You can see that the data is in the wide format, for every year there is one data row containing the information of each month in a specific column.

Example 1: Reshaping the Data From Wide to Long Format

With the reshape function, we can manipulate our data from a wide format into a long format. For that, we have to tell the reshape function the direction of the data transformation (long), the ID variable (year), the name which we want to have for the time variable (month), the names of those columns which contain the varying information (Jan-Dec), and the name which we want to have for the numeric values (value).

sunspots_DT_2 <- reshape(sunspots_DT,             # Reshape data from wide to long format
                        direction = "long",
                        idvar     = "year",
                        timevar   = "month",
                        times     = colnames(sunspots_DT)[2:ncol(sunspots_DT)],
                        varying   = list(colnames(sunspots_DT)[2:ncol(sunspots_DT)]),
                        v.names   = "value")
 
head(sunspots_DT_2)                               # Print head of data
#    year month value
# 1: 1749   Jan  58.0
# 2: 1750   Jan  73.3
# 3: 1751   Jan  70.0
# 4: 1752   Jan  35.0
# 5: 1753   Jan  44.0
# 6: 1754   Jan   0.0

The data in the long format consists of one data row per combination of year and month.

Example 2: Reshaping the Data From Long to Wide Format

Now we reverse the action. We take the data in the long format (sunspots_DT_2) and transform it back into the wide format. We can simply do this by again applying the reshape function, only that now we state that the direction is wide.

sunspots_DT_3 <- reshape(sunspots_DT_2,           # Reshape data from long to wide format
                 direction = "wide")
sunspots_DT_3                                     # Print data
#    year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
# 1: 1749  58.0  62.6  70.0  55.7  85.0  83.5  94.8  66.3  75.9  75.5 158.6  85.2
# 2: 1750  73.3  75.9  89.2  88.3  90.0 100.0  85.4 103.0  91.2  65.7  63.3  75.4
# 3: 1751  70.0  43.5  45.3  56.4  60.7  50.7  66.3  59.8  23.5  23.2  28.5  44.0
# 4: 1752  35.0  50.0  71.0  59.3  59.7  39.6  78.4  29.3  27.1  46.6  37.6  40.0
# 5: 1753  44.0  32.0  45.7  38.0  36.0  31.7  22.2  39.0  28.0  25.0  20.0   6.7
# ---                                                                             
# 231: 1979 166.6 137.5 138.0 101.5 134.4 149.5 159.4 142.2 188.4 186.2 183.3 176.3
# 232: 1980 159.6 155.0 126.2 164.1 179.9 157.3 136.3 135.4 155.0 164.7 147.9 174.4
# 233: 1981 114.0 141.3 135.5 156.4 127.5  90.0 143.8 158.7 167.3 162.4 137.5 150.1
# 234: 1982 111.2 163.6 153.8 122.0  82.2 110.4 106.1 107.6 118.8  94.7  98.1 127.0
# 235: 1983  84.3  51.0  66.5  80.7  99.2  91.1  82.2  71.8  50.3  55.8  33.3  33.4

You see that it is the same structure as we have in data.table sunspots_DT.

Example 3: Reshaping Unbalanced Data

Data in the long format is often unbalanced. For example, for some years the information for some month may be missing. Below, we demonstrate how the reshape function handles such data. For that, we first set some information to NA.

sunspots_DT_4 <- data.table::copy(sunspots_DT_2)  # Replicate the sunspots data set
sunspots_DT_4[ year == 1749 & month %in% month.abb[1:10], value := NA] # Set some information to NA
#       year month value
#    1: 1749   Jan    NA
#    2: 1750   Jan  73.3
#    3: 1751   Jan  70.0
#    4: 1752   Jan  35.0
#    5: 1753   Jan  44.0
#   ---                 
# 2816: 1979   Dec 176.3
# 2817: 1980   Dec 174.4
# 2818: 1981   Dec 150.1
# 2819: 1982   Dec 127.0
# 2820: 1983   Dec  33.4

sunspots_DT_5 <- reshape(sunspots_DT_4,           # Reshape the unbalanced data from long to wide format
                 direction = "wide")
sunspots_DT_5                                     # Print data
#    year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
# 1: 1749    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 158.6  85.2
# 2: 1750  73.3  75.9  89.2  88.3  90.0 100.0  85.4 103.0  91.2  65.7  63.3  75.4
# 3: 1751  70.0  43.5  45.3  56.4  60.7  50.7  66.3  59.8  23.5  23.2  28.5  44.0
# 4: 1752  35.0  50.0  71.0  59.3  59.7  39.6  78.4  29.3  27.1  46.6  37.6  40.0
# 5: 1753  44.0  32.0  45.7  38.0  36.0  31.7  22.2  39.0  28.0  25.0  20.0   6.7
# ---                                                                             
# 231: 1979 166.6 137.5 138.0 101.5 134.4 149.5 159.4 142.2 188.4 186.2 183.3 176.3
# 232: 1980 159.6 155.0 126.2 164.1 179.9 157.3 136.3 135.4 155.0 164.7 147.9 174.4
# 233: 1981 114.0 141.3 135.5 156.4 127.5  90.0 143.8 158.7 167.3 162.4 137.5 150.1
# 234: 1982 111.2 163.6 153.8 122.0  82.2 110.4 106.1 107.6 118.8  94.7  98.1 127.0
# 235: 1983  84.3  51.0  66.5  80.7  99.2  91.1  82.2  71.8  50.3  55.8  33.3  33.4

As you can see, the reshape function has a built-in way of handling missing information. The corresponding cells of the table are set to NA.

 

Anna-Lena Wölwer R Programming & Survey Statistics

Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.

Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.
You need to agree with the terms to proceed

Menu
Top