Reshaping a data.table in R (3 Examples)
In this article you’ll learn how to use the reshape function on a data.table in R.
Setting up the Examples
Install and load the data.table package.
install.packages("data.table") # Install data.table package library("data.table") # Load data.table package |
install.packages("data.table") # Install data.table package library("data.table") # Load data.table package
As an example dataset, we take the sunspots data, which is built into base R.
data(sunspots) # Load sunspots data set sunspots_DT <- data.table(year = 1749:1983, # Convert data from time series into data.table Y = matrix(sunspots, ncol = 12, byrow = TRUE)) colnames(sunspots_DT)[2:ncol(sunspots_DT)] <- month.abb[1:12] head(sunspots_DT) # Print the head of the data # year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec # 1: 1749 58.0 62.6 70.0 55.7 85.0 83.5 94.8 66.3 75.9 75.5 158.6 85.2 # 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7 63.3 75.4 # 3: 1751 70.0 43.5 45.3 56.4 60.7 50.7 66.3 59.8 23.5 23.2 28.5 44.0 # 4: 1752 35.0 50.0 71.0 59.3 59.7 39.6 78.4 29.3 27.1 46.6 37.6 40.0 # 5: 1753 44.0 32.0 45.7 38.0 36.0 31.7 22.2 39.0 28.0 25.0 20.0 6.7 # 6: 1754 0.0 3.0 1.7 13.7 20.7 26.7 18.8 12.3 8.2 24.1 13.2 4.2 |
data(sunspots) # Load sunspots data set sunspots_DT <- data.table(year = 1749:1983, # Convert data from time series into data.table Y = matrix(sunspots, ncol = 12, byrow = TRUE)) colnames(sunspots_DT)[2:ncol(sunspots_DT)] <- month.abb[1:12] head(sunspots_DT) # Print the head of the data # year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec # 1: 1749 58.0 62.6 70.0 55.7 85.0 83.5 94.8 66.3 75.9 75.5 158.6 85.2 # 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7 63.3 75.4 # 3: 1751 70.0 43.5 45.3 56.4 60.7 50.7 66.3 59.8 23.5 23.2 28.5 44.0 # 4: 1752 35.0 50.0 71.0 59.3 59.7 39.6 78.4 29.3 27.1 46.6 37.6 40.0 # 5: 1753 44.0 32.0 45.7 38.0 36.0 31.7 22.2 39.0 28.0 25.0 20.0 6.7 # 6: 1754 0.0 3.0 1.7 13.7 20.7 26.7 18.8 12.3 8.2 24.1 13.2 4.2
The data is a time series dataset consisting of the mean relative sunspot numbers per month and year. We transformed it into a data.table with the above lines of code. You can see that the data is in the wide format, for every year there is one data row containing the information of each month in a specific column.
Example 1: Reshaping the Data From Wide to Long Format
With the reshape function, we can manipulate our data from a wide format into a long format. For that, we have to tell the reshape function the direction of the data transformation (long), the ID variable (year), the name which we want to have for the time variable (month), the names of those columns which contain the varying information (Jan-Dec), and the name which we want to have for the numeric values (value).
sunspots_DT_2 <- reshape(sunspots_DT, # Reshape data from wide to long format direction = "long", idvar = "year", timevar = "month", times = colnames(sunspots_DT)[2:ncol(sunspots_DT)], varying = list(colnames(sunspots_DT)[2:ncol(sunspots_DT)]), v.names = "value") head(sunspots_DT_2) # Print head of data # year month value # 1: 1749 Jan 58.0 # 2: 1750 Jan 73.3 # 3: 1751 Jan 70.0 # 4: 1752 Jan 35.0 # 5: 1753 Jan 44.0 # 6: 1754 Jan 0.0 |
sunspots_DT_2 <- reshape(sunspots_DT, # Reshape data from wide to long format direction = "long", idvar = "year", timevar = "month", times = colnames(sunspots_DT)[2:ncol(sunspots_DT)], varying = list(colnames(sunspots_DT)[2:ncol(sunspots_DT)]), v.names = "value") head(sunspots_DT_2) # Print head of data # year month value # 1: 1749 Jan 58.0 # 2: 1750 Jan 73.3 # 3: 1751 Jan 70.0 # 4: 1752 Jan 35.0 # 5: 1753 Jan 44.0 # 6: 1754 Jan 0.0
The data in the long format consists of one data row per combination of year and month.
Example 2: Reshaping the Data From Long to Wide Format
Now we reverse the action. We take the data in the long format (sunspots_DT_2) and transform it back into the wide format. We can simply do this by again applying the reshape function, only that now we state that the direction is wide.
sunspots_DT_3 <- reshape(sunspots_DT_2, # Reshape data from long to wide format direction = "wide") sunspots_DT_3 # Print data # year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec # 1: 1749 58.0 62.6 70.0 55.7 85.0 83.5 94.8 66.3 75.9 75.5 158.6 85.2 # 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7 63.3 75.4 # 3: 1751 70.0 43.5 45.3 56.4 60.7 50.7 66.3 59.8 23.5 23.2 28.5 44.0 # 4: 1752 35.0 50.0 71.0 59.3 59.7 39.6 78.4 29.3 27.1 46.6 37.6 40.0 # 5: 1753 44.0 32.0 45.7 38.0 36.0 31.7 22.2 39.0 28.0 25.0 20.0 6.7 # --- # 231: 1979 166.6 137.5 138.0 101.5 134.4 149.5 159.4 142.2 188.4 186.2 183.3 176.3 # 232: 1980 159.6 155.0 126.2 164.1 179.9 157.3 136.3 135.4 155.0 164.7 147.9 174.4 # 233: 1981 114.0 141.3 135.5 156.4 127.5 90.0 143.8 158.7 167.3 162.4 137.5 150.1 # 234: 1982 111.2 163.6 153.8 122.0 82.2 110.4 106.1 107.6 118.8 94.7 98.1 127.0 # 235: 1983 84.3 51.0 66.5 80.7 99.2 91.1 82.2 71.8 50.3 55.8 33.3 33.4 |
sunspots_DT_3 <- reshape(sunspots_DT_2, # Reshape data from long to wide format direction = "wide") sunspots_DT_3 # Print data # year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec # 1: 1749 58.0 62.6 70.0 55.7 85.0 83.5 94.8 66.3 75.9 75.5 158.6 85.2 # 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7 63.3 75.4 # 3: 1751 70.0 43.5 45.3 56.4 60.7 50.7 66.3 59.8 23.5 23.2 28.5 44.0 # 4: 1752 35.0 50.0 71.0 59.3 59.7 39.6 78.4 29.3 27.1 46.6 37.6 40.0 # 5: 1753 44.0 32.0 45.7 38.0 36.0 31.7 22.2 39.0 28.0 25.0 20.0 6.7 # --- # 231: 1979 166.6 137.5 138.0 101.5 134.4 149.5 159.4 142.2 188.4 186.2 183.3 176.3 # 232: 1980 159.6 155.0 126.2 164.1 179.9 157.3 136.3 135.4 155.0 164.7 147.9 174.4 # 233: 1981 114.0 141.3 135.5 156.4 127.5 90.0 143.8 158.7 167.3 162.4 137.5 150.1 # 234: 1982 111.2 163.6 153.8 122.0 82.2 110.4 106.1 107.6 118.8 94.7 98.1 127.0 # 235: 1983 84.3 51.0 66.5 80.7 99.2 91.1 82.2 71.8 50.3 55.8 33.3 33.4
You see that it is the same structure as we have in data.table sunspots_DT.
Example 3: Reshaping Unbalanced Data
Data in the long format is often unbalanced. For example, for some years the information for some month may be missing. Below, we demonstrate how the reshape function handles such data. For that, we first set some information to NA.
sunspots_DT_4 <- data.table::copy(sunspots_DT_2) # Replicate the sunspots data set sunspots_DT_4[ year == 1749 & month %in% month.abb[1:10], value := NA] # Set some information to NA # year month value # 1: 1749 Jan NA # 2: 1750 Jan 73.3 # 3: 1751 Jan 70.0 # 4: 1752 Jan 35.0 # 5: 1753 Jan 44.0 # --- # 2816: 1979 Dec 176.3 # 2817: 1980 Dec 174.4 # 2818: 1981 Dec 150.1 # 2819: 1982 Dec 127.0 # 2820: 1983 Dec 33.4 sunspots_DT_5 <- reshape(sunspots_DT_4, # Reshape the unbalanced data from long to wide format direction = "wide") sunspots_DT_5 # Print data # year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec # 1: 1749 NA NA NA NA NA NA NA NA NA NA 158.6 85.2 # 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7 63.3 75.4 # 3: 1751 70.0 43.5 45.3 56.4 60.7 50.7 66.3 59.8 23.5 23.2 28.5 44.0 # 4: 1752 35.0 50.0 71.0 59.3 59.7 39.6 78.4 29.3 27.1 46.6 37.6 40.0 # 5: 1753 44.0 32.0 45.7 38.0 36.0 31.7 22.2 39.0 28.0 25.0 20.0 6.7 # --- # 231: 1979 166.6 137.5 138.0 101.5 134.4 149.5 159.4 142.2 188.4 186.2 183.3 176.3 # 232: 1980 159.6 155.0 126.2 164.1 179.9 157.3 136.3 135.4 155.0 164.7 147.9 174.4 # 233: 1981 114.0 141.3 135.5 156.4 127.5 90.0 143.8 158.7 167.3 162.4 137.5 150.1 # 234: 1982 111.2 163.6 153.8 122.0 82.2 110.4 106.1 107.6 118.8 94.7 98.1 127.0 # 235: 1983 84.3 51.0 66.5 80.7 99.2 91.1 82.2 71.8 50.3 55.8 33.3 33.4 |
sunspots_DT_4 <- data.table::copy(sunspots_DT_2) # Replicate the sunspots data set sunspots_DT_4[ year == 1749 & month %in% month.abb[1:10], value := NA] # Set some information to NA # year month value # 1: 1749 Jan NA # 2: 1750 Jan 73.3 # 3: 1751 Jan 70.0 # 4: 1752 Jan 35.0 # 5: 1753 Jan 44.0 # --- # 2816: 1979 Dec 176.3 # 2817: 1980 Dec 174.4 # 2818: 1981 Dec 150.1 # 2819: 1982 Dec 127.0 # 2820: 1983 Dec 33.4 sunspots_DT_5 <- reshape(sunspots_DT_4, # Reshape the unbalanced data from long to wide format direction = "wide") sunspots_DT_5 # Print data # year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec # 1: 1749 NA NA NA NA NA NA NA NA NA NA 158.6 85.2 # 2: 1750 73.3 75.9 89.2 88.3 90.0 100.0 85.4 103.0 91.2 65.7 63.3 75.4 # 3: 1751 70.0 43.5 45.3 56.4 60.7 50.7 66.3 59.8 23.5 23.2 28.5 44.0 # 4: 1752 35.0 50.0 71.0 59.3 59.7 39.6 78.4 29.3 27.1 46.6 37.6 40.0 # 5: 1753 44.0 32.0 45.7 38.0 36.0 31.7 22.2 39.0 28.0 25.0 20.0 6.7 # --- # 231: 1979 166.6 137.5 138.0 101.5 134.4 149.5 159.4 142.2 188.4 186.2 183.3 176.3 # 232: 1980 159.6 155.0 126.2 164.1 179.9 157.3 136.3 135.4 155.0 164.7 147.9 174.4 # 233: 1981 114.0 141.3 135.5 156.4 127.5 90.0 143.8 158.7 167.3 162.4 137.5 150.1 # 234: 1982 111.2 163.6 153.8 122.0 82.2 110.4 106.1 107.6 118.8 94.7 98.1 127.0 # 235: 1983 84.3 51.0 66.5 80.7 99.2 91.1 82.2 71.8 50.3 55.8 33.3 33.4
As you can see, the reshape function has a built-in way of handling missing information. The corresponding cells of the table are set to NA.
Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.