Building a data.table in R (2 Examples)
This page demonstrates how you can create a data.table in R.
Setting up the Examples
First, install and load the data.table package.
install.packages("data.table")     # Install & load data.table package
library("data.table")
We use the iris dataset as an example.
data(iris)                         # Load iris data set
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
Example 1: Create a new data.table From the iris Data
We use part of the iris data to create a new data.table object in R with the following lines of code.
nr_rows <- 5
iris_subdat <- data.table("X" = iris$Sepal.Length[1:nr_rows],
                          "Y" = iris$Petal.Width[1:nr_rows])   # Create a new data.table
iris_subdat
#      X   Y
# 1: 5.1 0.2
# 2: 4.9 0.2
# 3: 4.7 0.2
# 4: 4.6 0.2
# 5: 5.0 0.2
The above lines have created a data.table with the columns X and Y. The columns are filled with the first 5 values (nr_rows) of the iris data columns Sepal.Length and Petal.Width, respectively.
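As a cross-check, the same table could also be built by converting the whole data frame first and then selecting rows and columns. The following is only a sketch of one alternative, not the approach used in Example 1; the object names iris_dt and iris_subdat2 are made up for illustration.

```r
library("data.table")

data(iris)                                   # Load iris data set

# Convert the full data frame to a data.table
iris_dt <- as.data.table(iris)

# Keep the first 5 rows and return two renamed columns
iris_subdat2 <- iris_dt[1:5, .(X = Sepal.Length, Y = Petal.Width)]
iris_subdat2

# Compare the new columns against the source columns of iris
identical(iris_subdat2$X, iris$Sepal.Length[1:5])
identical(iris_subdat2$Y, iris$Petal.Width[1:5])
```

Both comparisons should return TRUE, which confirms that Y holds values from Petal.Width.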
Example 2: Create a New data.table With Different Column Types
In this example, we make a new data.table from scratch using the following code.
new_dt <- data.table("V_num" = 1:7,
                     "V_string" = month.name[1:7],
                     "V_log" = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE))
new_dt
#    V_num V_string V_log
# 1:     1  January  TRUE
# 2:     2 February FALSE
# 3:     3    March  TRUE
# 4:     4    April FALSE
# 5:     5      May FALSE
# 6:     6     June FALSE
# 7:     7     July  TRUE
The data.table consists of 7 rows and 3 columns. The columns are of types integer, character, and logical.
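If you want to confirm the dimensions and column types yourself, a quick check along the following lines may help. This is a minimal sketch that rebuilds the table from Example 2 so it can be run on its own; dim() and sapply() are base R functions.

```r
library("data.table")

# Same table as in Example 2
new_dt <- data.table("V_num" = 1:7,
                     "V_string" = month.name[1:7],
                     "V_log" = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE))

dim(new_dt)                        # Number of rows and columns
# [1] 7 3

sapply(new_dt, class)              # Class of each column
```

Note that 1:7 creates an integer vector, which is why V_num is stored as integer rather than numeric (double).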
Note: This article was created in collaboration with Anna-Lena Wölwer. Anna-Lena is a researcher and programmer who creates tutorials on statistical methodology as well as on the R programming language. You may find more info about Anna-Lena and her other articles on her profile page.