In this blog post, you will learn how to use data.table to sum multiple columns in R. The data.table
package is efficient for working with large datasets, and summing multiple columns is a common data manipulation task. We have previously looked at how to sum columns using dplyr
and base R functions. Now, we will focus on how to achieve the same result using data.table
, including summing by groups and using the %in%
operator.
How to Sum Multiple Columns in data.table
To sum multiple columns in data.table
, we use the .SDcols
argument to specify which columns to sum. Here is a basic example:
library(data.table)
# Create a sample data.table
dt <- data.table(id = 1:5, val1 = c(10, 20, 30, 40, 50), val2 = c(5, 15, 25, 35, 45))
# Sum multiple columns
dt[, sum_val := rowSums(.SD), .SDcols = c("val1", "val2")]
Code language: HTML, XML (xml)
In the code chunk above, we created a data.table
with two numerical columns (val1
and val2
). We then used rowSums(.SD)
to sum these columns row-wise, storing the result in a new column called sum_val
. The .SDcols
argument specifies which columns to include in the sum.

Sum Multiple Columns in data.table by Group
Often, we need to sum multiple columns within groups. We can achieve this using the by
argument in data.table
.
# Create a sample data.table with groups
dt <- data.table(group = c("A", "A", "B", "B", "B"), val1 = c(10, 20, 30, 40, 50), val2 = c(5, 15, 25, 35, 45))
# Sum multiple columns by group
dt[, .(sum_val1 = sum(val1), sum_val2 = sum(val2)), by = group]
Code language: PHP (php)
In the code chunk above, we grouped the data.table
by the group
column and summed val1
and val2
separately for each group. This is useful when working with categorized data.

Using %in%
to Select Columns for Summation
Another way to dynamically select columns for summation is by using the %in%
operator.
# Define columns to sum
cols_to_sum <- c("val1", "val2")
# Sum using %in%
dt[, sum_val := rowSums(.SD), .SDcols = names(dt) %in% cols_to_sum]
Code language: PHP (php)
In the code chunk above, we created a vector of column names (cols_to_sum
) that we wanted to sum. The .SDcols = names(dt) %in% cols_to_sum
dynamically selects only those columns present in cols_to_sum
. See a previous post for more about selecting columns from a data.table.
Summary
Using data.table
, we can efficiently sum multiple columns, both row-wise and by groups. The .SDcols
argument allows us to specify the columns to sum, and using by
enables group-wise operations. Additionally, the %in%
operator provides a dynamic way to select columns for summation. These techniques are particularly useful when working with large datasets, where performance matters.
I would love to hear your thoughts! If you have any questions or alternative approaches, feel free to comment below. Also, if you found this post helpful, consider sharing it on social media so others can benefit too!