When working with datasets that include missing values (NA), we sometimes need to find the first non-NA value in a column. This could be helpful when you have incomplete data, such as survey responses, sensor readings, or experimental results where participants miss specific trials. In this post, we will learn how to find the first non-NA value in data.table.
Table of Contents
- Finding the First Non-NA Value in a Single Column
- Selecting First Non-NA Value Across the Entire Dataset
- Conclusion
- Resources
Finding the First Non-NA Value in a Single Column
Here is an example dataset where we want to find the first valid (non-NA) response_time
for each participant in a psychology experiment:
library(data.table)
# Sample data representing responses with missing values (NA)
dt <- data.table(
participant = c("P1", "P1", "P1", "P2", "P2", "P2"),
trial = 1:6,
response_time = c(NA, 2.5, 3.0, 1.8, NA, 2.1)
)
Code language: R (r)
In this example data, we have data on participants’ response times across trials, and some values are missing.
Find the First Non-NA Value for Each Participant
Here is how to find the first non-NA value for each participant’s response time:
dt[!is.na(response_time), .SD[1], by = participant]
Code language: R (r)
In the code chunk above, we use !is.na(response_time)
to filter out the rows where the response_time
is NA
. This ensures we are only working with valid response times. We then use .SD[1]
to select the first non-NA response time for each participant. The by = participant
part groups the data by participant so that we get each individual’s first valid response time.

Selecting First Non-NA Value Across the Entire Dataset
What if we want to find the first non-NA value in a column without grouping by any variable? Here is how we can do that:
first_valid_response <- dt[!is.na(response_time), response_time[1]]
Code language: R (r)
In this code chunk, we again filter out the NA
values with !is.na(response_time)
. Then, we directly select the first valid response_time
value from the entire dataset using response_time[1]
. This returns the first non-NA value found in the response_time
column, regardless of the participant. If you are interested in learning more about selecting specific variables or columns in data.table, be sure to check out our previous post on selecting variables in data.table. In that post, we cover how to use various methods for selecting one or more columns, making it easier to manipulate and analyze your data.

Conclusion
In this post, we learned how to find the first non-NA value in data.table for both grouped and ungrouped data. This operation is beneficial when working with datasets that contain missing values. Additionally, it might be helpful to fill the missing values downwards from the first non-missing value in the column. Share your experiences in the comments below, and feel free to share this post with others who might benefit from it!
Resources
Here are more helpful data.table tutorials:
- data.table Count Rows by Group
- How to Filter in data.table in R
- How to Sum Multiple Columns in data.table