In this tutorial, you will learn how to extract the year from a date in R. You will learn this through a couple of examples containing R code along with a description of the code. Knowing how to extract the year from a datetime is useful when you work with data containing datetimes (e.g., 2020-08-24 10:34:00) and want to do, for example, a time series analysis that only uses the year.
As you may already know, R is a versatile statistical programming environment, and there are many ways to extract elements, such as the year, from a datetime. For example, we can separate the year using the format() function. In this post, we will start by converting a vector, or a column in a dataframe, to a datetime class (e.g., POSIXct), and we will also use the lubridate package. It is, of course, possible to adapt the examples in this post to extract the time or the day.
First, before extracting the year, the next section tells you what you need to follow this R tutorial.
Prerequisites
To follow this post, you obviously need to have R installed. Furthermore, if you want to work with lubridate, you need to install this package as well (or the tidyverse package, as lubridate is part of this bundle). Finally, when we read a CSV file, we are going to use the read_csv() function from the readr package. Installing lubridate and readr is, of course, optional, but these two tidyverse packages are very handy. For instance, the tidyverse enables you to easily remove a column in R, add an empty column to a dataframe, calculate descriptive statistics, read data from .xlsx files, and create dummy variables. I highly recommend installing tidyverse.
As you may already know, it is quite easy to install R packages. Open up R (or RStudio) and type install.packages(c("lubridate", "readr")). As previously mentioned, lubridate and readr are part of the tidyverse, so you can replace them with “tidyverse” to install the whole bundle. Example 2 also uses the httr package to download a file; you can install it in the same way.
Now that you know how to install lubridate and readr, or tidyverse, you will get the answer to the question “How do I get the year from a date in R?”
To get the year from a date in R you can use the functions as.POSIXct() and format(). For example, here's how to extract the year from a date: 1) date <- as.POSIXct("02/03/2014 10:41:00", format = "%m/%d/%Y %H:%M:%S"), and 2) format(date, format = "%Y"). Now you know how to use R to extract the year from a date. Read on for a more detailed explanation.
In the next section, we are going to have a look at the first example of how to extract the year from a vector with dates.
Example 1: Extract Year from a Vector Containing Dates
In the first example, we are going to get the year from a vector (c()) containing dates. First, however, we will have to convert the vector using the as.POSIXct() function.
Here’s the general syntax to extract year from date in R:
format(YourDates, format = "%Y")
Evidently, YourDates is a vector containing the dates you want to extract the year from. In the next subsection, you will find more details on how to extract year in two steps.
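To make the general syntax concrete, here is a minimal sketch using a single value of the Date class (an assumption for illustration; the rest of this post uses POSIXct). It also shows that format() returns the year as a character string, so you may want to convert it to an integer before doing calculations with it. The names d, year_chr, and year_num are only placeholders.

```r
# A single Date value; as.Date() parses ISO dates (YYYY-MM-DD) by default
d <- as.Date("2020-08-24")

# format() returns the year as a character string
year_chr <- format(d, format = "%Y")

# Convert to integer when the year is needed for arithmetic or plotting
year_num <- as.integer(year_chr)

print(year_chr)
print(year_num)
```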
1) Convert a Character Vector to POSIXct Class:
In the first step, we are going to convert a character vector (c()) to the POSIXct class so that we can use format(). Here’s how to use the as.POSIXct() function:
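# Character vector with dates stored as month/day/year hour:minute:second
dates <- c("01/02/2014 12:40:00", "01/03/2015 11:40:00", "01/02/2016 05:40:00",
           "01/04/2017 09:44:00", "01/02/2018 01:43:00", "01/12/2019 04:41:00")
# Convert the character vector to the POSIXct class
dates <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M:%S")
dates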
First, we used the vector as input to the as.POSIXct() function together with the format argument. The input to the format argument ("%m/%d/%Y %H:%M:%S") must match how the dates are stored in your vector. That is, if your dates are stored in another fashion, you will need to change this. In the next step, we will get the year from this object (i.e., containing dates).
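As a rough illustration of why the format argument has to match your data, here is a small sketch with two other, made-up layouts: ISO-style datetimes and a day.month.year layout without seconds. The example strings and variable names are assumptions, and tz = "UTC" is set only to make the result independent of your local time zone. The last call shows that a format string that does not match the data gives NA rather than an error.

```r
# ISO-style datetimes: year-month-day hour:minute:second
iso_dates <- c("2020-08-24 10:34:00", "2021-01-15 08:05:30")
iso <- as.POSIXct(iso_dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
iso_years <- format(iso, format = "%Y")

# day.month.year layout without seconds
eu <- as.POSIXct("24.08.2020 10:34", format = "%d.%m.%Y %H:%M", tz = "UTC")
eu_year <- format(eu, format = "%Y")

# A format string that does not match the data yields NA
mismatch <- as.POSIXct("24.08.2020 10:34", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

print(iso_years)
print(eu_year)
print(is.na(mismatch))
```

If you get NA values after converting, the format string is the first thing to check.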
2) Extract Year from Date
In the next step, we are ready to extract year from date in R:
dates <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M:%S")
format(dates, format="%Y")
It is, of course, possible to combine the two steps by passing the character vector directly to as.POSIXct(). Here’s a working code example:
dates <- as.POSIXct(c("01/02/2014 12:40:00", "01/03/2015 11:40:00", "01/02/2016 05:40:00",
"01/04/2017 09:44:00" , "01/02/2018 01:43:00", "01/12/2019 04:41:00"),
format = "%m/%d/%Y %H:%M:%S")
format(dates, format = "%Y")
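The same approach can be adapted to pull out other parts of a datetime by changing the conversion specification passed to format(). The sketch below uses a shortened version of the vector and sets tz = "UTC" (an assumption, so the output does not depend on your local time zone); the names months, days, and times are placeholders.

```r
dates <- as.POSIXct(c("01/02/2014 12:40:00", "01/03/2015 11:40:00"),
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# %m = month, %d = day of the month, %H:%M:%S = time of day
months <- format(dates, format = "%m")
days   <- format(dates, format = "%d")
times  <- format(dates, format = "%H:%M:%S")

print(months)
print(days)
print(times)
```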
That was how to get the year from a vector containing dates. Most of the time, however, we will have read our data from a file (e.g., from an Excel file). Therefore, in the next example, we will work with a dataframe that has a column containing dates. First, we will download the data using GET() from the httr package. Second, we will separate the year from the column containing the dates.
Example 2: How to Extract Year from a Column in a Dataframe
Now, most of the time we read our data from a file (e.g., .csv). Therefore, this example is concerned with importing data from a CSV file. Here’s how to extract the year from a column and add it to a new column:
library(httr)
library(readr)
GET('https://opendata.umea.se/explore/dataset/luftdata-vastra-esplanaden/download/?format=csv&timezone=Europe/Stockholm&lang=en&use_labels_for_header=true&csv_separator=,', write_disk(tf <- tempfile(fileext = ".csv")))
df <- read_csv(tf)
# Separate year from datetime:
df$Year <- format(df$Time, format="%Y")
# Print parts of the dataframe
head(df[4:length(names(df))])
Let me explain the code chunk above. First, we get the .csv file from a URL using GET(). Second, this file is stored (using write_disk()) in a temporary directory on the hard drive. Third, the temporary path to the file is found in tf. Fourth, we read the file using the read_csv() function from the readr package. Finally, we extract the year from the Time column with format() and add it to the dataframe as the new column Year. There are several other methods that we can use to add a column to a dataframe in R. For example, we can also use dplyr and the mutate() function.
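Here is a minimal sketch of the mutate() alternative. It assumes that dplyr is installed and uses a small made-up dataframe (df_toy, with the hypothetical columns Time and Value) instead of the downloaded data, so that it runs without an internet connection.

```r
library(dplyr)

# Hypothetical toy dataframe standing in for the downloaded data
df_toy <- data.frame(
  Time = as.POSIXct(c("2019-12-31 23:00:00", "2020-01-01 00:00:00"),
                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  Value = c(10.5, 11.2)
)

# Add the year as a new column with mutate()
df_toy <- df_toy %>%
  mutate(Year = format(Time, format = "%Y"))

print(df_toy)
```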
Now you have used R to extract the year from a date column and to add it to a new column, “Year”, in the dataframe.
Note, if you don’t want to install tidyverse (or the readr package), you can read CSV files using the read.csv() function. If you have your data stored in other file formats, you can have a look at the following tutorials on how to import data in R:
- How to Read & Write SPSS Files in R Statistical Environment
- How to Import Data: Reading SAS Files in R
- How to Read and Write Stata (.dta) Files in R with Haven
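Returning to the read.csv() alternative mentioned above, here is a hedged base R sketch. The CSV content is made up and supplied through the text argument so that the example is self-contained; the column names Time and Value are assumptions. Note that read.csv() does not convert such a column to a datetime class, so we convert it with as.POSIXct() before using format().

```r
# Made-up CSV content, passed as text instead of a file path
csv_text <- "Time,Value
2019-12-31 23:00:00,10.5
2020-01-01 00:00:00,11.2"

df_base <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert the character column to POSIXct, then extract the year
df_base$Time <- as.POSIXct(df_base$Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
df_base$Year <- format(df_base$Time, format = "%Y")

print(df_base)
```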
In the next example, we are going to work with a package called lubridate. As previously mentioned, lubridate is part of the tidyverse, which contains a lot of useful packages. For example, ggplot2, which you can use to create a scatter plot in R, is also part of the tidyverse.
How to Use lubridate to Extract Year from Date in R
Here’s how to use R to extract the year from a vector containing dates using the functions as.POSIXct() and year() (lubridate):
library(lubridate)
dates <- c("01/02/2014 12:40:00", "01/03/2015 11:40:00", "01/02/2016 05:40:00",
           "01/04/2017 09:44:00", "01/02/2018 01:43:00", "01/12/2019 04:41:00")
dates <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M:%S")
# get year from date
year <- year(dates)
year
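As a small complement, the sketch below uses lubridate's own parser mdy_hms() (month/day/year, matching the format string used above) together with the accessor functions year(), month(), day(), and hour(). It uses a shortened version of the vector, and the variable names are placeholders. Unlike format(), these accessors return numbers rather than character strings.

```r
library(lubridate)

# mdy_hms() parses month/day/year hour:minute:second strings (UTC by default)
dates <- mdy_hms(c("01/02/2014 12:40:00", "01/12/2019 04:41:00"))

# Accessor functions return numeric values
years  <- year(dates)
months <- month(dates)
days   <- day(dates)
hours  <- hour(dates)

print(years)
print(months)
print(days)
print(hours)
```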
Now, here’s how to create a dataframe and separate the year from the date using dmy_hms() (lubridate) and format():
library(lubridate)
dates <- dmy_hms(c("01/02/2014 12:40:00", "01/03/2015 11:40:00", "01/02/2016 05:40:00",
"01/04/2017 09:44:00" , "01/02/2018 01:43:00", "01/12/2019 04:41:00"))
df_dates <- data.frame(date = format(dates, format = "%m/%d/%Y %H:%M:%S"), Year = format(dates, format = "%Y"))
head(df_dates)
In this example, we created a dataframe, extracted the year, and added it to a new column. Finally, we used head() to print the first six rows of the dataframe.
Note, as in the previous example, you can of course work with lubridate to extract the time, day, or year from a column in a dataframe (like in Example 2). If you need to, you can now go ahead and rename levels of a factor in R.
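For instance, here is a minimal sketch of using lubridate on a dataframe column. The dataframe df_toy and its columns Time and Value are made up for illustration; with your own data, you would replace them with your dataframe and datetime column.

```r
library(lubridate)

# Hypothetical toy dataframe with a datetime column
df_toy <- data.frame(
  Time = ymd_hms(c("2019-12-31 23:00:00", "2020-01-01 00:00:00")),
  Value = c(10.5, 11.2)
)

# Add the year and the day of the month as new columns
df_toy$Year <- year(df_toy$Time)
df_toy$Day  <- day(df_toy$Time)

print(df_toy)
```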
Conclusion
In this short tutorial, you have learned how to use R to extract the year from a date using as.POSIXct(), format(), and lubridate (i.e., year()). First, you learned how to convert a vector containing datetimes to the POSIXct class. This was done to enable the use of format() to extract the year. Second, you learned how to import data, extract the year from a column containing datetimes, and add it to a new column. Finally, you learned how to do the same using lubridate and format(), as well as how to create a new dataframe.
References
Here are some resources that can be useful when working with datetime objects in R.
Grolemund, G., & Wickham, H. (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3).
Mailund, T. (2019). Working with Dates: lubridate. In R Data Science Quick Reference. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4894-2_10 (paywalled).