In this data science tutorial, you will learn how to rename a column (or multiple columns) in R using base functions and dplyr. Renaming a column is straightforward with dplyr's rename() function, and it is not much harder to change the column names using base R.
Sometimes you need to get rid of uninformative column names such as “x1”, “x2”, “x3”. When we encounter data like this, cleaning up the variable names in our dataframes makes the work more readable. This is especially important when we are collaborating or sharing our data with others. Clear column names are also essential if we plan to make the data open in a repository.
Table of Contents
- Outline
- Requirements
- Example Data: Reading an Excel File
- Rename Column with Base R Example 1: Using the Column Index
- Rename Column with Base R Example 2: Using the Column Name
- How to Rename all Columns with Base R Example 3
- How to Rename a Column in R with Dplyr
- How to Change Name of Columns in R with dplyr rename()
- How to Change all Column Names to Lowercase with dplyr rename_with()
- How to Change all Column Names to Uppercase with dplyr’s rename_with()
- How to Rename Column Names by Removing Punctuation
- Conclusion: Rename Columns with Base R and dplyr
- Resources
Outline
The outline of the post is as follows. First, you will learn about the requirements of this post. After the requirements, you will get the answers to two frequently asked questions. In the section following the FAQs, we will load an example data set that we can use to practice renaming a column in R. Here, we will read an Excel file using the readxl package. Once we have imported the data into R, we can start changing the names of the columns. First, we will use a couple of techniques available in base R. Second, we will work with dplyr. Specifically, in the dplyr section, we will use the rename family of functions, rename() and rename_with(), to change the names of some variables in the dataframe.
Before going on to the next section, it is worth mentioning that we can use dplyr to select and remove columns in R.
Requirements
To follow this post, you need to have R installed, along with the packages readxl and dplyr. You can install the two packages with the install.packages() function: install.packages(c('dplyr', 'readxl')).
It is worth pointing out that both these packages are part of the Tidyverse. This means that you can install them, along with a bunch of other great packages, by typing install.packages('tidyverse').
You can rename a column in R in many ways. For example, if you want to rename the column called “A” to “B”, you can use this code: names(dataframe)[names(dataframe) == "A"] <- "B". This changes the column name to “B”.
To rename a column in R, you can use the rename() function from dplyr. For example, if you want to rename the column “A” to “B” again, you can run the following code: rename(dataframe, B = A).
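As a minimal sketch of the two answers above, the following compares both approaches side by side. It assumes a small made-up data frame called dataframe with columns A and C; this toy data is only for illustration and is not part of the tutorial's data set.

```r
library(dplyr)

# Hypothetical toy data frame used only for illustration
dataframe <- data.frame(A = 1:3, C = c("x", "y", "z"))

# Base R: find the column named "A" and overwrite its name
base_renamed <- dataframe
names(base_renamed)[names(base_renamed) == "A"] <- "B"

# dplyr: new name on the left of =, old name on the right
dplyr_renamed <- rename(dataframe, B = A)

names(base_renamed)
names(dplyr_renamed)
```

Both calls should leave the data itself untouched and only change the name of the first column.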
That was it; we are now ready to practice changing the column names in R. First, however, we need some data to practice on. In the next section, we will import data by reading a .xlsx file.
Example Data: Reading an Excel File
Here is how we can read a .xlsx file in R with the readxl package:
library(readxl)
titanic_df <- read_excel("titanic.xlsx")
In the code chunk above, we started by loading the readxl package, and then we used the read_excel() function to read the titanic.xlsx file. To look at the first six rows of this dataframe, we can run head(titanic_df).
In the next section, we will use the base functionality to learn how to rename a column in R.
Rename Column with Base R Example 1: Using the Column Index
Here is how to rename a single column with base R:
names(titanic_df)[1] <- "P_Class"
In the code chunk above, we used the names() function to assign a new name to the first column in the dataframe. Specifically, using the names() function, we get all the column names in the dataframe and then select the first column using the brackets. Finally, we assigned the new column name using <- and the character string “P_Class” (the new name). Note that you can rename multiple columns in the dataframe using the same method. Just change what you put within the brackets. For example, if you want to rename columns 1 to 5, you can put 1:5 within the brackets and assign a character vector with 5 column names.
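To illustrate renaming columns 1 to 5 by index, here is a small sketch. The data frame toy_df and all of its column names are made up for this example.

```r
# Hypothetical toy data frame; the column names are made up for illustration
toy_df <- data.frame(x1 = 1:2, x2 = 3:4, x3 = 5:6,
                     x4 = 7:8, x5 = 9:10, x6 = 11:12)

# Rename columns 1 to 5 in one step: the index range and the
# character vector on the right must have the same length
names(toy_df)[1:5] <- c("id", "age", "height", "weight", "score")

names(toy_df)
```

The sixth column keeps its old name because it falls outside the 1:5 range.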
In the next example, we will use the old column name instead to rename the column.
Rename Column with Base R Example 2: Using the Column Name
Here is how to rename the column by using the old name when selecting it:
names(titanic_df)[names(titanic_df) == "P_Class"] <- "PCLASS"
In the code chunk above, we did something similar to the first method. However, here, we selected the column we previously renamed by its name. This is what we do within the brackets. Notice how we again used names() and the == operator to select the column “P_Class”. Running names(titanic_df) afterward shows that the first column is now called “PCLASS”.
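The same name-based idea can be extended to several columns at once. The sketch below is one possible way to do it, not the method used in this post: it assumes a made-up data frame toy_df and a hypothetical lookup vector that maps old names to new names.

```r
# Hypothetical toy data frame used only for illustration
toy_df <- data.frame(P_Class = 1:2, survived = c(0, 1), name = c("a", "b"))

# Named character vector: the names are the old column names,
# the values are the new column names
lookup <- c(P_Class = "PCLASS", name = "NAME")

# Positions of the old names in the data frame (NA if a name is missing)
idx <- match(names(lookup), names(toy_df))

# Keep only the old names that actually exist, then overwrite them
found <- !is.na(idx)
names(toy_df)[idx[found]] <- unname(lookup[found])

names(toy_df)
```

Because names that are not found are skipped, the code does not fail if one of the old names is absent from the data frame.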
In the next example, you will learn how to rename multiple columns using base R. We will rename all columns in the dataframe.
How to Rename all Columns with Base R Example 3
Renaming all columns can be done similarly to the last example. Here is how we change all the columns in the R dataframe:
names(titanic_df) <- c("PC", "SURV", "NAM", "Gender", "Age", "SiblingsSPouses",
                       "ParentChildren", "Tick", "Cost", "Cab", "Embarked",
                       "Boat", "Body", "Home")
Notice how we only used names() in the code above. Here, it is worth knowing that the character vector (right of the <-) should contain as many elements as there are columns. Otherwise, one or more columns will be named “NA”. Moreover, you need to know the order of the columns. In the next few examples, we will work with dplyr and the rename family of functions.
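One way to guard against a mismatch between the number of new names and the number of columns is to check the lengths before assigning. This is only a sketch using a made-up data frame toy_df and a made-up vector new_names.

```r
# Hypothetical toy data frame and replacement names, for illustration only
toy_df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
new_names <- c("first", "second", "third")

# Stop early if the replacement vector does not match the column count
stopifnot(length(new_names) == ncol(toy_df))

# Safe to assign: every column gets exactly one new name
names(toy_df) <- new_names

names(toy_df)
```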
You might also be interested in How to use $ in R: 6 Examples – list & dataframe. Here is another method to use R to rename a column:
How to Rename a Column in R with Dplyr
Renaming a column in dplyr is quite simple. Here is how to change a column name:
library(dplyr)

titanic_df <- titanic_df %>%
  rename(pc_class = PC)
In the code chunk above, there are some new things. First, we start by loading dplyr. Second, we change the name in the dataframe using the rename() function. Notice how we use the %>% operator. This is very handy because the function that follows the operator is applied to the dataframe on the operator’s left. Third, we call rename() with one argument of the form new_name = old_name, which identifies the column we want to rename.
Remember, we renamed all of the columns in the previous example. In the code chunk above, we are changing the column back again. That is, to the left of =, we have the new column name, and to the right, the old name. As you will see in the next example, we can rename multiple columns in the dataframe by adding arguments.
It may be worth mentioning that we can use dplyr to rename factor levels in R, and to add a column to a dataframe. In the next section, however, we will rename columns in R with dplyr.
How to Change Name of Columns in R with dplyr rename()
If we, on the other hand, want to change the name of multiple columns, we can do it as follows:
titanic_df <- titanic_df %>%
  rename(Survival = SURV,
         Name = NAM,
         Sibsp = SiblingsSPouses)
Changing the names of multiple columns using dplyr’s rename() function was simple. As you can see in the code chunk above, we just added each column whose name we wanted to change. Again, the name to the right of the equal sign is the old column name. Printing the dataframe (for example, with head(titanic_df)) shows the new column names.
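The following sketch repeats the multi-column rename on a small made-up data frame, so it can be run without the Excel file. It also shows a variation that is not used in this post: storing the mapping in a named vector (here called lookup, a hypothetical name) and passing it to rename() through all_of(). This assumes a reasonably recent version of dplyr.

```r
library(dplyr)

# Hypothetical toy data frame used only for illustration
toy_df <- data.frame(SURV = c(0, 1), NAM = c("a", "b"), Age = c(30, 40))

# Several new_name = old_name pairs in one rename() call
renamed <- toy_df %>%
  rename(Survival = SURV,
         Name = NAM)

# The same mapping stored as a named vector (new name = "old name")
lookup <- c(Survival = "SURV", Name = "NAM")
renamed_lookup <- toy_df %>%
  rename(all_of(lookup))

names(renamed)
names(renamed_lookup)
```

Keeping the mapping in a vector can be convenient when the same renaming has to be applied to several data frames.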
In the following sections, we will work with the rename_with() function. This is a great function that enables us to, as you will see, change the column names to upper or lower case.
How to Change all Column Names to Lowercase with dplyr rename_with()
Here is how we can use the rename_with() function (dplyr) to change all the column names to lowercase:
titanic_df <- titanic_df %>%
  rename_with(tolower)
In the code chunk above, we passed the tolower() function to rename_with(). The tolower() function was applied to all the column names, so every column name in the resulting dataframe is now lowercase.
In the next example, we are going to change the column names to uppercase using the rename_with() function together with the toupper() function.
How to Change all Column Names to Uppercase with dplyr’s rename_with()
In this section, we will change the function that we use as the only argument in rename_with(). This will enable us to change all the column names to uppercase:
titanic_df <- titanic_df %>%
  rename_with(toupper)
After running this code, all the column names are in uppercase, which we can confirm with names(titanic_df).
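rename_with() can also be restricted to a subset of columns through its .cols argument. The sketch below assumes a made-up data frame toy_df and uses the starts_with() selection helper, which ignores case by default. Note that rename_with() was introduced in dplyr 1.0.0, so older installations will not find the function.

```r
library(dplyr)

# Hypothetical toy data frame used only for illustration
toy_df <- data.frame(pc_class = 1:2, Survival = c(0, 1), Age = c(30, 40))

# Apply toupper() only to columns whose names start with "s" or "S"
partly_upper <- toy_df %>%
  rename_with(toupper, .cols = starts_with("s"))

names(partly_upper)
```

Columns that are not selected by .cols keep their original names.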
In the next section, we will continue working with the rename_with() function and see how to use other functions to clean the column names from unwanted characters. For example, we can use the gsub() function to remove punctuation from column names.
How to Rename Column Names by Removing Punctuation
Sometimes, our column names may contain characters we do not need. Here is how to use rename_with() from dplyr together with gsub() to remove punctuation from all the column names in the R dataframe:
titanic_df <- titanic_df %>%
  rename_with(~ gsub('[[:punct:]]', '', .x))
Notice how we added the tilde sign (~) before the gsub() function. The first argument is the regular expression for punctuation, and the second is the replacement. In our case, the replacement is an empty string, so the punctuation is simply removed from the column names (.x). However, we could use an underscore (“_”) as the replacement to substitute it for the punctuation in the column names. Finally, if we wanted to replace specific characters, we could use them instead of the regular expression for punctuation.
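As a small sketch of the two options described above, the following removes punctuation in one case and replaces it with an underscore in the other. The data frame toy_df and its column names are made up for this example.

```r
library(dplyr)

# Hypothetical toy data frame; check.names = FALSE keeps the
# punctuation in the made-up column names
toy_df <- data.frame("home.dest" = 1:2, "sib-sp" = 3:4, check.names = FALSE)

# Replacement is an empty string: punctuation is removed
no_punct <- toy_df %>%
  rename_with(~ gsub("[[:punct:]]", "", .x))

# Replacement is an underscore: punctuation is swapped for "_"
with_underscore <- toy_df %>%
  rename_with(~ gsub("[[:punct:]]", "_", .x))

names(no_punct)
names(with_underscore)
```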
Now that you have renamed the columns needing a better and clearer name, you can continue your data pre-processing. For example, you can add a column to the dataframe based on other columns with dplyr, calculate descriptive statistics (also with dplyr), take the absolute value in your R dataframe, or remove duplicate rows or columns in the dataframe.
Conclusion: Rename Columns with Base R and dplyr
This tutorial taught you how to rename columns using base R and dplyr. First, you learned how to use the base functions to change the name of a single column based on its index and its name, and how to rename all columns at once. Second, you learned to do the same with dplyr and the rename() function. Here, we also renamed multiple columns, changed the case of the column names, and removed punctuation from them with rename_with(). I hope you found the post helpful. If you did, please share it on your social media accounts and link to it in your projects. Finally, if you have any corrections or suggestions, either on this post or on what should be covered on this blog in general, please let me know.
Resources
Here are some good R tutorials:
- Convert Multiple Columns to Numeric in R with dplyr
- Not in R: Elevating Data Filtering & Selection Skills with dplyr
- How to Sum Rows in R: Master Summing Specific Rows with dplyr
- Sum Across Columns in R – dplyr & base
Dear Erik,
I have checked your article https://www.marsja.se/how-to-rename-column-or-columns-in-r-with-dplyr/ . I am trying to use the rename_with function in RStudio, but I always receive the following error:
“Error in rename_with(., toupper) : could not find function "rename_with"”.
The rename function works perfectly, but for some reason the rename_with function is not found. Do you know why? Could you help me please?
Best regards,
Lito.
Dear Lito,
Thanks for your comment. If you’re following the example in the image at the beginning of the post, you need to install and load the dplyr library. Then it should work. I should update that image to include
library(dplyr)
as well. If you have installed and loaded dplyr, I am not sure if I can help you. I recently got a new computer and installed R, RStudio, the Tidyverse packages, etc., and I have dplyr version 1.0.7 installed.
Best,
Erik
Thanks man! Very good examples on how to use R to rename columns in my dataset using the dplyr package. It will come in handy when I do my analysis later.
Best regards,
Peter
Hello, thank you for your comment. I am glad you found the post helpful,
Erik