In this blog post, we will explore an important skill in data manipulation: how to efficiently remove a column in R. Whether you are analyzing psychological data, studying the effects of noise exposure in hearing science, or working with large datasets in data science, there are often scenarios where you need to remove unnecessary variables. By mastering the technique of removing columns, you can streamline your data analysis process, enhance data clarity, and improve computational efficiency.
When working with psychological data, you may have collected numerous variables during your study, including demographic information, survey responses, and experimental measures. To focus on specific research questions or simplify your analysis, removing irrelevant columns is crucial. Similarly, in the field of hearing science, you may gather a wide range of auditory measurements, but only a subset might be relevant to your investigation. By removing unnecessary columns, you can narrow down your data to the key variables of interest.
In data science, working with large datasets is quite common. Large datasets often contain numerous columns, and handling unnecessary or redundant variables can significantly impact computational efficiency. Removing extraneous columns helps reduce memory usage and accelerates data processing, allowing you to concentrate on the essential aspects of your analysis.
In the following sections, we will learn various techniques we can use to remove columns in R, including using base R functions and leveraging popular packages like dplyr. We will explore practical examples and provide step-by-step instructions to equip you with the necessary skills to confidently remove columns from your datasets. So, let us dive in and master the art of removing columns in R!
Table of Contents
- Outline
- Prerequisites
- Example Data
- How to Remove a Column by Name in R using dplyr
- How to Remove a Column by Index in R using dplyr
- How to Remove the Last Column in R
- How to Delete Columns by Names in R using dplyr
- Remove Columns by Index in R using select()
- How to Drop Columns Starting with using the starts_with() function
- Removing Columns in R Starting with a Specific Letter
- Dropping a Column ending With a Character using the ends_with() function
- How to Remove Columns Ending with a Word in R
- Deleting a Column from an R dataframe using the contains() function
- Final Example on How to Remove a Column in R
- Conclusion: Dropping Columns from Dataframe in R
- R Resources
Outline
In this blog post, we will explore the process of removing columns in R. We will cover the prerequisites, such as installing R-packages like dplyr from Tidyverse. Then, we will dive into different methods to delete columns, including removing columns by name or index, deleting the first or last column, and selectively removing columns based on specific patterns or criteria. We will provide practical examples and step-by-step instructions to guide you through each method. Additionally, we will demonstrate advanced techniques such as dropping columns based on patterns like starting with a letter, ending with a character, or containing a specific word. By the end of this post, you will have a comprehensive understanding of how to effectively remove columns in R, enabling you to streamline your data manipulation workflows and focus on the variables relevant to your analysis.
Prerequisites
To follow this R tutorial on how to delete columns in R, some basic knowledge of R is needed. Furthermore, we must have R and dplyr (or the Tidyverse) installed. Ensure the latest version of R is installed (it can be downloaded from CRAN).
Installing R-packages (i.e., dplyr or Tidyverse)
Installing R packages is quite easy. To install dplyr, we can type install.packages("dplyr"). If, on the other hand, we want to install the whole Tidyverse, we type install.packages("tidyverse").
Note that dplyr can also be used to rename columns in R, among other things. Read on for more examples of what the Tidyverse packages can do (or learn how to drop variables by name and index!)
Using R-packages
In this section of the remove column in R tutorial, we are going to learn how to load an R package. Loading R packages is quite easy; we just type library(dplyr) or library(tidyverse) if we want to load dplyr or the entire Tidyverse, respectively.
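As a minimal sketch of the install and load steps, the snippet below installs dplyr only when it is missing and then loads it. The requireNamespace() check is a convenience we add here, not a requirement; calling install.packages() directly works just as well.

```r
# Install dplyr only if it is not already available (a convenience check, not required)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load dplyr so that select() and its helper functions can be used
library(dplyr)
```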
It may be worth mentioning here that the Tidyverse comes with a range of useful packages that can be used for other things, even though this tutorial focuses on using dplyr to remove columns. For example, you can use some of the packages to create dummy variables in R, extract year from datetime, extract day from datetime, and extract time from datetime.
Deleting a column using dplyr is very easy with the select() function and the - sign. For example, if you want to remove the columns “X” and “Y”, you would type: select(Your_Dataframe, -c(X, Y)). Note that this example removed multiple columns (i.e., 2). To remove a single column by name, you would just type: select(Your_Dataframe, -X). Finally, if you want to delete a column by index with dplyr and select(), you replace the name (e.g., “X”) with the index of the column: select(Your_DF, -1).
The simplest way to delete the first column in R is to use brackets ([]) and assign NULL to the first column (put “1” between the brackets!). It is also very easy to remove the first column using dplyr’s select() function. Just add your dataframe as the first argument and the number 1 as the second, with a minus sign in front of it (i.e., “-1”).
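The base R approach described above can be sketched as follows. The small dataframe and its column names are made up for illustration only.

```r
# A small made-up dataframe, used only for illustration
df <- data.frame(id = 1:3, score = c(10, 20, 30), group = c("a", "b", "a"))

# Option 1: assign NULL to the first column (this changes the object it is applied to)
df_null <- df
df_null[1] <- NULL

# Option 2: negative indexing returns a copy without the first column;
# drop = FALSE keeps the result a dataframe even if only one column remains
df_neg <- df[, -1, drop = FALSE]

names(df_null)
names(df_neg)
```

Both options leave the columns score and group. The NULL assignment alters the dataframe directly, whereas negative indexing leaves the original untouched unless you assign the result back.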
Example Data
Now, before we start using dplyr to remove columns, we need to load some data that we can practice deleting columns from. In this tutorial, we are going to drop columns from the starwars dataset that is available in the dplyr package:
# Loading Example Data:
data("starwars", package = "dplyr")
# Checking the first 6 rows of the dataset (the default for head()):
head(starwars)
Data can, of course, be imported from different formats. In fact, real data will usually be stored outside of R and has to be imported before we can work with it.
Now that we have some example data we can go to the next section where we start to clean the dataframe from variables that we don’t really need. In the next section, we will use dplyr to remove a column by its name.
How to Remove a Column by Name in R using dplyr
In the first example, we will drop one column by its name. Deleting a column by the column name is quite easy using dplyr and select(). First, we are going to use the select() function, and we will use the name of the dataframe from which we want to delete a column as the first argument. Here’s how to remove a column in R with the select() function:
# Dplyr remove a column by name:
select(starwars, -height)
As you can see, we used the name of the column (i.e., “height”) as the second argument. Here we used the “-” to tell the select() function that this is the column we want to drop from the dataframe. Note that select() returns a new dataframe and leaves the original unchanged, so if you want the column to stay removed, you have to assign the result to a variable. In the next example, we will drop a column by its index.
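To illustrate the point about assignment, here is a minimal sketch. The variable name no_height is our own choice for this example.

```r
library(dplyr)
data("starwars", package = "dplyr")

# select() returns a new dataframe; starwars itself is not modified
no_height <- select(starwars, -height)

"height" %in% names(starwars)   # the original still has the column
"height" %in% names(no_height)  # the assigned result does not
```

If you prefer to overwrite the original, you can assign the result back to the same name instead of creating a new variable.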
How to Remove a Column by Index in R using dplyr
In the second example, we will drop one column by its index. This is also very easy and we are going to use dplyr and select again. Here’s how to remove a column in R if we know the index for that column:
# Dplyr remove column by index:
select(starwars, -1)
Notice that this time we removed the first column from the dataframe. That is, we did not delete the same column as in the example where we removed the column by name. Again, the “-” sign means we want to drop the variable at this index (i.e., 1). In the next sections, we will see that the same general idea can be used to remove multiple columns with dplyr (i.e., with the select() function). Note that sometimes you have to clean your data in more ways. For example, you can also use R to remove duplicate rows and columns.
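Because an index says nothing about what a column contains, it can help to look up the name at that position before dropping it. The following is a small sketch of that check; the variable names are ours.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Look up which column sits at position 1 before dropping it
first_col <- names(starwars)[1]
first_col

# Drop the column at index 1 and confirm it is gone
without_first <- select(starwars, -1)
first_col %in% names(without_first)
```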
How to Remove the Last Column in R
Here’s how we can use select() and the helper function last_col() to delete the last column in R:
select(starwars, -last_col())
That was pretty simple, right? All we did was add the function and the minus sign as the second parameter, and we deleted the last column.
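The last_col() helper also accepts an offset counted from the end, which makes it possible to target columns near the end without knowing their index. The sketch below shows this with variable names of our own choosing.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Drop only the last column
without_last <- select(starwars, -last_col())

# last_col(1) refers to the second-to-last column (an offset of 1 from the end)
without_second_last <- select(starwars, -last_col(1))

# Drop the last two columns at once
without_last_two <- select(starwars, -c(last_col(1), last_col()))
```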
How to Delete Columns by Names in R using dplyr
In this section, we are going to delete many columns in R. First, we are going to delete multiple columns from a dataframe by their names. To drop many columns by their names, we just use the c() function to define a vector. In this vector, we are going to add each of the names of the columns we want to remove. Here is how to use dplyr to remove columns by name:
# Dplyr remove multiple columns by name:
select(starwars, -c(name, height, mass))
Notice, again, that we used the “-” to remove the columns from the dataframe, much like when we removed one column by name in R. Remember, if you want the change to be permanent, you will have to assign the result to a variable. Note that we have only removed variables (columns) so far, but we can, of course, also insert new variables. For example, with tibble we can add empty columns to the dataframe in R.
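When the column names are stored as strings in a variable rather than typed directly, the all_of() and any_of() helpers can be combined with the minus sign. This is a sketch under that assumption; drop_cols and "not_a_column" are illustrative names.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Column names stored as strings, e.g. computed earlier in a script
drop_cols <- c("name", "height", "mass")

# all_of() is strict: it raises an error if any listed column is missing
strict_drop <- select(starwars, -all_of(drop_cols))

# any_of() is lenient: names that do not exist are silently ignored
lenient_drop <- select(starwars, -any_of(c("height", "not_a_column")))
```

The strict variant is usually the safer default, since a misspelled column name then fails loudly instead of being skipped.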
Remove Columns by Index in R using select()
In the second example of how to remove multiple columns, we are going to drop columns from the dataframe by their indexes. Again, we use the c() function and put in the indexes we want to remove from the dataframe.
# delete multiple columns by index using dplyr:
select(starwars, -c(1, 2, 3))
Note that the above code example drops the 1st, 2nd, and 3rd columns from the R dataframe. That is, the same columns we deleted using the variable names in the previous section of this tutorial. If we want to delete the 3rd, 4th, and 6th columns, for instance, we can change it to -c(3, 4, 6). Furthermore, you can use both : and seq() to create a sequence of numbers in R. This means that if you want to remove many columns by their indexes, you can generate the indexes. For example, if we wanted to use dplyr to remove columns 1 to 6, we can use the following code:
select(starwars, -c(1:6))
# Alternative:
# select(starwars, -seq(1, 6))
Notice how there is one line of code commented out. This is because both of the above examples produce the same result: as previously mentioned, they both generate the same sequence of numbers.
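A quick way to convince yourself that the two forms are equivalent is to compare the column names of the results. The variable names in this sketch are ours.

```r
library(dplyr)
data("starwars", package = "dplyr")

by_colon <- select(starwars, -c(1:6))
by_seq   <- select(starwars, -seq(1, 6))

# Both calls keep exactly the same columns
identical(names(by_colon), names(by_seq))
```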
How to Drop Columns Starting with using the starts_with() function
In this section, we are going to use the starts_with() function to remove a column in R. For instance, if we want to remove the columns of a dataframe that start with the letter “f”, we use the following command:
# dplyr dropping columns starting with a letter:
select(starwars, -starts_with("f"))
With this command, we removed the columns starting with a specific letter. Again, as in the previous examples, we used the “-” to tell select() that we don’t want the columns starting with the letter “f”.
Removing Columns in R Starting with a Specific Letter
In this example, we are going to learn how to remove columns in R starting with a specific letter. In this case, we will remove all columns that start with the letter “s”. Note, however, that we could also remove all columns starting with a certain word, if our dataframe contained such variables. Now, to remove columns in R starting with a letter (i.e., “s”), we just do the following:
# deleting columns starting with the letter "s":
select(starwars, -starts_with("s"))
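Before dropping columns by prefix, it can be useful to preview which names the helper matches. The sketch below does this by using starts_with() without the minus sign first; the variable name no_s is our own.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Preview which columns match the prefix before dropping them
names(select(starwars, starts_with("s")))

# Drop them and confirm that no remaining name starts with "s"
no_s <- select(starwars, -starts_with("s"))
any(startsWith(names(no_s), "s"))
```

Keep in mind that starts_with() ignores case by default (ignore.case = TRUE), so a prefix of "s" would also match column names beginning with a capital “S”.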
Dropping a Column ending With a Character using the ends_with() function
Now we will continue by removing columns from a dataframe that end with a specific character. For instance, if we want to remove the columns ending with the letter “r”, we will use the ends_with() function like this:
# Dropping columns ending with a letter:
select(starwars, -ends_with("r"))
In the code chunk above, we removed all columns that end with the letter “r”. Now that we know how to use dplyr to drop a column ending with a letter, we will continue by applying the same method to drop variables ending with a word.
How to Remove Columns Ending with a Word in R
Now, we will continue using the ends_with() function. In this case, however, we may use it in a more “real world” application. If we have multiple columns ending with a certain word, we can remove all of these columns from the R dataframe using ends_with(). For example, if we want to remove columns in R that end with the word “color”, we do as follows:
# removing multiple columns with dplyr, ending with a word:
select(starwars, -ends_with("color"))
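To check how many columns such a suffix rule removes, one option is to collect the matching names first and compare column counts afterwards. This is a small sketch with variable names of our own.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Columns that end with "color" in the original data
color_cols <- names(select(starwars, ends_with("color")))
color_cols

no_color <- select(starwars, -ends_with("color"))

# The result has exactly that many fewer columns
ncol(starwars) - ncol(no_color) == length(color_cols)
```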
Deleting a Column from an R dataframe using the contains() function
In the final example of how to remove columns from an R dataframe we are going to use the contains() function. This is handy if we want to remove all columns containing a certain word, or character. For instance, if we want to remove all columns containing the underscore (“_”) we type the following:
# dplyr remove columns containing character in name:
select(starwars, -contains("_"))
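Note that contains() matches a literal substring. When a more specific pattern is needed, the matches() helper accepts a regular expression instead. The sketch below contrasts the two; the variable names are illustrative.

```r
library(dplyr)
data("starwars", package = "dplyr")

# contains() matches a literal substring anywhere in the column name
no_underscore <- select(starwars, -contains("_"))

# matches() takes a regular expression; here: names ending in "_color"
no_color_regex <- select(starwars, -matches("_color$"))

names(no_underscore)
names(no_color_regex)
```

With the regular expression, a column such as birth_year is kept even though it contains an underscore, because it does not end in “_color”.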
Now that you’ve dropped columns, you can go ahead and do some other data manipulation tasks. For instance, if your dataset happens to contain dates and you want to extract timestamps from datetime, you can now go ahead and do it.
Final Example on How to Remove a Column in R
In this final example of how to delete a column in R, we are going to use the pipe, “%>%”, and save the result as a new dataframe.
# dplyr remove columns and saving it to new dataframe:
new_df <- starwars %>%
  select(-contains("_"))
head(new_df)
In the code chunk above, we removed all columns from the R dataframe that contained the underscore. We also created a new dataframe, called new_df, and used the head() function to print the first 6 rows.
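Several removal rules can also be combined in a single select() call inside a pipe. The following sketch assumes we want to drop both the underscore columns and the height column; the name trimmed is our own.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Combine a helper-based rule and a name-based rule in one step
trimmed <- starwars %>%
  select(-contains("_"), -height)

head(trimmed)
```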
Now that we have dropped the columns we wanted to remove, we can carry on with descriptive statistics in R or creating a scatter plot in R. Note that more data manipulation may be needed before we do this and move on to the next step (e.g., repeated measures ANOVA in R).
Conclusion: Dropping Columns from Dataframe in R
In conclusion, removing a column in R was pretty easy to do. In this tutorial, we have dropped one column by name and index, we have deleted multiple columns by name and indexes. Furthermore, we have removed columns in R dataframes starting with, ending with, and containing, letters, words, and characters.
As a final note, if we want to remove many columns, it may be easier to use select() without the minus sign (“-”). This selects the specific columns that we want to keep and drops the rest.
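As a short sketch of this keep-instead-of-drop approach, the example below retains three columns of the starwars data. The choice of columns and the name kept are ours.

```r
library(dplyr)
data("starwars", package = "dplyr")

# Keep only the listed columns; everything else is dropped
kept <- select(starwars, name, height, mass)
names(kept)
```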
R Resources
Here are some tutorials on this blog that may come in handy:
- R Count the Number of Occurrences in a Column using dplyr
- Select Columns in R by Name, Index, Letters, & Certain Words with dplyr
- How to Convert a List to a Dataframe in R – dplyr
- Coefficient of Variation in R
- Correlation in R: Coefficients, Visualizations, & Matrix Analysis
- Report Correlation in APA Style using R: Text & Tables