This Pandas tutorial will show you, by examples, how to use Pandas read_csv() method to import data from .csv files. In the first section, we will go through how to read a CSV file, how to read specific columns from a CSV, how to read multiple CSV files and combine them to one dataframe. Finally, we will also learn how to convert data according to specific datatypes (e.g., using Pandas read_csv dtypes parameter).
In the last section, we will continue by learning how to use Pandas to write CSV files. That is, we will learn how to export dataframes to CSV files.
Table of Contents
- Pandas Import CSV from the Harddrive
- Pandas Read CSV from a URL Examples
- Pandas Read CSV usecols
- Pandas Import CSV files and Remove Unnamed Column
- Pandas Read CSV and Missing Values
- Reading a CSV file and Skipping Rows
- How to Read Certain Rows using Pandas
- Pandas read_csv dtype
- Load Multiple Files to a Dataframe
- How to Write CSV files in Pandas
- Conclusion
Pandas Import CSV from the Harddrive
In the first example, of this Pandas tutorial, we will just use read_csv to load CSV files, to dataframe, that are in the same directory as the script. If we have the file in another directory we have to remember to add the full path to the file.
Naturally, Pandas can be used to import data from a range of different file types. For instance, Excel (xlsx), and JSON files can be read into Pandas dataframes. Learn more about importing data in Pandas:
How to Read CSV File in Python Pandas
In this section, we are going to learn how to read CSV files from the harddrive. First, however, we are going to answer the question How do I open a CSV file in pandas?
Here are two simple steps to learn how to read a CSV file in Pandas:
1) Import the Pandas package:
import pandas as pd
2)Use the pd.read_csv() method:df = pd.read_csv('yourCSVfile.csv')
Note, the first parameter should be the file path to your CSV file.
In this tutorial, we will learn how to work with comma-separated (CSV) files in Python and Pandas. We will get an overview of how to use Pandas to load CSV to dataframes and how to write dataframes to CSV.
df = pd.read_csv('amis.csv')
df.head()
Code language: Python (python)
If we are interested in reading files, in general, using Python we can use the open() method. This way, we can read many file formats (e.g., .txt) in Python.
Additionally, we can use the index_col argument to make a column index in the Pandas dataframe. Finally, the data can be downloaded here but in the following examples, we are going to use Pandas read_csv to load data from a URL.
Pandas Read CSV from a URL Examples
Can we import a CSV file from a URL using Pandas? Yes, and in this section, we are going to learn how to read a CSV file in Python using Pandas, just like in the previous example. However, in the next read_csv example, we are going to read the same dataset but this time from a URL. It’s very simple we just put the URL in as the first parameter. Here areto three simple steps that will help us read a CSV from a URL:
- Again, we need to import Pandas
- Create a string variable with the URL
- Now use Pandas read_csv together with the URL (see example below)
Example 1: Read CSV from a URL
In the example code below, we follow the three easy steps to import a CSV file into a Pandas dataframe:
import pandas as pd
# String with URL:
url_csv = 'https://vincentarelbundock.github.io/Rdatasets/csv/boot/amis.csv'
# First example to read csv from URL
df = pd.read_csv(url_csv)
df.head()
Code language: Python (python)
In the image above, we can see that we get a column named ‘Unnamed: 0’. Furthermore, we can see that it contains numbers. Thus, when using Pandas, we can use this column as the index column. In the next code example, we are doing exactly this; we are going to use Pandas read_csv and the index_col parameter.
Example 2: Read CSV from a URL with index_col
This parameter can take an integer or a sequence. In our case, we are going to use the integer 0 and we will get a nicer dataframe:
url_csv = 'https://vincentarelbundock.github.io/Rdatasets/csv/boot/amis.csv'
# Pandas Read CSV from URL Example:
df = pd.read_csv(url_csv, index_col=0)
df.head()
Code language: Python (python)
The index_col parameter also can take a string as input and we will now use a different datafile. In the next example we will read a CSV into a Pandas dataframe and use the idNum column as index.
csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
df = pd.read_csv(csv_url, index_col='idNum')
df.iloc[:, 0:6].head()
Code language: Python (python)
Note, to get the above output we used Pandas iloc to select the first 7 rows. This was done to get an output that could be easier illustrated. That said, we are now continuing to the next section where we are going to read certain columns to a dataframe from a CSV file.
A final note before going futher with reading CSV files: It is also possible to use Pandas to read the iex cloud api with Python to import stock data.
Pandas Read CSV usecols
In some cases, we don’t want to parse every column in the CSV file. To only read certain columns we can use the parameter usecols. Note, if we want the first column to be the index column and we want to parse the three first columns we need to have a list with 4 elements (compare my read_excel usecols example here).
Pandas Read CSV Example: Specifying Columns to Import
Here’s an example when we use Pandas read_csv() and only read the three first columns:
cols = [0, 1, 2, 3]
df = pd.read_csv(url_csv,
index_col=0, usecols=cols)
df.head()
Code language: Python (python)
Note, we actually did read 4 columns but set the first column as the index column. Of course, using read_csv usecols make more sense if we had a CSV file with more columns. We can use usecols with a list of strings, as well. In the next example, we return to the larger file we used previously. Here’s how to use the column names in the datafile:
csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
df = pd.read_csv(csv_url, index_col='idNum',
usecols=['idNum', 'date', 'problem', 'MDC'])
df.head()
Code language: Python (python)
Pandas Import CSV files and Remove Unnamed Column
In some of the previous read_csv example, we get an unnamed column. In previous sections, of this Pandas read CSV tutorial, we have solved this by setting this column as the index columns, or used usecols to select specific columns from the CSV file. However, we may not want to do that for some reason. Here’s one example of how to use pd.read_csv to get rid of the column “Unnamed:0”:
csv_url = 'http://vincentarelbundock.github.io/Rdatasets/csv/carData/MplsStops.csv'
cols = pd.read_csv(csv_url, nrows=1).columns
df = pd.read_csv(csv_url, usecols=cols[1:])
df.iloc[:, 0:6].head()
Code language: Python (python)
How to Drop a Column from Pandas dataframe
It’s of course also possible to remove the unnamed columns after we have loaded the CSV to a dataframe. To remove the unnamed columns we can use two different methods; loc and drop, together with other Pandas dataframe methods. When using the drop method we can use the inplace parameter and get a dataframe without unnamed columns.
df.drop(df.columns[df.columns.str.contains('unnamed', case=False)],
axis=1, inplace=True)
# The following line will give us the same result as the line above
# df = df.loc[:, ~df.columns.str.contains('unnamed', case=False)]
df.iloc[:, 0:7].head()
Code language: PHP (php)
To explain the code example above; we select the columns without columns containing the string ‘unnamed’. Furthermore, we used the case parameter so that the contains method is not case-sensitive. Thus, we will get columns named “Unnamed” and “unnamed”. In the first row, using Pandas drop, we are also using the inplace parameter so that it changes our dataframe. The axis parameter, however, is used to drop columns instead of indices (i.e., rows).
Pandas Read CSV and Missing Values
In the next Pandas read .csv example, we will learn how to handle missing values in a Pandas dataframe. If we have missing data in our CSV file and it’s coded in a way that makes it impossible for Pandas to find them we can use the parameter na_values. In the example below, the amis.csv file has been changed and there are some cells with the string “Not Available”.
That is, we are going to change “Not Available” to something that we easily can remove when carrying out data analysis later.
df = pd.read_csv('Simdata/MissingData.csv', index_col=0,
na_values="Not Available")
df.head()
Code language: Python (python)
Reading a CSV file and Skipping Rows
What if our data file(s) contain information on the first x rows and we need to skip rows when using Pandas read_csv? For instance, how can we skip the first three rows in a file looking like this:
We will now learn how to use Pandas read_csv and skip x amount of rows. Luckily, it’s very simple we just use the skiprows parameter. In the following example, we are setting skiprows to 3 to skip the first 3 rows.
Pandas read_csv skiprows example:
How do we use Pandas skiprow parameter? Here’s a Pandas read_csv example, where we skip the three first rows:
df = pd.read_csv('Simdata/skiprow.csv', index_col=0, skiprows=3)
df.head()
Code language: Python (python)
Note we can obtain the same result as above using the header parameter (i.e., data = pd.read_csv(‘Simdata/skiprow.csv’, header=3)).
How to Read Certain Rows using Pandas
Can we read specific rows from a CSV file using Pandas read_csv method? If we don’t want to read every row in the CSV file we ca use the parameter nrows. In the next example, below we read the first 8 rows of a CSV file.
df = pd.read_csv(url_csv, nrows=8)
df
Code language: Python (python)
If we want to select random rows we can load the complete CSV file and use Pandas sample to randomly select rows (learn more about this by reading the Pandas Sample tutorial).
Pandas read_csv dtype
We can also set the data types for the columns. Although, in the amis dataset all columns contain integers we can set some of them to string data type. This is exactly what we will do in the next Pandas read_csv pandas example. We will use the dtype parameter and put in a dictionary:
url_csv = 'https://vincentarelbundock.github.io/Rdatasets/csv/boot/amis.csv'
df = pd.read_csv(url_csv, dtype={'speed':int, 'period':str, 'warning':str, 'pair':int})
df.info()
Code language: Python (python)
It’s, of course, possible to force other datatypes such as integer and float. All we have to do is change str to float, for instance (given that we have decimal numbers in that column, of course).
Load Multiple Files to a Dataframe
If we have data from many sources such as experiment participants we may have them in multiple CSV files. If the data, from the different CSV files, are going to be analyzed together we may want to load them all into one dataframe. In the next examples, we are going to use Pandas read_csv to read multiple files.
Example 1: Reading Multiple CSV Files using os fnmatch
First, we are going to use Python os and fnmatch to list all files with the word “Day” of the file type CSV in the directory “SimData”. Next, we are using Python list comprehension to load the CSV files into dataframes (stored in a list, see the type(dfs) output).
import os, fnmatch
csv_files = fnmatch.filter(os.listdir('./SimData'), '*Day*.csv')
dfs = [pd.read_csv('SimData/' + os.sep + csv_file)
for csv_file in csv_files]
type(dfs)
# Output: list
Code language: Python (python)
Finally, we use the method concat to concatenate the dataframes in our list. In the example files, there is a column called ‘Day’ so that each day (i.e., CSV file) is unique.
df = pd.concat(dfs, sort=False) df.Day.unique()Example 2: Reading Multiple CSV Files using glob
The second method we are going to use is a bit simpler; using Python glob. If we compare the two methods (os + fnmatch vs. glob) we can see that in the list comprehension we don’t have to put the path. This is because glob will have the full path to our files. Handy!
import glob
csv_files = glob.glob('SimData/*Day*.csv')
dfs = []
for csv_file in csv_files:
temp_df = pd.read_csv(csv_file)
temp_df['DataF'] = csv_file.split('\\')[1]
dfs.append(temp_df)
Code language: Python (python)
If we don’t have a column, in each CSV file, identifying which dataset it is (e.g., data from different days) we could apply the filename in a new column of each dataframe:
import glob
csv_files = glob.glob('SimData/*Day*.csv')
dfs = []
for csv_file in csv_files:
temp_df = pd.read_csv(csv_file)
temp_df['DataF'] = csv_file.split('\\')[1]
dfs.append(temp_df)
Code language: Python (python)
- Check the Pandas Dataframe Tutorial for Beginners
There are, of course, times when we need to rename multiple files (e.g., CSV files before loading them into Pandas dataframes). Luckily, to rename a file in Python we can use os.rename(). This method can be used regardless if we need to rename CSV or .txt files.
Now we know how to import multiple CSV files and, in the next section, we will learn how to use Pandas to write to a CSV file.
How to Write CSV files in Pandas
In this section, we will learn how to export dataframes to CSV files. We will start by creating a dataframe with some variables but first, we start by importing the modules Pandas:
import pandas as pd
Code language: Python (python)
Before we go on and learn how to use Pandas to write a CSV file, we will create a dataframe. We will create the dataframe using a dictionary. The keys will be the column names and the values will be lists containing our data:
df = pd.DataFrame({'Names':['Andreas', 'George', 'Steve',
'Sarah', 'Joanna', 'Hanna'],
'Age':[21, 22, 20, 19, 18, 23]})
df.head()
Code language: Python (python)
Saving Pandas Dataframe to CSV
Now we are ready to learn how to save Pandas dataframe to CSV. It’s quite simple, we write the dataframe to CSV file using Pandas to_csv method. In the example below we don’t use any parameters but the path_or_buf which is, in our case, the file name.
df.to_csv('NamesAndAges.csv')
Code language: Python (python)
Here’s how the exported dataframe look like:
As can be seen in the image above we get a new column when we are not using any parameters. This column is the index column from our Pandas dataframe. When working with Pandas to_csv, we can use the parameter index and set it to False to get rid of this column.
df.to_csv('NamesAndAges.csv', index=False)
Code language: PHP (php)
How to Write Multiple Dataframes to one CSV file
If we have many dataframes and we want to export them all to the same CSV file it is, of course, possible. In the Pandas to_csv example below we have 3 dataframes. We are going to use Pandas concat with the parameters keys and names.
This is done to create two new columns, named Group and Row Num. The important part is Group which will identify the different dataframes. In the last row of the code example we use Pandas to_csv to write the dataframes to CSV.
df1 = pd.DataFrame({'Names': ['Andreas', 'George', 'Steve',
'Sarah', 'Joanna', 'Hanna'],
'Age':[21, 22, 20, 19, 18, 23]})
df2 = pd.DataFrame({'Names': ['Pete', 'Jordan', 'Gustaf',
'Sophie', 'Sally', 'Simone'],
'Age':[22, 21, 19, 19, 29, 21]})
df3 = pd.DataFrame({'Names': ['Ulrich', 'Donald', 'Jon',
'Jessica', 'Elisabeth', 'Diana'],
'Age':[21, 21, 20, 19, 19, 22]})
df = pd.concat([df1, df2, df3], keys =['Group1', 'Group2', 'Group3'],
names=['Group', 'Row Num']).reset_index()
df.to_csv('MultipleDfs.csv', index=False)
Code language: Python (python)
In the CSV file we get 4 columns. The keys parameter with the list ([‘Group1’, ‘Group2’, ‘Group3’]) will enable identification of the different dataframes we wrote. We also get the column “Row Num” which will contain the row numbers for each dataframe:
Conclusion
In this tutorial we have learned about importing CSV files into Pandas dataframe. More specifically, we have learned how to:
- Load CSV files to dataframe using Pandas read_csv
- locally
- from the WEB
- Read certain columns
- Remove unnamed columns
- Handle missing values
- Skipping rows and reading certain rows
- Changing datatypes using dtypes
- Reading many CSV files
- Saving dataframes to CSV using Pandas to_csv