In this tutorial, we will explore how to use Python to read sas7bdat files with ease, thanks to the powerful pandas read_sas function. If you have ever grappled with the intricacies of SAS files and wondered how to integrate them into your Python data analysis workflow seamlessly, you are in the right place. This post will walk you through the step-by-step process of reading and working with SAS datasets using Python, enabling you to harness the full potential of your data analysis capabilities.
As previously described (in the read .sav files in Python post), Python is a general-purpose language that can also be used for data analysis and data visualization.
One potential downside, however, is that Python is not really user-friendly for data storage. As a result, our data is often stored using Excel, SPSS, SAS, or similar software. See, for instance, the posts about reading .sav, .dta, and .xlsx files in Python:
- How to read and write SPSS files in Python
- How to read and write Stata files in Python
- How to read and write Excel files in Pandas
Table of Contents
- Can I Open an SAS File in Python?
- How to install Pyreadstat:
- How to Open a SAS File (.sas7bdat) File in Python
- How to Read a SAS file with Python Using Pandas
- How to Save a SAS file to CSV
- Summary: Read SAS Files using Python
- Resources
Can I Open an SAS File in Python?
Now we may want to answer how to open a SAS file in Python. Two useful packages, Pyreadstat and Pandas, enable us to open SAS files. If we work with Pandas, the read_sas method will load a .sas7bdat file into a Pandas dataframe. Note that Pyreadstat, which depends on Pandas, will also create a Pandas dataframe from a .sas7bdat file.
How to install Pyreadstat:
Pyreadstat can be installed either using pip or conda:
- Install Pyreadstat using pip:
Open up a terminal, or Windows PowerShell, and type pip install pyreadstat
- Install using Conda:
Open up a terminal, or Windows PowerShell, and type conda install -c conda-forge pyreadstat
Sometimes, when we install Python packages with pip, we may notice that we don’t have the most recent version of pip. If this is the case, we can easily update pip using pip itself or conda. In the next section, we are going to learn how to load a SAS file in Python using the Python package Pyreadstat.
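After installing, it can be useful to confirm that the packages are visible to the Python interpreter we are going to use. The snippet below is a minimal sketch that relies only on importlib.metadata from the standard library. The helper name installed_version is made up for this example and is not part of Pandas or Pyreadstat.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the two packages used in this tutorial.
for name in ("pandas", "pyreadstat"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If one of the packages is reported as not installed, the pip or conda command above was most likely run in a different environment than the one Python is using.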
How to Open a SAS File (.sas7bdat) File in Python
In this section, we are going to use pyreadstat to import data into a Pandas dataframe. Data used in this tutorial can be downloaded (download airline.sas7bdat).
Step 1: Import Pyreadstat
First, we import pyreadstat:
import pyreadstat
Step 2: Reading the SAS File
Here’s how to open SAS files in Python with read_sas7bdat:
# Read the sas7bdat file
df, meta = pyreadstat.read_sas7bdat('airline.sas7bdat')
Note that when we load a file with Pyreadstat using only its file name, as above, it will look for the file in Python’s working directory. In the code chunk above, we create two variables: df and meta. As can be seen when using type, the variable “df” is a Pandas dataframe:
type(df)
Thus, we can use all methods available for Pandas dataframe objects. In the next line of code, we are going to print the first five rows of the dataframe using the Pandas head method.
df.head()
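The second variable, meta, holds information about the file rather than the data itself. As a rough sketch, the helper below collects a few attributes that Pyreadstat's metadata object exposes (column_names, column_labels, number_rows, number_columns) into a dictionary. The function name describe_meta is hypothetical, and the last lines assume that airline.sas7bdat is in the working directory; they are skipped if it is not.

```python
from pathlib import Path


def describe_meta(meta):
    """Collect a few fields from a Pyreadstat metadata object into a dict."""
    return {
        "rows": meta.number_rows,
        "columns": meta.number_columns,
        # Pair every column name with its label (a label may be None).
        "labels": dict(zip(meta.column_names, meta.column_labels)),
    }


# Only runs when the example file is present in the working directory.
if Path("airline.sas7bdat").is_file():
    import pyreadstat  # imported here so the helper can be used without it

    df, meta = pyreadstat.read_sas7bdat("airline.sas7bdat")
    print(describe_meta(meta))
```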
See more about working with Pandas dataframes in the following tutorials:
- Python Groupby Tutorial: Here you will learn about working with the groupby method to group Pandas dataframes.
- Learn how to take random samples from a pandas dataframe
- A more general overview of how to work with Pandas dataframe objects can be found in the Pandas Dataframe tutorial.
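Because the file in this section is looked up relative to the working directory, a common stumbling block is a "file not found" error when the script is started from another folder. The following is a small sketch, using only pathlib, of how we might check for the file before trying to read it. The helper find_data_file and its base_dir parameter are assumptions made for this example.

```python
from pathlib import Path


def find_data_file(filename, base_dir=None):
    """Return the absolute path to filename, or None if it does not exist.

    Relative names are resolved against base_dir, which defaults to the
    current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / filename).resolve()
    return candidate if candidate.is_file() else None


path = find_data_file("airline.sas7bdat")
if path is None:
    print(f"airline.sas7bdat not found in {Path.cwd()}")
else:
    print(f"Found {path}")
```

The returned path can then be handed to the reader as a string, for example pyreadstat.read_sas7bdat(str(path)).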
How to Read a SAS file with Python Using Pandas
In this section, we are going to load the same .sas7bdat file into a Pandas dataframe, but this time using the Pandas read_sas method. This has the advantage that we can load the SAS file from a URL.
Step 1: Import Pandas
Before we continue, we need to import Pandas:
import pandas as pd
Once that is done, we can read the .sas7bdat file into a Pandas dataframe using the read_sas method. In this example, we are importing the same data file as in the previous example.
Step 2: Open the SAS File with the read_sas Method
Here is how to read an SAS file in Python with Pandas read_sas method:
url = 'http://www.principlesofeconometrics.com/sas/airline.sas7bdat'
df = pd.read_sas(url)
df.tail()
In some cases, you may want to work with your data using other Python packages. For example, you can convert the Pandas dataframe to a NumPy array.
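As a brief illustration of the NumPy conversion, the sketch below uses the dataframe method to_numpy. To keep it runnable without the SAS file, it builds a tiny stand-in dataframe: the column names follow the airline data, but the values are made up for this example.

```python
import pandas as pd

# Stand-in for the dataframe loaded from the SAS file; values are made up.
df = pd.DataFrame({"YEAR": [1948.0, 1949.0], "Y": [1.2, 1.3], "W": [0.24, 0.26]})

# Convert the dataframe to a two-dimensional NumPy array.
values = df.to_numpy()
print(type(values), values.shape)
```

Note that the array keeps only the values, so the column names have to be tracked separately if they are needed later.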
How to Read a SAS File and Specific Columns
Note that read_sas7bdat (Pyreadstat) has the argument “usecols”. By using this argument, we can select which columns we want to load from the SAS file into the dataframe:
cols = ['YEAR', 'Y', 'W']
df, meta = pyreadstat.read_sas7bdat('airline.sas7bdat', usecols=cols)
df.head()
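To our knowledge, the Pandas read_sas function does not take a usecols argument, so with Pandas the columns are typically selected after the whole file has been loaded. Here is a minimal sketch of that step. The helper keep_columns is hypothetical, and the dataframe is a small stand-in with made-up values rather than the real airline data.

```python
import pandas as pd


def keep_columns(frame, wanted):
    """Return only the wanted columns, failing early if any are missing."""
    missing = [name for name in wanted if name not in frame.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return frame.loc[:, list(wanted)]


# Stand-in for a dataframe returned by pd.read_sas; values are made up.
full = pd.DataFrame({"YEAR": [1948.0, 1949.0], "Y": [1.2, 1.3],
                     "W": [0.24, 0.26], "K": [0.6, 0.7]})
subset = keep_columns(full, ["YEAR", "Y", "W"])
print(subset.columns.tolist())
```

Unlike usecols, this approach still reads every column first, so it saves memory only after the selection has been made.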
How to Save a SAS file to CSV
In this section of the Pandas SAS tutorial, we are going to export the data from the .sas7bdat file to a .csv file. This is easily done with the to_csv method of the dataframe object we created earlier:
df.to_csv('data_from_sas.csv', index=False)
Remember to put the right path, as the first argument, when using to_csv to save the data from a .sas7bdat file as CSV.
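To double-check an export, we can read the CSV file back in and compare it with the dataframe we saved. The sketch below does this in a temporary directory, so nothing is left on disk. The dataframe is a stand-in with made-up values, used here only so that the example runs without the SAS file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the dataframe loaded from the SAS file; values are made up.
df = pd.DataFrame({"YEAR": [1948.0, 1949.0], "Y": [1.2, 1.3]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data_from_sas.csv"
    # index=False keeps the row index out of the CSV file.
    df.to_csv(csv_path, index=False)
    restored = pd.read_csv(csv_path)

print(restored.columns.tolist(), len(restored))
```

Without index=False, the CSV would contain an extra unnamed column holding the row index, which then shows up as a column when the file is read again.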
Summary: Read SAS Files using Python
Now we have learned how to read SAS files in Python and how to save the data as CSV. It was quite simple, and both methods, read_sas7bdat from Pyreadstat and read_sas from Pandas, give us a Pandas dataframe.
Resources
- How to Add a Column to a Dataframe in R with tibble & dplyr
- How to Add an Empty Column to a Dataframe in R (with tibble)
- Modulo in R: Practical Example using the %% Operator
- Binning in R: Create Bins of Continuous Variables