In this brief Python tutorial, we will learn how to read Excel (xlsx) files using Python. Specifically, we will read xlsx files in Python using the Python module openpyxl. First, we start by the simplest example of reading a xlsx file in Python. Second, we will learn how to read multiple Excel files using Python.
In previous posts, we have learned how to use Pandas read_excel method to import xlsx files with Python. As previously mentioned, however, we will use another package called openpyxl in this post. In the next paragraph, we will learn how to install openpyxl.
Table of Contents
- Openpyxl Syntax
- Prerequisites: Python and Openpyxl
- How to Read an Excel (xlsx) File in Python
- Reading an Excel (xlsx) File to a Python Dictionary
- How to Read Multiple Excel (xlsx) Files in Python
- Conclusion: Reading Excel (xlsx) Files in Python
Openpyxl Syntax
Basically, here’s the simplest form of using openpyxl for reading a xlsx file in Python:
import openpyxl
from pathlib import Path
xlsx_file = Path('SimData', 'play_data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file)
# Read the active sheet:
sheet = wb_obj.active
Code language: Python (python)
It is, of course, also possible to learn how to read, write, and append to files in Python (e.g., text files). Make sure to check that post out, as well.
Prerequisites: Python and Openpyxl
Now, before we will learn what Openpyxl is, we need to make sure that we have both Python 3 and the module openpyxl installed. One easy way to install Python is to download a Python distribution such as Anaconda or ActivePython. Openpyxl, on the other hand, can, as with many Python packages, be installed using both pip and conda. Now, using pip we type the following in a command prompt or terminal window, pip install openpyxl
and using conda we type this; conda install openpyxl
. Note, sometimes, when we install Python packages with pip, we may notice that we don’t have the latest version of pip. Luckily, it is quite easy to upgrade pip to the latest version using pip.
Openpyxl is a Python module that reads and writes Excel (with extension xlsx/xlsm/xltx/xltm) files. Furthermore, this module enables a Python script to modify Excel files. For instance, if we want to go through thousands of rows but just read specific data points and make small changes to these points, we can do this based on some criteria with openpyxl.
Now, the general method for reading xlsx files in Python (with openpyxl) is to import openpyxl (import openpyxl) and then read the workbook: wb = openpyxl.load_workbook(PATH_TO_EXCEL_FILE)
. In this post, we will learn more about this, of course.
How to Read an Excel (xlsx) File in Python
Now, in this section, we will be reading an xlsx file in Python using openpyxl. In a previous section, we have already been familiarized with the general template (syntax) for reading an Excel file using openpyxl and we will now get into this module in more detail. Note we will also work with the Path method from the Pathlib module.
1. Import the Needed Modules
In the first step, to reading an xlsx file in Python, we need to import the modules we need. That is, we will import Path and openpyxl:
import openpyxl
from pathlib import Path
Code language: Python (python)
2. Setting the Path to the Excel (xlsx) File
In the second step, we will create a variable using Path. Furthermore, this variable will point at the location and filename of the Excel file we want to import with Python:
# Setting the path to the xlsx file:
xlsx_file = Path('SimData', 'play_data.xlsx')</code></pre>
Code language: Python (python)
Note, “SimData” is a subdirectory to that of the Python script (or notebook). That is, if we were to store the Excel file in a completely different directory, we need to put in the full path. For example, xlsx_file = Path(Path.home(), 'Documents', 'SimData', 'play_data.xlsx')
if the data is stored in the Documents in our home directory.
3. Read the Excel File (Workbook)
In the third step, we are going to use Python to read the xlsx file. Now, we are using the load_workbook() method:
wb_obj = openpyxl.load_workbook(xlsx_file)
Code language: Python (python)
4. Read the Active Sheet from the Excel file
Now, in the fourth step, we are going to read the active sheet using the active method:
sheet = wb_obj.active
Code language: Python (python)
Note, if we know the sheet name we can also use this to read the sheet we want: play_data = wb_obj['play_data']
5. Work or Manipulate the Excel Sheet
In the final and fifth steps, we can work or manipulate the Excel sheet imported with Python. For example, if we want to get the value from a specific cell, we can do as follows:
print(sheet["C2"].value)
Code language: Python (python)
Another example of what we can do with the spreadsheet in Python is we can iterate through the rows and print them:
for row in sheet.iter_rows(max_row=6):
for cell in row:
print(cell.value, end=" ")
print()
Code language: Python (python)
Note that we used the max_row and set it to 6 to print the six first rows from the Excel file.
6. Bonus: Determining the Number of Rows and Columns in the Excel File
In the sixth, and bonus step, we are going to find out how many rows and columns we have in the example Excel file we have imported with Python:
print(sheet.max_row, sheet.max_column)
Code language: Python (python)
Reading an Excel (xlsx) File to a Python Dictionary
Now, before we learn how to read multiple xlsx files, we will import data from Excel into a Python dictionary. It’s quite simple, but for the example below, we need to know the column names before we start. If we want to find out the column names, we can run the following code (or open the Excel file):
import openpyxl
from pathlib import Path
xlsx_file = Path('SimData', 'play_data.xlsx')
wb_obj = openpyxl.load_workbook(xlsx_file)
sheet = wb_obj.active
col_names = []
for column in sheet.iter_cols(1, sheet.max_column):
col_names.append(column[0].value)
print(col_names)
Code language: Python (python)
Creating a Dictionary from an Excel File
In this section, we will finally read the Excel file using Python and create a dictionary.
data = {}
for i, row in enumerate(sheet.iter_rows(values_only=True)):
if i == 0:
data[row[1]] = []
data[row[2]] = []
data[row[3]] = []
data[row[4]] = []
data[row[5]] = []
data[row[6]] = []
else:
data['Subject ID'].append(row[1])
data['First Name'].append(row[2])
data['Day'].append(row[3])
data['Age'].append(row[4])
data['RT'].append(row[5])
data['Gender'].append(row[6])
Code language: Python (python)
Now, let’s walk through the code example above. First, we create a Python dictionary (data). Second, we loop through each row (using iter_rows), and we only go through the rows where there are values. Second, we have an if statement where we check if it’s the first row and add the keys to the dictionary. That is, we set the column names as keys. Third, we append the data to each key (column name) in the other statement.
How to Read Multiple Excel (xlsx) Files in Python
In this section, we will learn how to read multiple xlsx files in Python using openpyxl. Additionally to openpyxl and Path, we are also going to work with the glob module.
1. Import the Modules
In the first step, we are going to import the modules Path, glob, and openpyxl:
import glob
import openpyxl
from pathlib import Path
Code language: Python (python)
The glob module and the os module can be used to check if your file is empty with Python.
2. Read all xlsx Files in the Directory to a List
Second, we will read all the .xlsx files in a subdirectory into a list. Now, we use the glob module together with Path:
xlsx_files = [path for path in Path('XLSX_FILES').rglob('*.xlsx')]
Code language: Python (python)
3. Create Workbook Objects (i.e., read the xlsx files)
Third, we can now read all the xlsx files using Python. Again, we will use the load_workbook method. However, this time we will loop through each file we found in the subdirectory,
wbs = [openpyxl.load_workbook(wb) for wb in xlsx_files]
Code language: Python (python)
Now, in the code examples above, we are using Python list comprehension (twice, in both step 2 and 3). First, we create a list of all the xlsx files in the “XLSX_FILES” directory. Second, we loop through this list and create a list of workbooks. Of course, we could add this to the first line of code above.
4. Work with the Imported Excel Files
We can now work with the imported excel files in the fourth step. For example, we can get the first file by adding “[0]” to the list. If we want to know the sheet names of this file we do like this:wbs[0].sheetnames
. Many of the things we can do, and have done in the previous example on reading xlsx files in Python, can be done when we’ve read multiple Excel files.
Notice this is one great example of how to use this programming language. Other examples are, for instance, to use it for renaming files in Python.
Conclusion: Reading Excel (xlsx) Files in Python
In conclusion, Openpyxl is a powerful Python library that enables developers to read, write, and manipulate Excel files using Python. This tutorial covered how to read an Excel (xlsx) file in Python using Openpyxl. We started by importing the necessary modules, setting the path to the Excel file, and then reading the file and its active sheet. We then explored how to work with the sheet and even determine the number of rows and columns in the file.
Moreover, we also learned how to read an Excel file to a Python dictionary and create a dictionary from an Excel file. Finally, we learned how to read multiple Excel files in a directory by importing the required modules, reading all the xlsx files in the directory to a list, creating workbook objects, and working with the imported Excel files.
By mastering these techniques, you can easily read and manipulate Excel files using Python, which can be incredibly useful in various data processing applications. So go ahead, try it out, and unlock the full potential of Openpyxl in your Python projects!
It is, of course, possible to import data from various other file formats. For instance, read the post about parsing json files in Python to learn more about reading JSON files.
This section ‘Reading an Excel (xlsx) FIle to a Python Dictionary’ has a typo in its name, should be ‘File…’
Great article!
Hey Amir,
Thanks for spotting this and thank you for your comment,
Best,
Erik
your examples use xlsx_file = Path(‘SimData’, ‘play_data.xlsx’)
Since I do not have a copy of that spreadsheet, I find it difficult to adapt your examples to the use of my own spreadsheet.
May I have a copy of ‘play_data.xlsx’
Hey Robert,
Thank you, for your comment. Typically I do link to the example data used in my tutorials. I’ll add a link to the Excel file later today. If you need practicing reading xlsx files ASAP here’s the link: Example .xlsx file. Have a nice day,
Erik
in 4. Read the Active Sheet from the Excel file, there is a typo in:
wsheet = wb_obj.active
ok-> sheet = wb_obj.active
Hey Daw! Thank you for finding this error! I have now corrected it. Take care!
Great content, keep it up
Thanks Alan,
Best,
Erik
Great content Nice
This is a great guide! I’ve been struggling to get my head around reading xlsx files in Python, but this guide has made it much easier.
Thanks “Tv schedule”.