In this Python tutorial, we will learn how to read and write JSON files using Python. In the first part, we are going to use the Python package json to create and read a JSON file as well as write a JSON file. After that, we are going to use Pandas read_json
method to load JSON files into a Pandas dataframe. Here, we will learn how to read from a JSON file locally, from a URL, and how to read a nested JSON file.
Finally, as a bonus, we will also learn how to manipulate data in Pandas dataframes, rename columns, and plot the data using Seaborn.
Table of Contents
- What is a JSON File?
- Parsing JSON in Python
- Saving to a JSON file
- How to Use Pandas to Load a JSON File
- Time Series Plot from JSON Data using Seaborn
- Conclusion
- Tutorials
What is a JSON File?
JSON, short for JavaScript Object Notation, is a compact, text based format used to exchange data. This format that is common for downloading, and storing, information from web servers via so-called Web APIs. JSON is a text-based format and when opening up a JSON file, we will recognize the structure. That is, it is not so different from Python’s structure for a dictionary.
In the first Python parsing json example, we will use the Python module json
to create a JSON file. After we have done that, we are going to load the JSON file. First. we create a dictionary with data to practice with:
import json
# Creating a Python Dictionary
data = {"Sub_ID":["1","2","3","4","5","6","7","8" ],
"Name":["Erik", "Daniel", "Michael", "Sven",
"Gary", "Carol","Lisa", "Elisabeth" ],
"Salary":["723.3", "515.2", "621", "731",
"844.15","558", "642.8", "732.5" ],
"StartDate":[ "1/1/2011", "7/23/2013", "12/15/2011",
"6/11/2013", "3/27/2011","5/21/2012",
"7/30/2013", "6/17/2014"],
"Department":[ "IT", "Manegement", "IT", "HR",
"Finance", "IT", "Manegement", "IT"],
"Sex":[ "M", "M", "M",
"M", "M", "F", "F", "F"]}
print(data)
Code language: Python (python)
Parsing JSON in Python
In the next Python read a JSON file example, we will read the JSON file we created above. Reading a JSON file in Python is pretty easy. We open the file using open()
. After this, we read the JSON file using the load()
method.
# Python Parsing JSON
with open('data.json') as json_file:
data = json.load(json_file)
print(data)
Code language: Python (python)
Now, using Python to read a json file was pretty easy, right? Now, before learning how to save a JSON file in Python, it is worth mentioning that we can also parse Excel files. For example, see the post about reading xlsx files in Python for more information-
Saving to a JSON file
In this section, we will learn how to write a JSON file in Python. In Python, this can be done using the module json . This module enables us both to read and write content to and from a JSON file. This module converts the JSON format to Python’s internal format for Data Structures. So we can work with JSON structures just as we do in the usual way with Python’s own data structures.
Python Write JSON File Example:
In the example code below, we will learn how to make a JSON file in Python. First, we start by importing the json
module. After we’ve done that, we open up a new file and use the dump
method to write a json file using Python.
import json
# Parse JSON
with open('data.json', 'w') as outfile:
json.dump(data, outfile)
Code language: Python (python)
It is, of course, possible to save JSON to other formats, such as xlsx, and CSV, and we will learn how to export a Pandas dataframe to CSV, later in this blog post.
How to Use Pandas to Load a JSON File
Now, if we are going to work with the data we might want to use Pandas to load the JSON file into a Pandas dataframe. This will enable us to manipulate data, do summary statistics, and data visualization using Pandas built-in methods. Note, we will cover this briefly later in this post also.
Pandas Read Json Example:
In the next example, we will use Pandas read_json
method to read the JSON file we wrote earlier (i.e., data.json
). Simply, we start by importing pandas as pd:
import pandas as pd
# Read JSON as a dataframe with Pandas:
df = pd.read_json('data.json')
df
Code language: Python (python)
The output, when working with Jupyter Notebooks, will look like this:
It’s also possible to convert a dictionary to a Pandas dataframe. Make sure to check that post out for more information.
Data Manipulation using Pandas
Now that we have loaded the JSON file into a Pandas dataframe we are going use Pandas inplace method to modify our dataframe. We start by setting the Sub_ID
column as index.
df.set_index('Sub_ID', inplace=True)
df
Code language: Python (python)
Pandas JSON to CSV Example
Now when we have loaded a JSON file into a dataframe we may want to save it in another format. For instance, we may want to save it as a CSV file and we can do that using Pandas to_csv method. It may be useful to store it in a CSV, if we prefer to browse through the data in a text editor or Excel.
We carry out the same data manipulation method in the Pandas JSON to CSV example below.
# Saving .json to .csv in Python
df.to_csv("data.csv")
Code language: Python (python)
In a more recent post, you will learn how to convert JSON to Excel (.xlsx) in Python with Pandas, among other Python packages.
Pandas to JSON Example
In this section, we will learn how to save Pandas dataframe to JSON. If we, for instance, have our data stored in a CSV file locally, but want to enable the functionality of the JSON files, we will use Pandas to_json
method:
df = pd.read_csv("data.csv")
# Save dataframe to JSON format
df.to_json("data.json")
Code language: Python (python)
Learn more about working with CSV files using Pandas in the Pandas Read CSV Tutorial
How to Load JSON from an URL
We have now seen how easy it is to create a JSON file, write it to our hard drive using Python Pandas, and, finally, how to read it using Pandas. However, as previously mentioned, many times, the data stored in the JSON format are on the web.
Thus, in this section of the Python json guide, we will learn how to use Pandas read_json method to read a JSON file from a URL. Most often, it is fairly simple: we create a string variable pointing to the URL:
url = "https://api.exchangerate-api.com/v4/latest/USD"
# Parsing JSON from URL:
df = pd.read_json(url)
df.head()
Code language: Python (python)
Load JSON from a URL: Second Example
When loading some data, using Pandas read_json seems to create a dataframe with dictionaries within each cell. One way to deal with these dictionaries, nested within dictionaries, is to work with the Python module request. This module also has a method for parsing JSON files. After we have parsed the JSON file we will use the method json_normalize
to convert the JSON file to a dataframe.
import requests
from pandas.io.json import json_normalize
url = "https://think.cs.vt.edu/corgis/json/airlines/airlines.json"
resp = requests.get(url=url)
# Handling nested JSON:
df = json_normalize(resp.json())
df.head()
Code language: Python (python)
As seen in the image above, the column names are pretty long. This is quite impractical when we are going to create a time series plot later using Seaborn. We are now going to rename the columns so they become a bit easier to use.
In the code example below, we use Pandas rename
method together with the Python module re
. We are using a regular expression to remove “statistics.# of” and “statistics.” from the column names. Finally, we are also replacing dots (“.”) with underscores (“_”) using the str.replace
method:
import re
df.rename(columns=lambda x: re.sub("statistics.# of","",x),
inplace=True)
df.rename(columns=lambda x: re.sub("statistics.","",x),
inplace=True)
df.columns = df.columns.str.replace("[.]", "_")
df.head()
Code language: Python (python)
Renaming columns is something that is quicker, and easier, done using the Python package Pyjanitor (thanks Shantanu Oak, for pointing this out in the comment section). See how to use Pandas and Pyjanitor to rename columns, among other things, using Pyjanitor:
Time Series Plot from JSON Data using Seaborn
In the last example in this post, we will use Seaborn to create a time series plot. The data we loaded from JSON to a dataframe contains data about delayed and canceled flights. We are going to use Seaborns lineplot
method to create a time series plot of the number of canceled flights from 2003 to 2016, grouped by carrier code.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure(figsize=(10, 7))
g = sns.lineplot(x="timeyear", y="flightscancelled", ci=False,
hue="carriercode", data=df)
g.set_ylabel("Flights Cancelled",fontsize=20)
g.set_xlabel("Year",fontsize=20)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Code language: Python (python)
Note, that we changed the font size as well as the x- and y-axis’ labels using the methods set_ylabel
and set_xlabel
. Furthermore, we also moved the legend using the legend
method from matplotlib.
For more about exploratory data analysis using Python:
- In the post 9 Data Visualization Techniques That You Should Learn in Python, you will learn 9 different data visualization methods and how to carry them out using Python and Seaborn.
- Exploratory Data Analysis using Python, Pandas, and Seaborn
Conclusion
In this post, we have learned how to write a JSON file from a Python dictionary, how to load that JSON file using Python and Pandas. Furthermore, we have also learned how to use Pandas to load a JSON file from a URL to a dataframe, how to read a nested JSON file to a dataframe.
Here is a link to a Jupyter Notebook containing all code examples in this post.
Tutorials
- Python Check if File is Empty: Data Integrity with OS Module
- Coefficient of Variation in Python with Pandas & NumPy
- Your Guide to Reading Excel (xlsx) Files in Python
- Rename Files in Python: A Guide with Examples using os.rename()
Very informative article. I will like to mention that renaming columns is made a lot easier using “clean_names” method provided by pyjanitor.
https://github.com/ericmjl/pyjanitor
Hey Shantanu,
Thanks for your comment and your suggestion. I will look into the clean_names method provided by pyjanitor.
Best,
Erik
Thanks again Shantanu for letting me know about Pyjanitor. I ended up writing a blog post using it,
Best,
Erik
Thank you Erik, you have broken it down really well. Just started learning programming and this has been the best article explaining the nested dictionaries to a dataframe.
Hi Natasha!
Thank you for your kind comments. I am really glad the article helped you in your learning progress. Have a nice day,
Erik