How to Convert a Pandas DataFrame to a NumPy Array

In this short Python Pandas tutorial, we will learn how to convert a Pandas dataframe to a NumPy array. Specifically, we will learn how easy it is to transform a dataframe into an array using the values attribute and the to_numpy() method, respectively. Furthermore, we will also learn how to import data from an Excel file and change this data to an array.

If we want to carry out some high-level mathematical functions using the NumPy package, we may need to change the dataframe to a 2-d NumPy array.

Table of Contents

The structure of this post unfolds as a step-by-step guide to easily convert a Pandas dataframe to a NumPy array. The different sections involve distinct techniques and scenarios for achieving this conversion.

We begin by establishing the prerequisites to ensure you have the required knowledge. Following that, we will learn different methodologies to perform the conversion:

In the first section, we go through three steps using the values attribute, and then we explore the to_numpy() method as the recommended approach to achieve the transformation. Next, we address cases where specific column types need conversion, offering insights into managing columns with floating-point values. Managing different data types is followed by a section demonstrating the process of reading data from an Excel file and then converting it to a NumPy array. Finally, we discuss extracting summary statistics from the converted NumPy array, adding a practical dimension to the process.

Throughout the post, we go through clear explanations and examples, making converting dataframes to NumPy arrays comprehensible even for those new to the concept. Whether you are looking for a general approach or specific scenarios, these step-by-step sections will equip you with the skills to effortlessly convert dataframes to NumPy arrays.

Prerequisites

Now, if we want to convert a Pandas dataframe to a NumPy array, we need to have Python, Pandas, and NumPy installed. Check the post about how to install Python packages to learn more about the installation of packages. It is recommended, however, that we install Python packages in a virtual environment. Alternatively, if we download and install a scientific Python distribution, we will get everything we need. Nice and easy!

How do you convert a DataFrame to an array in Python?

To convert a Pandas DataFrame to a NumPy array, we can use the to_numpy() method (DataFrame.to_numpy()) or the older values attribute. For instance, if we want to convert our dataframe called df we can add this code: np_array = df.to_numpy().


How to Convert a Pandas Dataframe to a Numpy Array in 3 Steps:

In this section, we are going through three easy steps to convert a dataframe into a NumPy array. In the first step, we import Pandas and NumPy. Step 2 involves creating the dataframe from a dictionary. Of course, this step could instead involve importing the data from a file (e.g., CSV, Excel). In the final step, we will use the values attribute to get the dataframe as an array.

Step #1: Import the Python Libraries

In the first example of converting a dataframe to an array, we will create a dataframe from a Python dictionary. The first step, however, is to import the Python libraries we need:

import pandas as pd
import numpy as np

We followed the convention and imported Pandas as pd and NumPy as np. In the next step, we will get the data. This step, of course, is optional if you already have your data in a Pandas dataframe. If this is the case, you can skip to the third step and go ahead and convert the dataframe to a NumPy array.

Step #2: Get your Data into a Pandas Dataframe

In the second step, we will create the Python dictionary and convert it to a Pandas dataframe:

data = {'Rank':[1, 2, 3, 4, 5, 6],
       'Language': ['Python', 'Java',
                   'Javascript',
                   'C#', 'PHP',
                   'C/C++'],
       'Share':[29.88, 19.05, 8.17,
               7.3, 6.15, 5.92],
       'Trend':[4.1, -1.8, 0.1, -0.1, -1.0, -0.2]}

df = pd.DataFrame(data)

# display() is available in Jupyter notebooks; use print(df) in a plain script
display(df)

As you may understand, this step is optional, and you can, of course, import data from a .csv, SPSS, Excel, or Stata file, to name a few, instead. Furthermore, check the post about how to convert a dictionary to a Pandas dataframe for more information on creating dataframes from dictionaries. In the next step, we are ready to change the dataframe to an array.
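If the data lives in a file rather than a dictionary, the second step might look something like the sketch below. It assumes CSV data; the in-memory string is only a stand-in for a real file (the name languages.csv in the comment is hypothetical), and its rows are copied from the dictionary above.

```python
import io

import pandas as pd

# Stand-in for a real file; in practice we would pass a path
# such as 'languages.csv' (hypothetical name) to pd.read_csv
csv_text = """Rank,Language,Share,Trend
1,Python,29.88,4.1
2,Java,19.05,-1.8
3,Javascript,8.17,0.1
"""

# pd.read_csv accepts a file path or any file-like object
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Check which data type Pandas inferred for each column
print(df_from_csv.dtypes)
```

Once the data is in a dataframe, the conversion in the third step works the same way regardless of where the data came from.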

Step #3: Convert the Dataframe to an Array

Finally, in the third step, we are ready to use the values attribute. Here’s how to convert the Pandas dataframe to a NumPy array:

# convert dataframe to numpy array
df.values

That was easy. Using the values attribute, we converted the Pandas dataframe to a NumPy array in one line of code. In the next example, we are going to work with another approach. That is, we are going to use the recommended to_numpy() method.
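Because the dataframe mixes integers, strings, and floats, the resulting array has to fall back on one common data type. The sketch below rebuilds a smaller version of the dataframe (df_small is a name introduced here for illustration) and inspects the array we get back.

```python
import pandas as pd

# A smaller version of the tutorial dataframe, for illustration only
df_small = pd.DataFrame({'Rank': [1, 2, 3],
                         'Language': ['Python', 'Java', 'Javascript'],
                         'Share': [29.88, 19.05, 8.17]})

arr = df_small.values

print(type(arr))   # a numpy.ndarray
print(arr.shape)   # (number of rows, number of columns)
print(arr.dtype)   # object, because the columns have mixed types
```

An object array is fine for storage, but most numerical NumPy functions work best on arrays with a numeric dtype, which is one reason to select or cast columns before converting.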

How to Change a Dataframe to a Numpy Array Example 2:

In the second example, we will convert a Pandas dataframe to a NumPy array using the to_numpy() method. The to_numpy() method is as simple to use as the values attribute. However, to_numpy() can also take parameters, such as dtype.


Convert Pandas to a NumPy Array with to_numpy()

Now, here is a simple conversion example, generating the same NumPy array as in the previous example:

# Pandas dataframe to numpy array:
df.to_numpy()
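Unlike values, to_numpy() accepts parameters. The sketch below tries the dtype, na_value, and copy parameters on a small numeric dataframe; df_num and its missing value are made up for illustration, and na_value assumes a reasonably recent Pandas version (1.1 or later).

```python
import numpy as np
import pandas as pd

# Small numeric dataframe with one missing value (illustrative data)
df_num = pd.DataFrame({'Share': [29.88, None, 8.17],
                       'Trend': [4.1, -1.8, 0.1]})

# dtype: ask for a specific NumPy data type
as_float32 = df_num.to_numpy(dtype=np.float32)

# na_value: replace missing values during the conversion
filled = df_num.to_numpy(na_value=0.0)

# copy=True: make sure the array does not share memory with the dataframe
independent = df_num.to_numpy(copy=True)
independent[0, 1] = 99.0   # changes the array only, not df_num

print(as_float32.dtype)
print(filled)
print(df_num)
```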

Convert a Pandas Column with Floats to a NumPy Array

If we want to convert just one column, we can select it by name and call to_numpy() on it. The dtype parameter sets the data type of the resulting array. For instance, here we will convert one column of the dataframe (i.e., Share) to a NumPy array of the NumPy float64 data type:

# pandas to numpy only floating-point numbers:
df['Share'].to_numpy(np.float64)

Note that the dtype parameter does not select columns; it sets the data type of the resulting array. If the column contains integers, we can use np.int64, and for strings we can use object. A final note, before going to the third example, is that it is recommended to convert a Pandas dataframe to an array using the to_numpy() method. In the next example, we are going to select only the float columns and then convert them to a NumPy array.
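As a minimal sketch of the integer and string cases, the example below converts one column of each kind. The dataframe df_small is a shortened copy of the tutorial data, introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Shortened copy of the tutorial data (illustration only)
df_small = pd.DataFrame({'Rank': [1, 2, 3],
                         'Language': ['Python', 'Java', 'Javascript']})

# Integer column converted to a 64-bit integer array
rank_array = df_small['Rank'].to_numpy(dtype=np.int64)

# String column converted to an array of Python objects
language_array = df_small['Language'].to_numpy(dtype=object)

print(rank_array.dtype)
print(language_array.dtype)
```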

Convert only Pandas Float Columns in a Dataframe to a NumPy Array Example 3:

Now, if we only want the float values from the dataframe to be converted to a NumPy array, this is possible. Here, we need to use the select_dtypes method.

# Pandas dataframe to NumPy array selecting specific data types:
df.select_dtypes(include=float).to_numpy()

Note that when selecting the columns with float values, we passed float to the include parameter. If we, on the other hand, want to select the columns with integers, we could use int. Using this argument comes in handy when we want to, e.g., calculate descriptive statistics or extract only certain data types from the dataframe.
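The select_dtypes method can do a little more than pick out floats. The sketch below, again on a shortened copy of the data (df_small is a name used only here), selects all numeric columns at once with 'number' and uses the exclude parameter to leave out the float columns.

```python
import pandas as pd

# Shortened copy of the tutorial data (illustration only)
df_small = pd.DataFrame({'Rank': [1, 2, 3],
                         'Language': ['Python', 'Java', 'Javascript'],
                         'Share': [29.88, 19.05, 8.17]})

# 'number' selects every numeric column, integers and floats alike
numeric_array = df_small.select_dtypes(include='number').to_numpy()

# exclude works the other way around: leave out columns of a given type
no_float_cols = df_small.select_dtypes(exclude=float).columns.tolist()

print(numeric_array)
print(no_float_cols)
```

Note that mixing an integer column and a float column yields a float array, since NumPy needs one data type that can hold both.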

Read an Excel File to a Dataframe and Convert it to a NumPy Array Example 4:

Now, of course, we often have the data stored in a file. For instance, we may want to read the data from an Excel file using Pandas and then transform it into a NumPy 2-d array. Here is a quick example using Pandas to read an Excel file:

# Reading the excel file
df = pd.read_excel('http://open.nasa.gov/datasets/NASA_Labs_Facilities.xlsx',
                  skiprows=1)

# Exploring the first 5 rows and columns:
df.iloc[0:5, 0:5]

Now, in the code above, we read an Excel (.xlsx) file from a URL. Here, the skiprows parameter was used to skip the first empty row. Moreover, we used Pandas iloc to slice the first five rows and columns from this dataframe and print them. Here’s the result:


In the last example, we will, again, use df.to_numpy() to convert the dataframe to a NumPy array:

# Converting the dataframe to an array:
np_array = df.to_numpy()

Converting a Pandas dataframe to a NumPy array: Summary Statistics

In this last section, we will convert a dataframe to a NumPy array and use some of the methods of the array object. Again, we start by creating a dictionary. Second, we use the DataFrame class to create a dataframe from the dictionary. Finally, we convert the dataframe to a NumPy array, only selecting float numbers.

# Creating a dict
data = {'Rank':[1, 2, 3, 4, 5, 6],
       'Language': ['Python', 'Java',
                   'Javascript',
                   'C#', 'PHP',
                   'C/C++'],
       'Share':[29.88, 19.05, 8.17,
               7.3, 6.15, 5.92],
       'Trend':[4.1, -1.8, 0.1, -0.1, -1.0, -0.2]}

# Creating a dataframe from dict
df = pd.DataFrame(data)

# Pandas to NumPy
np_array = df.select_dtypes(include=float).to_numpy()

Now that we have our NumPy array, we can start using some methods for calculating summary statistics. First, we will sum each of the two columns using the sum() method. Here is an example code snippet:

# Summarizing the array
np_array.sum(axis=0)

Second, we can calculate the mean of each of the two columns using the mean() method:

# Calculating the mean of the array:
np_array.mean(axis=0)

We used the parameter axis and set it to 0, which calculates the mean down the rows, giving one value per column. If we leave out the axis parameter, NumPy calculates a single mean over all values in the array. To calculate the mean along each row instead, we set axis to 1. This might be useful if we wanted to calculate the mean of scores across each observation in the dataset. For example, if we have data from a questionnaire to measure different constructs, we may want to create a summary score for the complete scale (as well as for the constructs). In this case, we would use axis=1.
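The effect of the axis argument is easiest to see on a tiny array. The sketch below uses made-up questionnaire scores (two respondents, three items); the array and variable names are hypothetical.

```python
import numpy as np

# Made-up scores: 2 respondents (rows) x 3 questionnaire items (columns)
scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

col_means = scores.mean(axis=0)   # one mean per column (item)
row_means = scores.mean(axis=1)   # one mean per row (respondent)
overall = scores.mean()           # a single mean over all values

print(col_means)
print(row_means)
print(overall)
```

Here, row_means would be the per-respondent summary score, while col_means summarizes each item across respondents.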

DataFrame to Array YouTube Tutorial

Here is also a YouTube Video explaining how to convert a Pandas dataframe to a NumPy array:

Conclusion

In this Pandas dataframe tutorial, we have learned how to convert Pandas dataframes to NumPy arrays. It was an easy task, and we learned how to do this using the values attribute and the to_numpy() method. As a final note, and as previously mentioned, you should use the latter for converting the dataframe.

