Export PySpark DataFrame as CSV (3 Examples)
This post explains how to export a PySpark DataFrame as a CSV in the Python programming language.
The tutorial consists of these contents:
- Introduction
- Creating Example Data
- Example 1: Using write.csv() Function
- Example 2: Using write.format() Function
- Example 3: Using write.option() Function
- Video, Further Resources & Summary
Let’s dive into it:
Introduction
PySpark is the Python API for Apache Spark, an open-source engine for storing and processing large amounts of data with the Python programming language.
We first construct a SparkSession object, the entry point for working with DataFrames, and specify the app name. The getOrCreate() method returns an existing session or creates a new one:
SparkSession.builder.appName(app_name).getOrCreate()
Once we have defined our data as a list of dictionaries, we pass it to the createDataFrame() method. This creates our PySpark DataFrame.
spark.createDataFrame(data)
To display our DataFrame, we can use the show() method:
dataframe.show()
Creating Example Data
In this case, we are going to create a DataFrame from a list of dictionaries with eight rows and three columns, containing details about fruits and cities. We can display the DataFrame by using the show() method:
# import the pyspark module
import pyspark

# import the SparkSession class from the pyspark.sql module
from pyspark.sql import SparkSession

# create a SparkSession and set the app name
spark = SparkSession.builder.appName('statistics_globe').getOrCreate()

# create a list of eight dictionaries with three key-value pairs each
data = [{'fruit': 'apple', 'cost': '67.89', 'city': 'patna'},
        {'fruit': 'mango', 'cost': '87.67', 'city': 'delhi'},
        {'fruit': 'apple', 'cost': '64.76', 'city': 'harayana'},
        {'fruit': 'banana', 'cost': '87.00', 'city': 'hyderabad'},
        {'fruit': 'guava', 'cost': '69.56', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '234.67', 'city': 'patna'},
        {'fruit': 'apple', 'cost': '143.00', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '49.0', 'city': 'banglore'}]

# create a DataFrame from the list of dictionaries
dataframe = spark.createDataFrame(data)

# display the final DataFrame
dataframe.show()
The output of the show() method displays our example DataFrame. As you can see, it contains three columns called fruit, cost, and city.
Now let’s export the data from our DataFrame into a CSV.
Example 1: Using write.csv() Function
This example uses the write.csv() method to export the data of our PySpark DataFrame.
dataframe.write.csv("file_name")
In the next step, we export the DataFrame created above to a CSV file:
# export the DataFrame to a folder named final_data
dataframe.write.csv("final_data")
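Note that Spark does not write a single CSV file here: final_data is created as a folder that contains one or more part files, depending on the number of partitions of the DataFrame. If you prefer a single file that also contains the column names, a sketch along the following lines should work (the target folder final_data_single is just a placeholder name):

# collapse the DataFrame into a single partition so only one part file is written,
# include the column names via the header option,
# and overwrite the target folder in case it already exists
dataframe.coalesce(1) \
    .write \
    .option("header", True) \
    .mode("overwrite") \
    .csv("final_data_single")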
Example 2: Using write.format() Function
This example uses the write.format() function in combination with the save() method to export the data in CSV format.
dataframe.write.format("csv").save("file_name")
Next, we export our DataFrame to CSV format using this approach:
dataframe.write.format("csv").save("final_data")
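The format()/save() combination writes the same folder of part files as write.csv() in Example 1, and additional settings can be chained in the same way. The following sketch, for instance, adds the column names and overwrites the final_data folder if it already exists:

# export in CSV format, include the column names,
# and overwrite the final_data folder if it already exists
dataframe.write \
    .format("csv") \
    .option("header", True) \
    .mode("overwrite") \
    .save("final_data")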
Example 3: Using write.option() Function
This example uses the option() method to include the header (i.e. the column names) when exporting the data.
dataframe.write.option("header", True).csv("file_name")
In this example, we export the PySpark DataFrame to a CSV file including the column names:
# export the DataFrame with the header to a folder named final_data
dataframe.write.option("header", True).csv("final_data")
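To double-check the export, the CSV folder can be read back into a new DataFrame. The following is a minimal sketch that assumes the final_data folder created above and uses the header row for the column names:

# read the exported CSV folder back into a DataFrame,
# using the first row of each file as the column names
dataframe_check = spark.read.option("header", True).csv("final_data")

# display the re-imported data
dataframe_check.show()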
Video, Further Resources & Summary
If you need more explanations on how to write files in PySpark, you may have a look at the following video on the Let’s Data YouTube channel!
You may also have a look at the following tutorials on the Data Hacks website:
- Add New Column to PySpark DataFrame in Python
- Change Column Names of PySpark DataFrame in Python
- Concatenate Two & Multiple PySpark DataFrames
- Convert PySpark DataFrame Column from String to Double Type
- Convert PySpark DataFrame Column from String to Int Type
- Display PySpark DataFrame in Table Format
- Filter PySpark DataFrame Column with None Value in Python
- groupBy & Sort PySpark DataFrame in Descending Order
- Import PySpark in Python Shell
- Python Programming Tutorials
Summary: This post has illustrated how to export a PySpark DataFrame as a CSV in the Python programming language. In case you have any additional questions, you may leave a comment below.
This article was written in collaboration with Gottumukkala Sravan Kumar. You may find more information about Gottumukkala Sravan Kumar and his other articles on his profile page.