Display PySpark DataFrame in Table Format (5 Examples)

In this article, I’ll illustrate how to display a PySpark DataFrame in table format using the Python programming language.

The tutorial consists of these topics:

  • Introduction
  • Creating Example Data
  • Example 1: Using show() Method with No Parameters
  • Example 2: Using show() Method with Vertical Parameter
  • Example 3: Using show() Method with Truncate Parameter
  • Example 4: Using show() Method with n Value
  • Example 5: Using toPandas() Method
  • Video, Further Resources & Summary

Let’s dig in:

Introduction

PySpark is the Python API for Apache Spark, an open-source engine for processing large datasets in a distributed fashion.

We can create a Spark session with SparkSession.builder, set the application name with the appName() method, and obtain the session with the getOrCreate() method:

SparkSession.builder.appName(app_name).getOrCreate()

Once the data has been prepared as a list of dictionaries, we pass it to the createDataFrame() method of the session. This returns our PySpark DataFrame.

spark.createDataFrame(data)

After that, we can display the DataFrame by using the show() method:

dataframe.show()

Creating Example Data

In our case, we are going to create a DataFrame with eight rows and three columns from a list of dictionaries. It contains details about fruits, their costs, and cities.

To display the DataFrame we can use the show() method:

# import the pyspark module
import pyspark
 
# import the sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# create the SparkSession and give the app a name
spark = SparkSession.builder.appName('statistics_globe').getOrCreate()
 
# create a list of 8 dictionaries, each with 3 key-value pairs
# (note that the cost values are stored as strings)
data = [{'fruit': 'apple', 'cost': '67.89', 'city': 'patna'},
        {'fruit': 'mango', 'cost': '87.67', 'city': 'delhi'},
        {'fruit': 'apple', 'cost': '64.76', 'city': 'harayana'},
        {'fruit': 'banana', 'cost': '87.00', 'city': 'hyderabad'},
        {'fruit': 'guava', 'cost': '69.56', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '234.67', 'city': 'patna'},
        {'fruit': 'apple', 'cost': '143.00', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '49.0', 'city': 'banglore'}]
 
 
# create a DataFrame from the list of dictionaries
dataframe = spark.createDataFrame(data)
 
# display the final dataframe
dataframe.show()

[Table 1: output of dataframe.show() for the example DataFrame]

The table above shows our example DataFrame. As you can see, it contains three columns called fruit, cost, and city.

Now let’s display the PySpark DataFrame in a tabular format.

Example 1: Using show() Method with No Parameters

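This example uses the show() method without any arguments to display the PySpark DataFrame in a tabular format. By default, show() prints only the first 20 rows and truncates cell values that are longer than 20 characters. Our example DataFrame has just eight rows with short values, so it is displayed in full.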

dataframe.show()

Applied to our example DataFrame, the call looks as follows:

 
#display entire dataframe in tabular format using show() method
dataframe.show()

[Table 1: output of dataframe.show()]

Example 2: Using show() Method with Vertical Parameter

This example applies the show() method with the vertical parameter set to True. In this mode, each row is printed as its own record, with one column per line.

dataframe.show(vertical=True)

Applied to our example DataFrame, the call looks as follows:

 
#display entire dataframe in tabular format using show() method
#in vertical
dataframe.show(vertical=True)

[Table 2: output of dataframe.show(vertical=True)]

Example 3: Using show() Method with Truncate Parameter

This example uses the show() method with the truncate parameter set to an integer. Cell values that are longer than this number of characters are cut to that length when the DataFrame is printed.

dataframe.show(truncate=n)

In this example, we display the PySpark DataFrame in tabular format and use truncate to keep only the first 2 characters of each cell.

#display the dataframe in tabular format using show() method
#by keeping only 2 characters in each cell
#using truncate
dataframe.show(truncate=2)

[Table 3: output of dataframe.show(truncate=2)]

Example 4: Using show() Method with n Value

This example passes an integer n as the first argument of the show() method, so that only the top n rows of the PySpark DataFrame are displayed in table format.

dataframe.show(n)

In this example, we display only the top 4 rows of the PySpark DataFrame in tabular format.

#display the dataframe in tabular format using show() method
#by getting only the top 4 rows
dataframe.show(4)

[Table 4: output of dataframe.show(4)]

Example 5: Using toPandas() Method

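This example uses the toPandas() method to convert the PySpark DataFrame into a pandas DataFrame, which is rendered as a table in notebook environments such as Jupyter. In a plain Python script, the result has to be passed to print() to be displayed. Note that toPandas() collects all rows on the driver, so it is best suited to small DataFrames.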

dataframe.toPandas()

Applied to our example DataFrame, the call looks as follows:

#display entire dataframe in tabular format using toPandas() method
dataframe.toPandas()

[Table 5: output of dataframe.toPandas()]

Video, Further Resources & Summary

If you need more information on how to display a PySpark DataFrame in table format, you may have a look at the following video on the YouTube channel of Krish Naik.

In his video, the author explains how to show a PySpark DataFrame in table format.


Summary: This post has shown you how to display a PySpark DataFrame in table format in the Python programming language. You can leave a comment below if you have any additional questions.

This article was written in collaboration with Gottumukkala Sravan Kumar. You may find more information about Gottumukkala Sravan Kumar and his other articles on his profile page.
