Display PySpark DataFrame in Table Format (5 Examples)
In this article, I’ll illustrate how to show a PySpark DataFrame in the table format in the Python programming language.
The tutorial consists of these topics:
- Introduction
- Creating Example Data
- Example 1: Using show() Method with No Parameters
- Example 2: Using show() Method with Vertical Parameter
- Example 3: Using show() Method with Truncate Parameter
- Example 4: Using show() Method with n Value
- Example 5: Using toPandas() Method
- Video, Further Resources & Summary
Let’s dig in:
Introduction
PySpark is the Python API for Apache Spark, an open-source engine for processing large datasets. It lets us store and process data with Spark by using the Python programming language.
We can create a Spark session with SparkSession.builder, specify the app name with appName(), and obtain the session by calling the getOrCreate() method.
SparkSession.builder.appName(app_name).getOrCreate()
Once the data has been created as a list of dictionaries, we pass it to the createDataFrame() method of the Spark session. This will generate our PySpark DataFrame.
spark.createDataFrame(data)
After that, we can display the DataFrame by using the show() method:
dataframe.show()
Creating Example Data
In our case, we are going to create a DataFrame from a list of eight dictionaries. This gives us eight rows and three columns, which contain details about fruits, costs, and cities.
To display the DataFrame we can use the show() method:
# import the pyspark module
import pyspark

# import SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# create the Spark session and give it an app name
spark = SparkSession.builder.appName('statistics_globe').getOrCreate()

# create a list of 8 dictionaries with 3 key-value pairs each
data = [{'fruit': 'apple', 'cost': '67.89', 'city': 'patna'},
        {'fruit': 'mango', 'cost': '87.67', 'city': 'delhi'},
        {'fruit': 'apple', 'cost': '64.76', 'city': 'harayana'},
        {'fruit': 'banana', 'cost': '87.00', 'city': 'hyderabad'},
        {'fruit': 'guava', 'cost': '69.56', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '234.67', 'city': 'patna'},
        {'fruit': 'apple', 'cost': '143.00', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '49.0', 'city': 'banglore'}]

# create a DataFrame from the list of dictionaries
dataframe = spark.createDataFrame(data)

# display the final DataFrame
dataframe.show()
Running this code prints our example DataFrame as a table. As you can see, it contains three columns that are called fruit, cost, and city.
Now let’s display the PySpark DataFrame in a tabular format.
Example 1: Using show() Method with No Parameters
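This example uses the show() method without parameters to display the PySpark DataFrame in a tabular format. By default, show() prints the first 20 rows and shortens strings that are longer than 20 characters, so our eight-row DataFrame is displayed completely.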
dataframe.show()
In this example, we are displaying the PySpark DataFrame in a table format.
# display the entire DataFrame in tabular format using the show() method
dataframe.show()
Example 2: Using show() Method with Vertical Parameter
This example applies the show() method with the vertical parameter set to True. In this vertical table format, each row is printed as a separate record with one column value per line.
dataframe.show(vertical=True)
In this example, we are displaying the PySpark DataFrame in vertical table format.
# display the entire DataFrame using the show() method in vertical format
dataframe.show(vertical=True)
Example 3: Using show() Method with Truncate Parameter
This example uses the show() method with the truncate parameter set to an integer n. The DataFrame is shown in table format, but every cell is shortened to at most n characters.
dataframe.show(truncate=n)
In this example, we are displaying the PySpark DataFrame in tabular format using the show() method and keeping only 2 characters in each cell by using truncate.
# display the entire DataFrame in tabular format using the show() method,
# keeping only 2 characters in each cell by using truncate
dataframe.show(truncate=2)
Example 4: Using show() Method with n Value
This example passes an integer n as the first argument of the show() method to display only the top n rows of the PySpark DataFrame in table format.
dataframe.show(n)
In this example, we are displaying the PySpark DataFrame in tabular format using the show() method and getting only the top 4 rows.
# display only the top 4 rows of the DataFrame using the show() method
dataframe.show(4)
Example 5: Using toPandas() Method
This example uses the toPandas() method to display the PySpark DataFrame in table format by converting it into a pandas DataFrame.
dataframe.toPandas()
In this example, we are displaying the PySpark DataFrame in tabular format using toPandas() method.
# display the entire DataFrame in tabular format using the toPandas() method
dataframe.toPandas()
Video, Further Resources & Summary
If you need more information on how to display a PySpark DataFrame in the table format, then you may have a look at the following video on Krish Naik’s YouTube channel.
In the video, he explains how to show a PySpark DataFrame in table format.
Furthermore, you may have a look at the other tutorials on the Data Hacks website:
- Add New Column to PySpark DataFrame in Python
- Change Column Names of PySpark DataFrame in Python
- Concatenate Two & Multiple PySpark DataFrames
- Convert PySpark DataFrame Column from String to Double Type
- Convert PySpark DataFrame Column from String to Int Type
- Export PySpark DataFrame as CSV
- Filter PySpark DataFrame Column with None Value in Python
- groupBy & Sort PySpark DataFrame in Descending Order
- Import PySpark in Python Shell
- Python Programming Tutorials
Summary: This post has shown you how to display a PySpark DataFrame in the table format in the Python programming language. You can leave a comment below if you have any additional questions.
This article was written in collaboration with Gottumukkala Sravan Kumar. You may find more information about Gottumukkala Sravan Kumar and his other articles on his profile page.