How to Import PySpark in Python Shell (3 Examples)
This tutorial demonstrates how to import PySpark into the Python Shell in the Python programming language.
This article contains the following sections:
- Example 1: Importing PySpark
- Example 2: Details of PySpark
- Example 3: Creation of Data
- Video, Further Resources & Summary
Let’s dig in.
Example 1: Importing PySpark
Before PySpark can be imported, it has to be installed. Run the following pip command in your command prompt or terminal (not inside the Python interpreter itself):
pip install pyspark
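Once pip has finished, you can confirm from inside the Python shell that PySpark is importable. The following is a minimal sketch that uses only the standard library (it assumes Python 3.8 or newer for importlib.metadata). The helper name check_package is hypothetical and is not part of PySpark or pip:

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def check_package(name: str = "pyspark") -> str:
    """Return a one-line report on whether `name` can be imported in this interpreter."""
    # find_spec() returns None when no module of that name is on sys.path
    if importlib.util.find_spec(name) is None:
        return f"{name} is not installed (try: pip install {name})"
    try:
        # version() reads the metadata that pip wrote when it installed the package
        return f"{name} {version(name)} is importable"
    except PackageNotFoundError:
        # e.g. a module that is on sys.path but was not installed by pip
        return f"{name} is importable (no pip metadata found)"


# Hypothetical usage: run this in the same Python environment that pip installed into
print(check_package())
```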
Example 2: Details of PySpark
After installing, we can check the PySpark details by running pip with the show subcommand. It displays the package name, the installed version, a short summary, the install location, and the package's dependencies:
pip show pyspark
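Some of the same information can also be read from within Python. This is a minimal sketch, assuming Python 3.8 or newer and a package installed with pip; show_package is a hypothetical helper name, and it collects only a few of the fields that pip show prints:

```python
from importlib.metadata import PackageNotFoundError, metadata


def show_package(name: str) -> dict:
    """Collect a few fields that `pip show` also prints, or return {} if not installed."""
    try:
        meta = metadata(name)
    except PackageNotFoundError:
        return {}
    # meta behaves like an email.message.Message; .get() returns None for missing fields
    return {field: meta.get(field) for field in ("Name", "Version", "Summary", "License")}


# Print the details in a "Key: value" layout similar to pip show
for key, value in show_package("pyspark").items():
    print(f"{key}: {value}")
```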
Example 3: Creation of Data
To work with PySpark, we first create a SparkSession. The appName() method sets the application name, and getOrCreate() returns the currently active session or creates a new one if none exists:
spark = SparkSession.builder.appName(app_name).getOrCreate()
Once the data is stored as a list of dictionaries, we pass it to the session's createDataFrame() method, which returns a PySpark DataFrame:
dataframe = spark.createDataFrame(data)
Next, we can display the DataFrame by using the show() method:
dataframe.show()
In this example, we create a DataFrame with eight rows and three columns (fruit, cost, and city) from a list of dictionaries and display it with the show() method:
# import the pyspark module
import pyspark
# import SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession
# create a SparkSession and give it an app name
spark = SparkSession.builder.appName('statistics_globe').getOrCreate()
# create a list of 8 dictionaries, each holding 3 key-value pairs
data = [{'fruit': 'apple', 'cost': '67.89', 'city': 'patna'},
        {'fruit': 'mango', 'cost': '87.67', 'city': 'delhi'},
        {'fruit': 'apple', 'cost': '64.76', 'city': 'harayana'},
        {'fruit': 'banana', 'cost': '87.00', 'city': 'hyderabad'},
        {'fruit': 'guava', 'cost': '69.56', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '234.67', 'city': 'patna'},
        {'fruit': 'apple', 'cost': '143.00', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '49.0', 'city': 'banglore'}]
# create a DataFrame from the list of dictionaries
dataframe = spark.createDataFrame(data)
# display the final DataFrame
dataframe.show()
Video, Further Resources & Summary
If you need more information on how to import PySpark in the Python shell, you may have a look at the following video from Krish Naik’s YouTube channel.
If you are interested, you may have a look at some other tutorials on the Data Hacks website:
- Add New Column to PySpark DataFrame in Python
- Change Column Names of PySpark DataFrame in Python
- Concatenate Two & Multiple PySpark DataFrames
- Convert PySpark DataFrame Column from String to Double Type
- Convert PySpark DataFrame Column from String to Int Type
- Display PySpark DataFrame in Table Format
- Export PySpark DataFrame as CSV
- Filter PySpark DataFrame Column with None Value in Python
- groupBy & Sort PySpark DataFrame in Descending Order
- Python Programming Tutorials
Summary: This article has explained how to install PySpark, check its package details, and import it in the Python shell to create a DataFrame. In case you have any additional questions, you may leave a comment below.
This article was written in collaboration with Gottumukkala Sravan Kumar. You may find more information about Gottumukkala Sravan Kumar and his other articles on his profile page.