Convert PySpark DataFrame Column from String to Double Type (5 Examples)

This tutorial demonstrates how to convert a PySpark DataFrame column from string to double type in the Python programming language.

The article contains the following topics:

  • Introduction
  • Creating Example Data
  • Example 1: Using double Keyword
  • Example 2: Using DoubleType() Method
  • Example 3: Using select() Function
  • Example 4: Using selectExpr() Method
  • Example 5: Using SQL
  • Video, Further Resources & Summary

Let’s get started…

Introduction

PySpark is the open-source Python API for Apache Spark. It is used to process large datasets from the Python programming language.

We can create a SparkSession object, which is the entry point to PySpark, by specifying the app name with appName() and then calling the getOrCreate() method.

SparkSession.builder.appName(app_name).getOrCreate()

After creating the data with a list of dictionaries, we have to pass the data to the createDataFrame() method. This will generate our PySpark DataFrame.

spark.createDataFrame(data)

Next, we can display the DataFrame by using the show() method:

dataframe.show()

Creating Example Data

In this case, we are going to create a DataFrame from a list of dictionaries with eight rows and three columns, containing details about fruits, their costs, and cities. We are displaying the DataFrame by using the show() method:

# import the pyspark module
import pyspark
 
# import the sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# create the SparkSession and give it an app name
spark = SparkSession.builder.appName('statistics_globe').getOrCreate()
 
#create a list of 8 dictionaries,
#each with 3 key-value pairs
data = [{'fruit': 'apple', 'cost': '67.89', 'city': 'patna'},
        {'fruit': 'mango', 'cost': '87.67', 'city': 'delhi'},
        {'fruit': 'apple', 'cost': '64.76', 'city': 'harayana'},
        {'fruit': 'banana', 'cost': '87.00', 'city': 'hyderabad'},
        {'fruit': 'guava', 'cost': '69.56', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '234.67', 'city': 'patna'},
        {'fruit': 'apple', 'cost': '143.00', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '49.0', 'city': 'banglore'}]
 
 
# create a dataframe from the given list of dictionaries
dataframe = spark.createDataFrame(data)
 
# display the final dataframe
dataframe.show()

change String to Double Type Table 1

The table above shows our example DataFrame. As you can see, it contains three columns called city, cost, and fruit, all of which have the string data type.

Let’s convert the string type of the cost column to a double data type.

Example 1: Using double Keyword

This example passes the double keyword, as the string 'double', to the cast() method of a column in order to convert the string type into a double type.

We can display the column names and data types of the result by using the printSchema() method:

dataframe.withColumn("column_name",dataframe.column_name.cast('double')).printSchema()

In this example, we are converting the cost column in our DataFrame from string type to double type:

 
 
#convert the cost column data type into double using double keyword
dataframe.withColumn("cost",dataframe.cost.cast('double')).printSchema()

change String to Double Type Table 2

Example 2: Using DoubleType() Method

This example uses the DoubleType() class imported from pyspark.sql.types with the cast() method and converts the string type into a double type.

We are displaying the DataFrame columns by using the printSchema() method:

dataframe.withColumn("column_name",dataframe.column_name.cast(DoubleType())).printSchema()

In this example, we are changing the cost column in our DataFrame from string type to double type.

#import the DoubleType class
from pyspark.sql.types import DoubleType
 
#convert string to double for cost column
dataframe.withColumn("cost",dataframe.cost.cast(DoubleType())).printSchema()

change String to Double Type Table 2

Example 3: Using select() Function

This example uses the select() function together with the col() function imported from pyspark.sql.functions. The cast() method converts the string type into a double type, and alias() keeps the original column name.

We can show the DataFrame columns by using the printSchema() method:

dataframe.select(col("column_name").cast('double').alias("column_name")).printSchema()

In this example, we are converting the cost column in our DataFrame from string type to double.

#import col
from pyspark.sql.functions import col
 
# Use select function to convert cost column data type to double.
dataframe.select(col("cost").cast('double').alias("cost")).printSchema()

change String to Double Type Table 3

Example 4: Using selectExpr() Method

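This example uses the selectExpr() method, which accepts SQL expressions as strings, to switch from string to double type. Only the columns listed in the call are kept in the result.

dataframe.selectExpr("other_column","cast(column_name as double) column_name")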

Once again, we are converting the cost column in our DataFrame from string type to double.

# use select expression to convert string to double data type of cost column.
dataframe.selectExpr("city","cast(cost as double) cost")

change String to Double Type Table 4

Example 5: Using SQL

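This example uses a SQL query to convert a string column to the double data type. The DataFrame first has to be registered as a temporary view so that the query can refer to it by name. The query has the following form: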

spark.sql("SELECT DOUBLE(column_name) as column_name from view_name")

In this example, we are converting the cost column in our DataFrame from string type to double type.

 
#create view
dataframe.createOrReplaceTempView("data")
 
# use sql function to convert string to double data type of cost column
spark.sql("SELECT DOUBLE(cost) as cost from data")

change String to Double Type Table 5

Video, Further Resources & Summary

If you are looking for more explanations on how to convert data types in Python, you may have a look at the corresponding video on the LearnLinuxTV YouTube channel.

This post has illustrated how to convert a string column to double type in a PySpark DataFrame in the Python programming language. In case you have any additional questions, you may leave a comment below.

This article was written in collaboration with Gottumukkala Sravan Kumar. You may find more information about Gottumukkala Sravan Kumar and his other articles on his profile page.
