Convert PySpark DataFrame Column from String to Double Type (5 Examples)
This tutorial demonstrates how to convert a PySpark DataFrame column from string to double type in the Python programming language.
The article contains the following topics:
- Introduction
- Creating Example Data
- Example 1: Using double Keyword
- Example 2: Using DoubleType() Method
- Example 3: Using select() Function
- Example 4: Using selectExpr() Method
- Example 5: Using SQL
- Video, Further Resources & Summary
Let’s get started…
Introduction
PySpark is the Python API for Apache Spark, an open-source framework that is used to store and process large amounts of data with the Python programming language.
We can create a Spark session by using SparkSession.builder, specify the app name by using the appName() method, and create (or reuse) the session by using the getOrCreate() method.
SparkSession.builder.appName(app_name).getOrCreate()
After creating the data with a list of dictionaries, we have to pass the data to the createDataFrame() method. This will generate our PySpark DataFrame.
spark.createDataFrame(data)
Next, we can display the DataFrame by using the show() method:
dataframe.show()
Creating Example Data
In this case, we are going to create a DataFrame from a list of dictionaries with eight rows and three columns, containing details about fruits and cities. We display the DataFrame by using the show() method:
# import the pyspark module
import pyspark
# import SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# create a SparkSession and give the app a name
spark = SparkSession.builder.appName('statistics_globe').getOrCreate()

# create a list of dictionaries with 3 keys and 8 values each
data = [{'fruit': 'apple', 'cost': '67.89', 'city': 'patna'},
        {'fruit': 'mango', 'cost': '87.67', 'city': 'delhi'},
        {'fruit': 'apple', 'cost': '64.76', 'city': 'harayana'},
        {'fruit': 'banana', 'cost': '87.00', 'city': 'hyderabad'},
        {'fruit': 'guava', 'cost': '69.56', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '234.67', 'city': 'patna'},
        {'fruit': 'apple', 'cost': '143.00', 'city': 'delhi'},
        {'fruit': 'mango', 'cost': '49.0', 'city': 'banglore'}]

# create a DataFrame from the given list of dictionaries
dataframe = spark.createDataFrame(data)

# display the final DataFrame
dataframe.show()
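Running the previous code should print a table similar to the one below (the exact column order may depend on your Spark version, since the schema is inferred from the dictionary keys):

+---------+------+------+
|     city|  cost| fruit|
+---------+------+------+
|    patna| 67.89| apple|
|    delhi| 87.67| mango|
| harayana| 64.76| apple|
|hyderabad| 87.00|banana|
|    delhi| 69.56| guava|
|    patna|234.67| mango|
|    delhi|143.00| apple|
| banglore|  49.0| mango|
+---------+------+------+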
The table above shows our example DataFrame. As you can see, it contains three columns called city, cost, and fruit, all with the string data type.
Let’s convert the string type of the cost column to a double data type.
Example 1: Using double Keyword
This example uses the double keyword with the cast() function to convert the string type into a double type.
We can display the schema of the DataFrame by using the printSchema() method.
dataframe.withColumn("column_name",dataframe.cost.cast('double')).printSchema() |
dataframe.withColumn("column_name",dataframe.cost.cast('double')).printSchema()
In this example, we are converting the cost column in our DataFrame from string type to double type:
# convert the cost column data type into double using the double keyword
dataframe.withColumn("cost", dataframe.cost.cast('double')).printSchema()
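Note that withColumn() returns a new DataFrame and leaves the original dataframe unchanged. Below is a minimal sketch that keeps the converted result in a new variable (the name dataframe_new is our own choice) and inspects both the schema and the values:

# keep the converted DataFrame instead of only printing its schema
dataframe_new = dataframe.withColumn("cost", dataframe.cost.cast('double'))
# the cost column should now be of double type
dataframe_new.printSchema()
# the values stay the same, but are now numeric
dataframe_new.show()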
Example 2: Using DoubleType() Method
This example uses the DoubleType() method imported from pyspark.sql.types together with the cast() function to convert the string type into a double type.
We display the schema of the DataFrame by using the printSchema() method:
dataframe.withColumn("column_name",dataframe.column_name.cast(DoubleType())).printSchema() |
dataframe.withColumn("column_name",dataframe.column_name.cast(DoubleType())).printSchema()
In this example, we are changing the cost column in our DataFrame from string type to double type.
# import DoubleType
from pyspark.sql.types import DoubleType

# convert the cost column from string to double
dataframe.withColumn("cost", dataframe.cost.cast(DoubleType())).printSchema()
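Keep in mind that cast() does not raise an error for strings that cannot be parsed as numbers; such values simply become null after the conversion. Below is a small sketch illustrating this behavior (the 'not a number' value is just a made-up example):

# hypothetical example: a non-numeric string turns into null after the cast
bad_data = spark.createDataFrame([{'cost': 'not a number'}])
bad_data.withColumn("cost", bad_data.cost.cast(DoubleType())).show()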
Example 3: Using select() Function
This example uses the select() function together with the col() method imported from pyspark.sql.functions and the cast() function to convert the string type into a double type.
We can show the schema of the DataFrame by using the printSchema() method:
dataframe.select(col("column_name").cast('double').alias("column_name")).printSchema() |
dataframe.select(col("column_name").cast('double').alias("column_name")).printSchema()
In this example, we are converting the cost column in our DataFrame from string type to double.
# import col
from pyspark.sql.functions import col

# use the select function to convert the cost column data type to double
dataframe.select(col("cost").cast('double').alias("cost")).printSchema()
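Note that select() only returns the columns that are listed, so fruit and city would be dropped in the previous example. Below is a sketch that keeps the other columns alongside the converted cost column:

# keep the fruit and city columns and convert only the cost column
dataframe.select(col("fruit"), col("city"), col("cost").cast('double').alias("cost")).printSchema()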
Example 4: Using selectExpr() Method
This example uses the selectExpr() method to convert from the string type to the double type.
dataframe.selectExpr("column_name","cast(column_name as double) column_name") |
dataframe.selectExpr("column_name","cast(column_name as double) column_name")
Once again, we are converting the cost column in our DataFrame from string type to double.
# use a select expression to convert the cost column from string to double
dataframe.selectExpr("city", "cast(cost as double) cost")
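selectExpr() also returns a new DataFrame, so we can chain printSchema() or show() to verify the result, as in the following sketch:

# print the schema of the result to verify the new data type of the cost column
dataframe.selectExpr("city", "cast(cost as double) cost").printSchema()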
Example 5: Using SQL
This example uses a SQL query to convert a string to a double data type:
spark.sql("SELECT DOUBLE(column_name) as column_name from view_name") |
spark.sql("SELECT DOUBLE(column_name) as column_name from view_name")
In this example, we are converting the cost column in our DataFrame from string type to double type.
# create a temporary view
dataframe.createOrReplaceTempView("data")

# use a SQL query to convert the cost column from string to double
spark.sql("SELECT DOUBLE(cost) as cost from data")
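spark.sql() returns a DataFrame as well, so the result of the query can be stored and displayed like any other DataFrame. Below is a minimal sketch (the variable name result is our own choice):

# store the query result and inspect its schema and values
result = spark.sql("SELECT DOUBLE(cost) as cost from data")
result.printSchema()
result.show()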
Video, Further Resources & Summary
Are you searching for more explanations on how to convert data types in Python? Then you may have a look at the following YouTube video from the LearnLinuxTV YouTube channel:

Furthermore, you may have a look at some other articles on this website:
- Add New Column to PySpark DataFrame in Python
- Change Column Names of PySpark DataFrame in Python
- Concatenate Two & Multiple PySpark DataFrames
- Convert PySpark DataFrame Column from String to Int Type
- Display PySpark DataFrame in Table Format
- Export PySpark DataFrame as CSV
- Filter PySpark DataFrame Column with None Value in Python
- groupBy & Sort PySpark DataFrame in Descending Order
- Import PySpark in Python Shell
- Python Programming Tutorials
This post has illustrated how to convert a string column to the double type in a PySpark DataFrame in the Python programming language. In case you have any additional questions, you may leave a comment below.
This article was written in collaboration with Gottumukkala Sravan Kumar. You may find more information about Gottumukkala Sravan Kumar and his other articles on his profile page.