Mean Imputation of Columns in pandas DataFrame in Python (Example Code)

On this page, I’ll show how to replace NaN values in a pandas DataFrame with the mean of their respective column in Python programming.

Setting up the Example

import pandas as pd                                        # Import pandas library

my_df = pd.DataFrame({'A':[5, 7, 1, 2, float('NaN'), 7],  # Construct example DataFrame
                      'B':[1, 1, 1, float('NaN'), float('NaN'), 1]})
print(my_df)                                              # Display example DataFrame in console
#      A    B
# 0  5.0  1.0
# 1  7.0  1.0
# 2  1.0  1.0
# 3  2.0  NaN
# 4  NaN  NaN
# 5  7.0  1.0
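Before imputing, it can help to check how many values are missing and which means will be filled in. This is a minimal sketch using the same example data. By default, `DataFrame.mean()` skips NaN values (`skipna=True`), so each column mean is computed from the observed values only. For column A, that is (5 + 7 + 1 + 2 + 7) / 5 = 4.4.

```python
import pandas as pd

my_df = pd.DataFrame({'A': [5, 7, 1, 2, float('NaN'), 7],
                      'B': [1, 1, 1, float('NaN'), float('NaN'), 1]})

# Number of missing values in each column
na_counts = my_df.isna().sum()
print(na_counts)

# Column means from non-missing values only (skipna=True is the default)
col_means = my_df.mean()
print(col_means)
```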

Example: Replacing NaN Values by the Column Mean

my_df = my_df.fillna(my_df.mean())                        # Mean substitution
print(my_df)                                              # Display updated data in console
#      A    B
# 0  5.0  1.0
# 1  7.0  1.0
# 2  1.0  1.0
# 3  2.0  1.0
# 4  4.4  1.0
# 5  7.0  1.0

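The example above works because every column is numeric. This is a minimal sketch for a DataFrame that also contains a text column. The extra column `C` is hypothetical and not part of the original data. `mean(numeric_only=True)` limits the means to numeric columns; without it, recent pandas versions may raise a TypeError. `fillna()` with a Series only fills the columns whose labels appear in that Series. The final lines show how to impute a single column on its own.

```python
import pandas as pd

# Hypothetical DataFrame with an extra text column 'C' (not in the original example)
df = pd.DataFrame({'A': [5, 7, 1, 2, float('NaN'), 7],
                   'B': [1, 1, 1, float('NaN'), float('NaN'), 1],
                   'C': ['x', 'y', None, 'x', 'y', 'x']})

# Means of the numeric columns only; the text column 'C' is skipped
means = df.mean(numeric_only=True)

# fillna() with a Series fills each column whose label is in the Series index;
# 'C' has no entry in 'means', so its missing value stays missing
df_imputed = df.fillna(means)
print(df_imputed)

# Imputing only column 'A' and leaving the other columns unchanged
df_a_only = df.copy()
df_a_only['A'] = df_a_only['A'].fillna(df_a_only['A'].mean())
print(df_a_only)
```

The single-column form is useful when mean imputation only makes sense for some variables, since NaN values in all other columns are left as they are.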