Mean Imputation of Columns in pandas DataFrame in Python (Example Code)
On this page, I’ll show how to replace the NaN values in each column of a pandas DataFrame with that column’s mean in Python.
Setting up the Example
import pandas as pd # Import pandas library
my_df = pd.DataFrame({'A':[5, 7, 1, 2, float('NaN'), 7],    # Construct example DataFrame
                      'B':[1, 1, 1, float('NaN'), float('NaN'), 1]})
print(my_df)                                                 # Display example DataFrame in console
#      A    B
# 0  5.0  1.0
# 1  7.0  1.0
# 2  1.0  1.0
# 3  2.0  NaN
# 4  NaN  NaN
# 5  7.0  1.0
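Before imputing, it can be useful to check how many values are missing and what the column means are. The following is a minimal sketch that rebuilds the example data so it runs on its own; the names nan_counts and col_means are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the example DataFrame from above
my_df = pd.DataFrame({'A': [5, 7, 1, 2, float('NaN'), 7],
                      'B': [1, 1, 1, float('NaN'), float('NaN'), 1]})

nan_counts = my_df.isna().sum()   # Number of NaN values per column
col_means = my_df.mean()          # Column means; NaN values are skipped by default

print(nan_counts)                 # Display NaN counts in console
print(col_means)                  # Display column means in console
```

These column means are the values that the mean substitution in the next example fills in.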
Example: Replacing NaN Values by Mean of Variable
my_df = my_df.fillna(my_df.mean())    # Mean substitution
print(my_df)                          # Display updated data in console
#      A    B
# 0  5.0  1.0
# 1  7.0  1.0
# 2  1.0  1.0
# 3  2.0  1.0
# 4  4.4  1.0
# 5  7.0  1.0
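The same idea can be adapted when a DataFrame also contains non-numeric columns, or when only one column should be imputed. The sketch below assumes a hypothetical text column 'C' that is not part of the example above; the names df, df_imputed, and df_single are likewise chosen for illustration.

```python
import pandas as pd

# Hypothetical data: the example columns plus a text column 'C'
df = pd.DataFrame({'A': [5, 7, 1, 2, float('NaN'), 7],
                   'B': [1, 1, 1, float('NaN'), float('NaN'), 1],
                   'C': ['x', 'y', None, 'x', 'y', 'x']})

# Compute means for the numeric columns only
col_means = df.mean(numeric_only=True)

# Passing a Series to fillna fills each column with the value at its label;
# columns without a matching label (here 'C') are left unchanged
df_imputed = df.fillna(col_means)

# Alternative: impute a single column and leave all others untouched
df_single = df.copy()
df_single['A'] = df_single['A'].fillna(df_single['A'].mean())

print(df_imputed)                 # Display imputed data in console
print(df_single)                  # Display single-column result in console
```

Using numeric_only=True avoids trying to average the text column, and the missing value in 'C' stays missing because no mean is defined for it.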