Descriptive Statistics for Columns of pandas DataFrame in Python (2 Examples)
In this Python article, you’ll learn how to calculate descriptive statistics for the columns of a pandas DataFrame, both for the whole data set and by group.
Setting up the Examples
import pandas as pd # Import pandas library
my_df = pd.DataFrame({'A': range(1, 9),                # Constructing pandas DataFrame
                      'B': range(10, 18),
                      'C': [2, 6, 3, 5, 7, 6, 4, 9],
                      'GRP': ['x', 'x', 'y', 'z', 'y', 'x', 'z', 'z']})
print(my_df)
#    A   B  C GRP
# 0  1  10  2   x
# 1  2  11  6   x
# 2  3  12  3   y
# 3  4  13  5   z
# 4  5  14  7   y
# 5  6  15  6   x
# 6  7  16  4   z
# 7  8  17  9   z
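By default, describe() only summarizes numeric columns, so it can help to check which columns of my_df qualify before running the examples. The following is a minimal sketch using DataFrame.select_dtypes; the variable name numeric_cols is only illustrative, and the DataFrame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example data so this snippet is self-contained
my_df = pd.DataFrame({'A': range(1, 9),
                      'B': range(10, 18),
                      'C': [2, 6, 3, 5, 7, 6, 4, 9],
                      'GRP': ['x', 'x', 'y', 'z', 'y', 'x', 'z', 'z']})

# Keep only numeric columns; these are the ones describe() summarizes by default
numeric_cols = my_df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # ['A', 'B', 'C']
```

The grouping column GRP holds strings, which is why it does not appear in the summary of Example 1.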
Example 1: Get Descriptive Statistics for Each Column of pandas DataFrame
print(my_df.describe())  # Summary statistics for all numeric columns (GRP is skipped)
#              A         B         C
# count  8.00000   8.00000  8.000000
# mean   4.50000  13.50000  5.250000
# std    2.44949   2.44949  2.251983
# min    1.00000  10.00000  2.000000
# 25%    2.75000  11.75000  3.750000
# 50%    4.50000  13.50000  5.500000
# 75%    6.25000  15.25000  6.250000
# max    8.00000  17.00000  9.000000
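describe() also takes a percentiles argument to choose which quantiles are reported and an include argument to add non-numeric columns such as GRP. The sketch below shows both; the 10% and 90% quantiles are arbitrary picks for illustration, and the DataFrame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

# Rebuild the example data so this snippet is self-contained
my_df = pd.DataFrame({'A': range(1, 9),
                      'B': range(10, 18),
                      'C': [2, 6, 3, 5, 7, 6, 4, 9],
                      'GRP': ['x', 'x', 'y', 'z', 'y', 'x', 'z', 'z']})

# Report the 10% and 90% quantiles instead of 25%/50%/75%;
# pandas still adds the median (50%) automatically
custom = my_df.describe(percentiles=[0.1, 0.9])
print(custom)

# include='all' also summarizes the string column GRP
# (count, unique, top, freq); numeric-only rows are NaN for GRP
everything = my_df.describe(include='all')
print(everything)
```

Note that in GRP the values x and z both occur three times, so the reported top value is one of two tied categories.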
Example 2: Get Descriptive Statistics for Each Column of pandas DataFrame by Group
print(my_df.groupby('GRP').describe())  # Get descriptive statistics by group
#          A                                      ...         C
#      count      mean       std  min  25%  50%  ...       std  min  25%  50%  75%  max
# GRP                                             ...
# x      3.0  3.000000  2.645751  1.0  1.5  2.0  ...  2.309401  2.0  4.0  6.0  6.0  6.0
# y      2.0  4.000000  1.414214  3.0  3.5  4.0  ...  2.828427  3.0  4.0  5.0  6.0  7.0
# z      3.0  6.333333  2.081666  4.0  5.5  7.0  ...  2.645751  4.0  4.5  5.0  7.0  9.0
#
# [3 rows x 24 columns]
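The grouped table has 24 columns (8 statistics for each of A, B, and C), so pandas truncates it with "...". One way to get a readable table is to select a single column before calling describe(). This sketch does so for column C; the variable name c_by_group is only illustrative.

```python
import pandas as pd

# Rebuild the example data so this snippet is self-contained
my_df = pd.DataFrame({'A': range(1, 9),
                      'B': range(10, 18),
                      'C': [2, 6, 3, 5, 7, 6, 4, 9],
                      'GRP': ['x', 'x', 'y', 'z', 'y', 'x', 'z', 'z']})

# Descriptive statistics of column C only: one row per group, 8 statistic columns
c_by_group = my_df.groupby('GRP')['C'].describe()
print(c_by_group)
```

The result has 3 rows and 8 columns, and its values match the C part of the wide table above (for example, a standard deviation of about 2.309401 in group x).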