Get Variance by Group in Python – pandas DataFrame Subgroups (2 Examples)
This page shows how to calculate the variance by group for the columns of a pandas DataFrame in Python.
Preparing the Examples
import pandas as pd # Import pandas library to Python
my_df = pd.DataFrame({'A':range(16, 28),                     # Constructing a pandas DataFrame
                      'B':[6, 7, 3, 9, 2, 4, 10, 1, 3, 8, 8, 9],
                      'C':range(20, 8, -1),
                      'GRP_a':['gr1', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                      'GRP_b':['a', 'b', 'c', 'a', 'b', 'c', 'c', 'a', 'b', 'b', 'a', 'a']})
print(my_df)
#      A   B   C  GRP_a  GRP_b
# 0   16   6  20    gr1      a
# 1   17   7  19    gr1      b
# 2   18   3  18    gr2      c
# 3   19   9  17    gr3      a
# 4   20   2  16    gr1      b
# 5   21   4  15    gr2      c
# 6   22  10  14    gr3      c
# 7   23   1  13    gr1      a
# 8   24   3  12    gr2      b
# 9   25   8  11    gr2      b
# 10  26   8  10    gr3      a
# 11  27   9   9    gr3      a
Example 1: Calculating Variance by Group in Python
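# The numeric columns are selected explicitly, because recent pandas versions
# no longer drop the text column GRP_b automatically when computing the variance
print(my_df.groupby('GRP_a')[['A', 'B', 'C']].var())         # Computing the column variance by group
#                A         B          C
# GRP_a
# gr1    10.000000  8.666667  10.000000
# gr2    10.000000  5.666667  10.000000
# gr3    13.666667  0.666667  13.666667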
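The values above are sample variances, since var() uses ddof=1 by default and therefore divides by n - 1. The following sketch is an illustration rather than part of the original examples: it rebuilds only column B and GRP_a from the example data, compares the default with the population variance (ddof=0), and recomputes the value for group gr1 by hand. The variable names sample_var, population_var, gr1 and manual_gr1 are chosen here for illustration only.

```python
import pandas as pd

# Same values as in the example data, reduced to the two columns needed here
my_df = pd.DataFrame({'B': [6, 7, 3, 9, 2, 4, 10, 1, 3, 8, 8, 9],
                      'GRP_a': ['gr1', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2',
                                'gr3', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3']})

# Sample variance by group (default ddof=1, i.e. division by n - 1)
sample_var = my_df.groupby('GRP_a')['B'].var()

# Population variance by group (ddof=0, i.e. division by n)
population_var = my_df.groupby('GRP_a')['B'].var(ddof=0)

# Manual check of the sample variance for group gr1
gr1 = my_df.loc[my_df['GRP_a'] == 'gr1', 'B']
manual_gr1 = ((gr1 - gr1.mean()) ** 2).sum() / (len(gr1) - 1)

print(sample_var)
print(population_var)
print(manual_gr1)
```

For column B, the population variances come out as 6.5 (gr1), 4.25 (gr2) and 0.5 (gr3), which is slightly smaller than the sample variances shown in Example 1, and the manual calculation for gr1 matches the 8.666667 reported there.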
Example 2: Calculating Variance by Group & Subgroup in Python
print(my_df.groupby(['GRP_a', 'GRP_b']).var())               # Computing the column variance by multiple groups
#                 A          B     C
# GRP_a GRP_b
# gr1   a      24.5  12.500000  24.5
#       b       4.5  12.500000   4.5
# gr2   b       0.5  12.500000   0.5
#       c       4.5   0.500000   4.5
# gr3   a      19.0   0.333333  19.0
#       c       NaN        NaN   NaN
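The NaN values for the subgroup gr3 / c appear because this combination contains only a single row, and the sample variance is not defined for one observation. One possible way to spot such cases is to report the group size next to the variance. The sketch below is an illustration using column B only; the names grouped, summary and single_row_groups are assumptions made for this example.

```python
import pandas as pd

# Same values as in the example data, reduced to the columns needed here
my_df = pd.DataFrame({'B': [6, 7, 3, 9, 2, 4, 10, 1, 3, 8, 8, 9],
                      'GRP_a': ['gr1', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2',
                                'gr3', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                      'GRP_b': ['a', 'b', 'c', 'a', 'b', 'c',
                                'c', 'a', 'b', 'b', 'a', 'a']})

# Group column B by both grouping variables
grouped = my_df.groupby(['GRP_a', 'GRP_b'])['B']

# Number of rows and sample variance for every group / subgroup combination
summary = grouped.agg(['count', 'var'])
print(summary)

# Combinations with fewer than two rows, for which the variance is NaN
single_row_groups = summary[summary['count'] < 2]
print(single_row_groups)
```

With the example data, only the combination gr3 / c ends up in single_row_groups, which matches the NaN row in the output of Example 2.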