Calculate Standard Deviation by Group in Python (2 Examples)
In this tutorial, I’ll illustrate how to calculate the standard deviation by group in the Python programming language.
Setting up the Examples
import pandas as pd # Import pandas library
my_df = pd.DataFrame({'A':range(16, 28),                          # Constructing a pandas DataFrame
                      'B':[6, 7, 3, 9, 2, 4, 10, 1, 3, 8, 8, 9],
                      'C':range(20, 8, - 1),
                      'GRP_a':['gr1', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                      'GRP_b':['a', 'b', 'c', 'a', 'b', 'c', 'c', 'a', 'b', 'b', 'a', 'a']})
print(my_df)
#      A   B   C GRP_a GRP_b
# 0   16   6  20   gr1     a
# 1   17   7  19   gr1     b
# 2   18   3  18   gr2     c
# 3   19   9  17   gr3     a
# 4   20   2  16   gr1     b
# 5   21   4  15   gr2     c
# 6   22  10  14   gr3     c
# 7   23   1  13   gr1     a
# 8   24   3  12   gr2     b
# 9   25   8  11   gr2     b
# 10  26   8  10   gr3     a
# 11  27   9   9   gr3     a
Example 1: Calculating Standard Deviation by Group in Python
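print(my_df.groupby('GRP_a').std(numeric_only = True))   # Computing the column standard deviation by group
                                                         # numeric_only = True skips the text column GRP_b,
                                                         # which newer pandas versions no longer drop automatically
#               A         B         C
# GRP_a
# gr1    3.162278  2.943920  3.162278
# gr2    3.162278  2.380476  3.162278
# gr3    3.696846  0.816497  3.696846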
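By default, pandas returns the sample standard deviation (ddof = 1, i.e. division by n - 1). As a rough sanity check, the following sketch recomputes the value for group gr1 in column B by hand and also shows the population version with ddof = 0. The names df_small, vals, manual_sd, pandas_sd, and pop_sd are only illustrative and not part of the tutorial data set; df_small is a reduced copy that keeps only the columns B and GRP_a.

```python
import math
import pandas as pd

# Reduced copy of the tutorial data (illustrative name): column B and grouping column GRP_a
df_small = pd.DataFrame({'B': [6, 7, 3, 9, 2, 4, 10, 1, 3, 8, 8, 9],
                         'GRP_a': ['gr1', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2',
                                   'gr3', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3']})

# Manual sample standard deviation for group gr1 (values 6, 7, 2, 1)
vals = df_small.loc[df_small['GRP_a'] == 'gr1', 'B']
manual_sd = math.sqrt(((vals - vals.mean()) ** 2).sum() / (len(vals) - 1))

# pandas result for the same group (default ddof = 1)
pandas_sd = df_small.groupby('GRP_a')['B'].std()['gr1']
print(round(manual_sd, 6), round(pandas_sd, 6))   # both values should match the gr1 entry of column B above

# Population standard deviation: divide by n instead of n - 1
pop_sd = df_small.groupby('GRP_a')['B'].std(ddof = 0)
print(pop_sd)
```

The population values are smaller than the sample values, because the sum of squared deviations is divided by n rather than n - 1.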
Example 2: Calculating Standard Deviation by Group & Subgroup in Python
print(my_df.groupby(['GRP_a', 'GRP_b']).std())   # Computing the column standard deviation by multiple groups
#                     A         B         C
# GRP_a GRP_b
# gr1   a      4.949747  3.535534  4.949747
#       b      2.121320  3.535534  2.121320
# gr2   b      0.707107  3.535534  0.707107
#       c      2.121320  0.707107  2.121320
# gr3   a      4.358899  0.577350  4.358899
#       c           NaN       NaN       NaN
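The NaN values for the combination gr3 / c appear because this subgroup contains only a single row, and the sample standard deviation is undefined for one observation (division by n - 1 = 0). One possible way to spot such subgroups is to report the group size next to the standard deviation, as in this minimal sketch. The names df_small, summary, and single are only illustrative; df_small is a reduced copy of the tutorial data with column B and both grouping columns.

```python
import pandas as pd

# Reduced copy of the tutorial data (illustrative name): column B and both grouping columns
df_small = pd.DataFrame({'B': [6, 7, 3, 9, 2, 4, 10, 1, 3, 8, 8, 9],
                         'GRP_a': ['gr1', 'gr1', 'gr2', 'gr3', 'gr1', 'gr2',
                                   'gr3', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                         'GRP_b': ['a', 'b', 'c', 'a', 'b', 'c',
                                   'c', 'a', 'b', 'b', 'a', 'a']})

# Standard deviation and number of rows for each group / subgroup combination
summary = df_small.groupby(['GRP_a', 'GRP_b'])['B'].agg(['std', 'count'])
print(summary)

# Subgroups with fewer than two rows, for which the sample standard deviation is NaN
single = summary[summary['count'] < 2]
print(single)
```

Whether such subgroups should be dropped, merged with another subgroup, or reported as NaN depends on the analysis at hand.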