Remove Duplicate Rows in pandas DataFrame in Python (Example Code)
In this tutorial you’ll learn how to remove duplicate rows from a pandas DataFrame in the Python programming language.
Creation of Example Data
import pandas as pd # Import pandas
my_df = pd.DataFrame({'A':[5, 5, 5, 1, 2, 8],    # Construct example DataFrame in Python
                      'B':[5, 5, 1, 8, 9, 2],
                      'C':['a', 'a', 'c', 'd', 'e', 'f']})
print(my_df)                                     # Display example DataFrame in console
#    A  B  C
# 0  5  5  a
# 1  5  5  a
# 2  5  1  c
# 3  1  8  d
# 4  2  9  e
# 5  8  2  f
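Before removing anything, it can be useful to check which rows pandas treats as duplicates. The following is a minimal sketch that rebuilds the example data and uses the duplicated() function; the names dup_mask and dup_rows are only illustrative and are not part of the original example.

```python
import pandas as pd                                  # Import pandas

my_df = pd.DataFrame({'A': [5, 5, 5, 1, 2, 8],       # Same example data as above
                      'B': [5, 5, 1, 8, 9, 2],
                      'C': ['a', 'a', 'c', 'd', 'e', 'f']})

dup_mask = my_df.duplicated()                        # True for each repeat of an earlier row
dup_rows = my_df[dup_mask]                           # Rows that a default drop would remove

print(dup_mask.tolist())                             # [False, True, False, False, False, False]
print(dup_rows.index.tolist())                       # [1]
```

By default, the first occurrence of a row is not flagged, so only the row with index 1 is marked as a duplicate of the row with index 0.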
Example: Removing Duplicate Rows in pandas DataFrame Using drop_duplicates() Function
my_df = my_df.drop_duplicates()                  # Drop duplicates
print(my_df)                                     # Display updated DataFrame
#    A  B  C
# 0  5  5  a
# 2  5  1  c
# 3  1  8  d
# 4  2  9  e
# 5  8  2  f
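The drop_duplicates() function also accepts a few optional arguments that change which rows count as duplicates and which copy is kept. The sketch below rebuilds the example data and tries the subset, keep, and ignore_index arguments; the result names (last_by_a, no_repeats, renumbered) are made up for illustration, and ignore_index assumes pandas 1.0 or newer.

```python
import pandas as pd                                  # Import pandas

my_df = pd.DataFrame({'A': [5, 5, 5, 1, 2, 8],       # Same example data as above
                      'B': [5, 5, 1, 8, 9, 2],
                      'C': ['a', 'a', 'c', 'd', 'e', 'f']})

# Compare column 'A' only and keep the last row of each group
last_by_a = my_df.drop_duplicates(subset=['A'], keep='last')
print(last_by_a.index.tolist())                      # [2, 3, 4, 5]

# Drop every row that has a duplicate, including the first occurrence
no_repeats = my_df.drop_duplicates(keep=False)
print(no_repeats.index.tolist())                     # [2, 3, 4, 5]

# Drop duplicates and renumber the index from 0 (assumes pandas >= 1.0)
renumbered = my_df.drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())                     # [0, 1, 2, 3, 4]
```

Without ignore_index=True, the original index labels are kept, which is why the index in the example output above jumps from 0 to 2.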