Pandas DataFrame duplicated() Method



Example

Check which rows are duplicates and which are not:

import pandas as pd

data = {
  "name": ["John", "Mary", "John", "Sally", "Mary"],
  "age": [40, 30, 40, 50, 30],
  "city": ["Bergen", "Oslo", "Stavanger", "Oslo", "Oslo"]
}

df = pd.DataFrame(data)

s = df.duplicated()

print(s)

Definition and Usage

The duplicated() method returns a Series of True and False values, one per row, that shows whether each row in the DataFrame is a duplicate of another row.

Use the subset parameter to specify which columns to include when looking for duplicates. By default all columns are included.

By default, the first occurrence in a group of duplicate rows is set to False, and every later occurrence is set to True.

Set the keep parameter to False to set every occurrence, including the first, to True.
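
As a rough illustration of the defaults described above, the sketch below reuses the sample data from the first example and prints the results as plain lists. The variable names all_columns and name_age are only illustrative.

```python
import pandas as pd

# Same sample data as the example at the top of the page.
df = pd.DataFrame({
    "name": ["John", "Mary", "John", "Sally", "Mary"],
    "age": [40, 30, 40, 50, 30],
    "city": ["Bergen", "Oslo", "Stavanger", "Oslo", "Oslo"],
})

# All columns are compared: only row 4 repeats an earlier row (row 1).
all_columns = df.duplicated()
print(all_columns.tolist())  # [False, False, False, False, True]

# Only "name" and "age" are compared: rows 2 and 4 repeat rows 0 and 1.
name_age = df.duplicated(subset=["name", "age"])
print(name_age.tolist())     # [False, False, True, False, True]
```

Row 2 is not a duplicate when all columns are compared, because its city ("Stavanger") differs from row 0 ("Bergen").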


Syntax

dataframe.duplicated(subset, keep)

Parameters

Both parameters are optional and are usually passed as keyword arguments.

Parameter | Value | Description
subset | column label(s) | Optional. A column label, or a list of column labels, to include when looking for duplicates. Default subset=None (meaning no subset is specified, and all columns are included).
keep | 'first', 'last', False | Optional, default 'first'. Specifies how to deal with duplicates: 'first' means set the first occurrence to False, the rest to True. 'last' means set the last occurrence to False, the rest to True. False means set all occurrences to True.
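
To make the three keep values easier to compare, here is a minimal sketch that runs each of them on the sample data from the first example and shows the results side by side. The comparison DataFrame and its column names are only there for display.

```python
import pandas as pd

# Same sample data as the example at the top of the page.
df = pd.DataFrame({
    "name": ["John", "Mary", "John", "Sally", "Mary"],
    "age": [40, 30, 40, 50, 30],
    "city": ["Bergen", "Oslo", "Stavanger", "Oslo", "Oslo"],
})

# Rows 1 and 4 are identical; all other rows are unique.
keep_first = df.duplicated(keep="first")  # row 4 is True
keep_last = df.duplicated(keep="last")    # row 1 is True
keep_none = df.duplicated(keep=False)     # rows 1 and 4 are True

# Put the three results next to each other for comparison.
comparison = pd.DataFrame({
    "keep='first'": keep_first,
    "keep='last'": keep_last,
    "keep=False": keep_none,
})
print(comparison)
```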

Return Value

A Series with a boolean value for each row in the DataFrame.
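
Because the return value is a boolean Series, it can be used as a mask. The sketch below shows one possible way to count duplicates and filter them out, again assuming the sample data from the first example. The names mask, n_duplicates and unique_rows are illustrative.

```python
import pandas as pd

# Same sample data as the example at the top of the page.
df = pd.DataFrame({
    "name": ["John", "Mary", "John", "Sally", "Mary"],
    "age": [40, 30, 40, 50, 30],
    "city": ["Bergen", "Oslo", "Stavanger", "Oslo", "Oslo"],
})

mask = df.duplicated()

# True counts as 1, so the sum is the number of rows flagged as duplicates.
n_duplicates = int(mask.sum())
print(n_duplicates)  # 1

# Invert the mask with ~ to keep only the rows that are not flagged.
unique_rows = df[~mask]
print(unique_rows)
```

With the default settings, df[~mask] selects the same rows as df.drop_duplicates().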


More Examples

Example

Only compare the columns "name" and "age" when looking for duplicates:

# df is the DataFrame created in the first example
s = df.duplicated(subset=["name", "age"])

print(s)

Example

Set every occurrence of a duplicated row to True, including the first:

# df is the DataFrame created in the first example
s = df.duplicated(keep=False)

print(s)


Copyright 1999-2023 by Refsnes Data. All Rights Reserved.