Get rows with duplicated index values - useful when debugging errors such as ValueError: cannot reindex from a duplicate axis

df[df.index.duplicated()]

– via StackOverflow
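
A small sketch of what that mask actually selects, using a made-up three-row frame (the data and the variable names `repeats` and `all_dupes` are illustrative only). By default `duplicated()` flags just the second and later occurrences of a label, so passing `keep=False` can help when you want to see every row that shares a label:

```python
import pandas as pd

# Hypothetical toy frame: the label 'a' appears twice in the index.
df = pd.DataFrame({'value': [1, 2, 3]}, index=['a', 'b', 'a'])

# Default (keep='first'): only the repeated occurrences are flagged.
repeats = df[df.index.duplicated()]

# keep=False: every row whose label occurs more than once.
all_dupes = df[df.index.duplicated(keep=False)]

print(repeats)
print(all_dupes)
```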

Drop duplicated index values

df = df[~df.index.duplicated(keep='first')]

– via StackOverflow
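
As a rough illustration on a made-up frame (the data and the names `first`, `last`, and `reindexed` are assumptions for this example), `keep` decides which of the duplicated rows survives, and once the index is unique `reindex` works again:

```python
import pandas as pd

# Hypothetical toy frame: the label 'a' appears twice in the index.
df = pd.DataFrame({'value': [1, 2, 3]}, index=['a', 'b', 'a'])

# keep='first' retains the first row per label, keep='last' the last one.
first = df[~df.index.duplicated(keep='first')]
last = df[~df.index.duplicated(keep='last')]

# With a unique index, reindex no longer raises; the missing label 'c' becomes NaN.
reindexed = first.reindex(['a', 'b', 'c'])

print(first)
print(last)
print(reindexed)
```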

Display a correlation matrix using pandas

import numpy as np
import pandas as pd

# random 10x10 sample data
rs = np.random.RandomState(0)
df = pd.DataFrame(rs.rand(10, 10))

corr = df.corr()

# change the color map
corr.style.background_gradient(cmap='coolwarm')

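# ..and only display two decimals
# (set_precision was deprecated in pandas 1.3 and removed in 2.0; format(precision=...) replaces it)
corr.style.background_gradient(cmap='coolwarm').format(precision=2)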

# compute the colors based on the entire matrix and not per column or per row
corr.style.background_gradient(cmap='coolwarm', axis=None)

– via StackOverflow

Calculate the difference in months between two dates

# wrap the expression in parentheses so it can span several lines
df['car-age-in-months'] = (
    (df['date-of-visit'].dt.year - df['date-bought-car'].dt.year) * 12
    + (df['date-of-visit'].dt.month - df['date-bought-car'].dt.month)
)

It’s messy, I know. If you find a cleaner way to do this, ping me.
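
One possible alternative, sketched here on made-up dates, is to convert both columns to monthly periods and subtract them. This assumes pandas 0.24 or later, where subtracting periods yields offset objects whose `.n` attribute holds the number of months; the sample data and the `visit` and `bought` names are illustrative only:

```python
import pandas as pd

# Hypothetical sample data using the same column names as above.
df = pd.DataFrame({
    'date-bought-car': pd.to_datetime(['2019-01-15', '2020-12-31']),
    'date-of-visit': pd.to_datetime(['2020-03-01', '2021-01-01']),
})

# Truncate each date to its calendar month.
visit = df['date-of-visit'].dt.to_period('M')
bought = df['date-bought-car'].dt.to_period('M')

# Subtracting monthly periods gives month offsets; .n is the month count.
df['car-age-in-months'] = (visit - bought).apply(lambda offset: offset.n)

print(df)
```

Like the year/month arithmetic, this counts calendar-month boundaries crossed rather than full elapsed months, so 2020-12-31 to 2021-01-01 counts as 1.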

Create a datetime Series from year/month numeric columns. We have two main options here:

  1. Use predefined column names - at a minimum you need year, month, and day. You can also add hour, minute, second, etc.
df['date'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])
  2. ⭐️ Use a dict and avoid the need to have predefined column names
df['date'] = pd.to_datetime(dict(year=df['y'], month=df['m'], day=1))

– via StackOverflow and pandas API reference
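
A quick sketch of both options side by side on a made-up frame (the `y` and `m` columns match the snippet above; the sample values and the `renamed` and `from_columns` names are assumptions for this example):

```python
import pandas as pd

# Hypothetical frame with non-standard column names.
df = pd.DataFrame({'y': [2020, 2021], 'm': [2, 11]})

# Option 2: pass a dict; the scalar day=1 is broadcast to every row.
df['date'] = pd.to_datetime(dict(year=df['y'], month=df['m'], day=1))

# Option 1: rename to the predefined names and add a day column first.
renamed = df.rename(columns={'y': 'year', 'm': 'month'}).assign(day=1)
from_columns = pd.to_datetime(renamed[['year', 'month', 'day']])

print(df)
print(from_columns)
```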

Return value counts for a numpy array

np.unique(my_array, return_counts=True)

– via StackOverflow
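
The call returns two aligned arrays (the sorted unique values and their counts) rather than a mapping. A minimal sketch of unpacking them into a dict, with a made-up array and an illustrative `counts_by_value` name:

```python
import numpy as np

# Hypothetical input array.
my_array = np.array([3, 1, 3, 2, 3, 1])

# values holds the sorted unique entries, counts how often each one occurs.
values, counts = np.unique(my_array, return_counts=True)

# Pair them up as plain Python ints for easy lookup.
counts_by_value = {int(v): int(c) for v, c in zip(values, counts)}

print(counts_by_value)
```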