Dropping and Replacing NaN values

In a previous post I talked about the need to replace NaN values in my data in preparation for some machine learning models. Two other methods that I have used are dropping the NaN values and filling them in with something else.

dropping values

Sometimes the easiest thing to do is drop the values, as long as my dataset is large enough that I can afford to lose them.

I can drop either the rows or the columns that contain NaN values:

# Drop rows with NaN values
df_cleaned = df.dropna()

# Drop columns with NaN values in-place
df.dropna(axis=1, inplace=True)
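
To see the difference between the two calls, here is a minimal sketch on a made-up DataFrame. The name toy, the column names, and the values are only for illustration and are not from my actual data. Both calls rely on the default how="any", so a single NaN is enough to remove a row or a column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'a' is complete, 'b' and 'c' each have one NaN
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, np.nan, 30.0, 40.0],
    "c": [np.nan, 200.0, 300.0, 400.0],
})

# Drop every row that contains at least one NaN (returns a new DataFrame)
rows_dropped = toy.dropna()

# Drop every column that contains at least one NaN (returns a new DataFrame)
cols_dropped = toy.dropna(axis=1)

print(rows_dropped.shape)  # (2, 3)
print(cols_dropped.shape)  # (4, 1)
```

Dropping rows costs half of this small dataset, and dropping columns leaves only one feature, which is why I only reach for this when the data is large enough.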

replacing values

I had a couple of datasets that were small, and I did not want to drop any rows of data.

Instead I used the .fillna() method to replace the NaN values with a number, such as 0 or the column mean. Another option is interpolation. Which option to use depends on what makes sense for the dataset.

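# Replace NaN values with column mean (numeric columns only;
# recent pandas versions raise an error on text columns without numeric_only=True)
df_filled = df.fillna(df.mean(numeric_only=True))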

# Replace NaN values with zero
df_filled = df.fillna(0)
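
Interpolation is the one option mentioned above without an example. Here is a minimal sketch, assuming a numeric column whose rows are in a meaningful order (such as readings over time). The readings data and the temp column are made up for illustration. By default .interpolate() fills gaps linearly and treats the rows as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings with two gaps
readings = pd.DataFrame({"temp": [10.0, np.nan, 14.0, np.nan, 20.0]})

# Linear interpolation: each NaN becomes the point on the line between its neighbors
interpolated = readings.interpolate()

print(interpolated["temp"].tolist())  # [10.0, 12.0, 14.0, 17.0, 20.0]
```

This only makes sense when neighboring rows are related. For unordered data, the mean or a constant is usually the safer fill.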