HANDLING MISSING VALUES

Ratri Oktaviani
May 1, 2022

How can we deal with missing values?

In practice, datasets often contain missing values. How can we deal with them? There are several ways to handle missing values, and in this article we use two of them: removing the rows or columns that contain missing values, and filling the missing data with a value. The data we're going to use can be retrieved here.

1st Method — Remove rows/columns

  • Load the dataset
import pandas as pd
import numpy as np

# Load the car sales dataset and show its dimensions as (rows, columns)
df = pd.read_csv('car_sales.csv')
df.shape
Picture 1 — Dimension of Car Sales Dataset

The dataset used here is the car sales dataset. It consists of 157 rows and 16 columns: manufacturer, model, sales in thousands, year resale value, vehicle type, price in thousands, engine size, horsepower, wheelbase, width, length, curb weight, fuel capacity, fuel efficiency, latest launch and power performance factor.

# Count the missing values in each column
df.isnull().sum()
Picture 2 — Columns with missing value
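To see what `df.isnull().sum()` is doing, here is a minimal sketch on a small hypothetical DataFrame (the column names and values are made up and are not taken from car_sales.csv). `isnull()` turns every cell into True or False, so summing the mask counts missing values per column, and taking its mean gives the share of missing values instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for car_sales.csv (values are made up)
df = pd.DataFrame({
    "Model": ["A", "B", "C", "D"],
    "Price_in_thousands": [21.5, np.nan, 28.4, np.nan],
    "Horsepower": [140.0, 225.0, np.nan, 150.0],
})

# isnull() marks each missing cell as True; sum() counts the Trues per column
missing_counts = df.isnull().sum()

# The mean of the same boolean mask is the fraction of missing values per column
missing_share = df.isnull().mean()

print(missing_counts)
print(missing_share)
```

The share is often more useful than the raw count when deciding whether a column is worth keeping at all.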

Based on the output in Picture 2, the __year_resale_value column has 36 missing values, the Price_in_thousands column has 2, and so on. First, we remove the missing values from the dataset by rows.

# Drop every row that contains at least one missing value (axis=0 means rows)
df_dropna_row = df.dropna(axis=0)
df_dropna_row.info()
Picture 3 — Dataset info after removing rows with missing values

Now none of the columns contains missing values, and the dataset has shrunk to 117 rows and 16 columns. Next, starting again from the original dataset, we remove missing values by columns instead.
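Dropping rows does not have to be all-or-nothing. The sketch below, again on hypothetical toy data, compares the default used above (drop a row if any value is missing) with two documented options of `DataFrame.dropna`: `how="all"`, which drops a row only when every value is missing, and `subset`, which only looks at the listed columns when deciding.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the car sales dataset)
df = pd.DataFrame({
    "Model": ["A", "B", "C", None],
    "Price_in_thousands": [21.5, np.nan, 28.4, np.nan],
    "Horsepower": [140.0, 225.0, np.nan, np.nan],
})

# Default: drop a row if ANY of its values is missing
any_missing = df.dropna(axis=0)

# Drop a row only if ALL of its values are missing
all_missing = df.dropna(axis=0, how="all")

# Drop a row only if the listed column(s) are missing
price_known = df.dropna(axis=0, subset=["Price_in_thousands"])

print(any_missing.shape)
print(all_missing.shape)
print(price_known.shape)
```

Here only the first row survives the default, the fully empty last row is the only one removed by `how="all"`, and `subset` keeps the two rows that have a price.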

# Drop every column that contains at least one missing value (axis=1 means columns)
df_dropna_col = df.dropna(axis=1)
df_dropna_col.info()
Picture 4 — Dataset info after removing columns with missing values

As we can see in Picture 4, every column that had at least one missing value has been removed. After removing missing values by columns, the dataset has 157 rows and 5 columns.
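Dropping every column with a single missing value can throw away a lot of data; here 11 of the 16 columns disappear. A less aggressive option, sketched below on hypothetical toy data, is the `thresh` parameter of `dropna`, which keeps a column only if it has at least that many non-missing values. The 75% cut-off and the `resale_value` column name are illustrative assumptions, not values from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one column is mostly empty, one has a single gap
df = pd.DataFrame({
    "Model": ["A", "B", "C", "D"],
    "Price_in_thousands": [21.5, np.nan, 28.4, 30.1],
    "resale_value": [np.nan, np.nan, np.nan, 18.2],
})

# Same as the article: drop every column with at least one missing value
strict = df.dropna(axis=1)

# Keep a column only if at least 75% of its values are present;
# thresh is the minimum number of non-missing values a column needs
min_non_missing = int(np.ceil(0.75 * len(df)))
lenient = df.dropna(axis=1, thresh=min_non_missing)

print(list(strict.columns))
print(list(lenient.columns))
```

With `thresh`, the nearly complete price column survives while the mostly empty column is still dropped.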

2nd Method — Filling the missing data with a value

In the second method, we don't remove the missing values; instead, we fill them with the mean or the median of each column.

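# Fill missing values in each numeric column with that column's mean.
# numeric_only=True skips text columns such as Manufacturer and Model;
# without it, recent pandas versions raise a TypeError on those columns.
fillna_mean = df.fillna(df.mean(numeric_only=True))
fillna_mean.info()
Picture 5 — Dataset info after filling missing values with the mean
# Fill missing values in each numeric column with that column's median
fillna_median = df.fillna(df.median(numeric_only=True))
fillna_median.info()
Picture 6 — Dataset info after filling missing values with the median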
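The choice between mean and median matters when a column has outliers. The sketch below uses a hypothetical price column with one extreme value: the mean is pulled toward the outlier, while the median stays close to the typical values. Passing a Series of per-column statistics to `fillna` fills each column with its own value and leaves columns that are not in the Series (here, `Model`) untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one outlier (100.0) and one missing price
df = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Price_in_thousands": [20.0, 22.0, 24.0, 100.0, np.nan],
})

# numeric_only=True computes statistics for numeric columns only
mean_filled = df.fillna(df.mean(numeric_only=True))
median_filled = df.fillna(df.median(numeric_only=True))

print(mean_filled.loc[4, "Price_in_thousands"])    # mean of 20, 22, 24, 100
print(median_filled.loc[4, "Price_in_thousands"])  # median of 20, 22, 24, 100
```

The mean fill gives 41.5, higher than three of the four observed prices, whereas the median fill gives 23.0, which is why the median is often preferred for skewed columns.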
