
Extra Help

 

Dealing with Missing Data Using Python pandas

If you use Python pandas, missing data no longer has to be a headache.

Data cleaning tends to take a considerable amount of time in data science, and missing data is among the most common challenges.

pandas is a Python data manipulation library that, among other things, helps you fix missing values throughout your dataset.

You can handle missing data either by removing it or by filling it with other values.
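As a minimal sketch of these two options (the DataFrame below is hypothetical example data), dropna() removes incomplete rows while fillna() keeps every row and fills the gaps with a constant:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a few missing entries.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [np.nan, 2.0, 3.0, np.nan],
})

# Option 1: remove every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill the gaps with a constant instead.
filled = df.fillna(0)

print(dropped)
print(filled)
```

Dropping is the simplest choice but also discards the valid values in those rows; filling keeps every row at the cost of inserting values that were never observed.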

 


Figure E.1. Sample illustration: missing values are puzzles to solve before proceeding with any operation.

 

In this blog entry, we'll explore various methods for filling missing data with pandas.

Using the fillna() method:

The fillna() method scans the dataset and fills null entries with a value you specify. It accepts a few optional arguments, including the following:

Value: the value you want to place in the empty cells.

Method: fills in incomplete data forward or backward. It takes either 'ffill' or 'bfill'.

 


Figure E.2. Sample illustration: detecting missing values may prove to be a bit more difficult.

 

Inplace: takes a boolean. If True, it modifies the DataFrame permanently; otherwise, it returns a new DataFrame and leaves the original unchanged.
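The three arguments can be sketched on a small hypothetical single-column DataFrame. One assumption to flag: pandas 2.1 and later deprecate fillna(method=...) in favor of the dedicated ffill() and bfill() methods, so the sketch uses those to show the method behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with gaps at the start, middle and end.
df = pd.DataFrame({"x": [np.nan, 1.0, np.nan, 4.0, np.nan]})

# value: put one constant into every empty cell.
by_value = df.fillna(value=-1)

# method: copy values forward (ffill) or backward (bfill).
# pandas 2.1+ deprecates fillna(method=...), so the dedicated methods are used.
forward = df.ffill()    # the leading NaN has nothing above it and stays NaN
backward = df.bfill()   # the trailing NaN has nothing below it and stays NaN

# inplace: by default fillna() returns a new DataFrame and df is unchanged.
missing_before = int(df["x"].isna().sum())

# With inplace=True, df itself is modified and the call returns None.
result = df.fillna(value=0, inplace=True)
print(df)
```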

In a Python script, we'll create a dummy DataFrame and insert NaN values into some of its rows.

 

 

Codeblock E.1. A DataFrame with NaN values.
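Since the original code block is not reproduced here, the following is a hypothetical stand-in: the column names A, B and C and their values are invented for illustration. It builds a dummy DataFrame named dataf with NaN and None entries and counts the missing values in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data; column names and values are illustrative only.
dataf = pd.DataFrame({
    "A": [10.0, np.nan, 30.0, 40.0, 50.0],
    "B": [np.nan, 2.5, np.nan, 4.5, 5.5],
    "C": [7.0, 8.0, 9.0, None, np.nan],  # None is stored as NaN in a float column
})

# isna() flags both NaN and None; sum() counts them per column.
print(dataf.isna().sum())
```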

 

We can fill the NaN or None values in several ways.

Where a column contains numeric values, we can replace its missing entries with the column's mean(), median() or mode().

 


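# df is assumed to be an existing DataFrame containing NaN values.
# The three options below are alternatives: run only one of them, since the first one applied with inplace=True already fills the numeric gaps.

# Apply mean values in the place of NaN (numeric columns only).

df.fillna(df.mean(numeric_only=True).round(1), inplace=True)

# Apply median values in the place of NaN (numeric columns only).

df.fillna(df.median(numeric_only=True).round(1), inplace=True)

# Apply mode values in the place of NaN; mode() returns a DataFrame, so take its first row.

df.fillna(df.mode().iloc[0], inplace=True)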

Table E.1. Demonstration of the fillna() function.

 

 

 

 

Codeblock E.2. Inserting mean, median and mode values in place of missing values.

 

The code block above demonstrates the mean, median and mode applied to the DataFrame 'dataf'. You can also fill values for just one column.

 

 

Codeblock E.3. Inserting into empty values for 'Rows 3'.
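A minimal sketch of filling a single column, assuming a hypothetical dataf with columns A and B; only B is filled with its mean, so A keeps its NaN. The filled column is assigned back rather than calling fillna(inplace=True) on the selected column, because that is chained assignment and may not update the original DataFrame in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; only column "B" will be filled.
dataf = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [2.0, np.nan, 6.0],
})

# Fill B with its own mean and assign the result back to the column.
dataf["B"] = dataf["B"].fillna(dataf["B"].mean())
print(dataf)
```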

 

Using ffill, you can fill null rows with values. This involves passing ffill as the fill method to the fillna() function.

This method replaces each missing value with the value from the row above it.

This is also known as forward-filling:

 

 

Codeblock E.4. Inserting into blank values with ffill as the method parameter.
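A small sketch of forward-filling on hypothetical data. It calls DataFrame.ffill(), which does the same job as fillna(method='ffill'); that older form is deprecated in pandas 2.1 and later. The optional limit argument caps how many consecutive NaN values are filled in each gap.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with a two-value gap and a trailing gap.
dataf = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0, np.nan]})

# Forward-fill: each NaN takes the last valid value above it.
forward = dataf.ffill()

# limit=1: fill at most one consecutive NaN per gap.
forward_limited = dataf.ffill(limit=1)

print(forward)
print(forward_limited)
```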

 

We can also apply each type of filling column by column. For example, you may want to insert the mean, median, or mode into a specific column:

 

 

Codeblock E.5. Applying mean, median and mode to specific columns.
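One way to sketch this, assuming hypothetical columns A, B and C: fillna() also accepts a dictionary that maps column names to fill values, so each column can get its own statistic in a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three numeric columns.
dataf = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 5.0],
    "B": [2.0, 4.0, np.nan, 10.0],
    "C": [7.0, 7.0, np.nan, 9.0],
})

# Map each column to its own fill value; columns not listed are left alone.
fill_values = {
    "A": dataf["A"].mean(),     # mean of 1, 3, 5 is 3.0
    "B": dataf["B"].median(),   # median of 2, 4, 10 is 4.0
    "C": dataf["C"].mode()[0],  # most frequent value is 7.0
}
dataf = dataf.fillna(value=fill_values)
print(dataf)
```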

 

You can also fill in missing values using bfill. In this case, you replace the ffill method with bfill.

It replaces each missing value with the nearest value below it in the DataFrame.

This is known as backward-filling:

 

 

Codeblock E.6. Inserting into blank values with bfill as the method parameter.
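Backward-filling can be sketched the same way on hypothetical data, using DataFrame.bfill() (the non-deprecated equivalent of fillna(method='bfill')). A NaN in the last row stays NaN, because there is no value below it to copy.

```python
import numpy as np
import pandas as pd

# Hypothetical example data ending in a NaN.
dataf = pd.DataFrame({"A": [np.nan, 2.0, np.nan, np.nan, 5.0, np.nan]})

# Backward-fill: each NaN takes the next valid value below it.
backward = dataf.bfill()
print(backward)
```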

 

You can also fill values by looping through the columns one by one, as demonstrated in Codeblock E.7.

 

 

Codeblock E.7. Inserting into blank values by looping through the columns.
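A sketch of the looping approach on a hypothetical DataFrame: numeric columns are filled with their mean and other columns with their most frequent value. The choice of statistic per column type is an assumption for illustration, not necessarily what Codeblock E.7 does.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric and text columns.
dataf = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 20.0, 40.0],
    "name": ["x", None, "x"],
})

for col in dataf.columns:
    if pd.api.types.is_numeric_dtype(dataf[col]):
        # Numeric column: fill with the column mean.
        dataf[col] = dataf[col].fillna(dataf[col].mean())
    else:
        # Non-numeric column: fill with the most frequent value.
        dataf[col] = dataf[col].fillna(dataf[col].mode()[0])

print(dataf)
```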

 

The replace() method can also be employed to fill in missing values, for example by replacing null rows with the mean, median, or mode of a named column.

 

 

Codeblock E.8. Inserting into blank values with the replace method.
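A minimal sketch with replace(), assuming a hypothetical column A: np.nan is passed as the value to replace and the column mean as the replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column example.
dataf = pd.DataFrame({"A": [4.0, np.nan, 8.0, np.nan]})

# mean() skips NaN by default: (4 + 8) / 2 = 6.0
mean_a = dataf["A"].mean()

# Replace every NaN in column A with that mean.
dataf["A"] = dataf["A"].replace(np.nan, mean_a)
print(dataf)
```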

 

We can also use the interpolate() method to fill in blank values linearly.

The DataFrame.interpolate() method in pandas is mainly used to fill NA values in a DataFrame or Series. It is a powerful option because it estimates each missing value from the surrounding values instead of inserting a single constant.

We can also use interpolate() to fill missing values in the reverse (backward) direction, using the linear method and a maximum number of consecutive NaN values that can be filled.

 

 

Codeblock E.9. Inserting into blank values using the interpolate method.
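The interpolation options can be sketched on a hypothetical column. The first call fills every gap linearly; the second uses the interpolate() parameters limit_direction='backward' and limit=1, so only the NaN directly before each valid value is filled and the others stay NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: valid values 1, 3 and 7 with gaps in between.
dataf = pd.DataFrame({"A": [1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0]})

# Linear interpolation fills each gap along a straight line.
linear = dataf.interpolate(method="linear")

# Backward direction, at most one consecutive NaN filled per gap.
backward_limited = dataf.interpolate(method="linear", limit_direction="backward", limit=1)

print(linear)
print(backward_limited)
```

In the limited result, the NaN just before 7 becomes 6.0 (its value on the line from 3 to 7), while the two NaN values before it remain empty.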

 

---- Summary ----

You now know how to fill blank values in a DataFrame using several methods:

  • Using the fillna() method.

  • Using fillna() with ffill.

  • Using fillna() with bfill.

  • Using the interpolate() method.

  • Using the replace() method.

  • etc.

 

 



Copyright © 2022-2023. Anoop Johny. All Rights Reserved.