Box and Whisker Plots

I’m going to share what I learnt about Box and Whisker Plots. A Box and Whisker Plot, also known as a Box Plot, is a visual representation of the variation in a set of data.

It primarily shows the five-number summary of the data: minimum, first quartile, median, third quartile, and maximum.
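As a rough sketch of how those five numbers can be computed, here is a small example that uses only Python's standard library. The function name `five_number_summary` and the sample values are made up for illustration (they are not the Iris data), and the "inclusive" quartile method is my assumption. Other tools may interpolate quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quartiles and returns the three cut points.
    # method="inclusive" interpolates linearly between neighbouring values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical sample, made up for illustration only.
sample = [2.0, 2.7, 2.8, 3.0, 3.0, 3.1, 3.3, 3.4, 4.4]
print(five_number_summary(sample))
```

The box in the plot spans the first to the third quartile, and the line inside the box marks the median.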

In the ML world, this visualization gives a good understanding of how the data is distributed and how much it varies. Hence, understanding this plot is very useful in the ML journey.

For instance, I was working on the Iris dataset. When I describe this dataset, this is what it looks like:
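One way to produce such a description is with the pandas `describe()` method. This is only a sketch under a few assumptions: it loads the copy of Iris bundled with scikit-learn (which may not be how the data was loaded for this post), and it renames the columns to hyphenated names such as `sepal-width`.

```python
from sklearn.datasets import load_iris

# Assumption: use the copy of the Iris dataset that ships with scikit-learn.
iris = load_iris(as_frame=True)
df = iris.data.copy()

# Assumption: rename the columns to the hyphenated style used in this post.
df.columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

# describe() reports count, mean, std, min, 25%, 50%, 75% and max per column.
summary = df.describe()
print(summary)
```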

Do note that the statistics from min to max, including 25%, 50%, and 75%, are exactly what a Box Plot represents graphically. The resulting Box and Whisker Plot looks like this:
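A plot along these lines can be drawn with matplotlib. This is a minimal sketch, not necessarily the code behind the plot in this post. It assumes the scikit-learn copy of Iris, hyphenated column names, a single shared axis for all four features, and an output file name (`iris_boxplot.png`) that I picked arbitrarily.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: use the copy of the Iris dataset that ships with scikit-learn.
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

fig, ax = plt.subplots()
# A 2D array produces one box per column.
boxes = ax.boxplot(df.to_numpy())
ax.set_xticklabels(df.columns)
ax.set_ylabel("cm")

# Hypothetical file name; use plt.show() instead for an interactive window.
fig.savefig("iris_boxplot.png")
```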

If you look at the box for sepal-width, there are some small circles beyond the whiskers. They are called outliers: points that lie unusually far from the rest of the data.
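A common rule for drawing those circles, and the default in matplotlib, is to mark any value that lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below applies that rule with the standard library only. The function name `find_outliers` and the sample values are made up for illustration, and I am assuming the plot here was drawn with that default.

```python
import statistics


def find_outliers(values, whisker=1.5):
    """Return the values that fall outside the whisker * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical sample, made up for illustration only.
sample = [2.0, 2.7, 2.8, 3.0, 3.0, 3.1, 3.3, 3.4, 4.4]
print(find_outliers(sample))
```

The whiskers then stop at the smallest and largest values that are still inside the fences.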

This Khan Academy video explains beautifully how to interpret box plots, and this YouTube video explains outliers.

