Box and Whisker Plots

I’m going to share what I learnt about Box and Whisker Plots. A Box and Whisker Plot, also known as a Box Plot, is a visual representation of the variation in a set of data.

It primarily shows the five-number summary: minimum, first quartile, median, third quartile, and maximum.
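To make the five-number summary concrete, here is a minimal sketch that computes it with Python's standard library. The sample values and the helper name `five_number_summary` are made up for illustration, and the quartiles use linear interpolation between data points (other quartile conventions give slightly different numbers).

```python
import statistics

# Made-up sample values, purely for illustration.
values = [7, 1, 9, 3, 5, 2, 8, 4, 6]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # n=4 splits the data into quartiles; method="inclusive" interpolates
    # linearly between data points and treats min and max as the 0th and
    # 100th percentiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(values)
print(summary)
```

In a box plot, the box spans Q1 to Q3 and the line inside the box marks the median.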

In the ML world, this visualization gives a good understanding of the distribution and variation of data, so understanding this plot is very useful in the ML journey.

For instance, I was working on the Iris dataset. If I describe this dataset, this is what it looks like:

Note that the statistics from min to max, including 25% (first quartile), 50% (median), and 75% (third quartile), are what a Box Plot represents graphically. The resulting Box and Whisker plot looks like this:
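For anyone who wants to reproduce both steps, here is a rough sketch using pandas and matplotlib. It assumes both libraries are installed, and the small table is made-up data with Iris-style column names, not the actual Iris measurements, so the numbers will differ from the ones I got.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
import pandas as pd

# Made-up measurements with Iris-style column names (NOT the real dataset).
df = pd.DataFrame(
    {
        "sepal-length": [5.1, 4.9, 4.7, 5.0, 5.4, 6.3, 5.8, 7.1],
        "sepal-width": [3.5, 3.0, 3.2, 3.6, 3.9, 3.3, 2.7, 3.0],
    }
)

# describe() reports count, mean, std, min, 25%, 50%, 75% and max per column.
summary = df.describe()
print(summary)

# Draw one box per numeric column on a single set of axes.
fig, ax = plt.subplots()
df.boxplot(ax=ax)
ax.set_ylabel("cm")
```

When running this interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end to display the plot.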

If you look closely, sepal-width has some small circles beyond the whiskers. These are called outliers: points that lie unusually far from the rest of the data.
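A common convention for deciding which points get drawn as circles is the 1.5 × IQR rule, where IQR is the interquartile range (third quartile minus first quartile). As far as I know this is also matplotlib's default, but treat that as an assumption. The sketch below applies the rule to made-up values; the helper name `find_outliers` is hypothetical.

```python
import statistics

# Made-up sample with one unusually large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]


def find_outliers(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    ordered = sorted(data)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Anything outside the fences is treated as an outlier.
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


lower, upper, outliers = find_outliers(values)
print(lower, upper, outliers)
```

Under this rule the whiskers stop at the most extreme data points that are still inside the fences, which is why a whisker does not always reach the min or max reported for the data.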

This video on Khan Academy beautifully explains how to interpret box plots. This video on YouTube explains outliers.


