Binary Classification with Python

In the previous two examples, I worked on Multi-Class Classification problems, where the class (output) can take one of several values.

In this example, I took up a Binary Classification problem where the output is either 1 or 0.

The approach is similar to the previous two examples, so I will highlight only the important points.

For the Binary Classification program, the data set used is the Pima Indians Diabetes Database. This data set has 8 features:

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-Hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)

The output is a class variable with a value of 0 or 1, where class 1 means the patient tested positive for diabetes. There are 768 rows in this data set.
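
A minimal sketch of reading the data and separating the 8 features from the class label. It assumes a header-less CSV with nine comma-separated columns in the order listed above; the short column names and the two sample rows are my own placeholders, not values from the data set.

```python
import csv
import io

# Short names are illustrative labels for the 8 features plus the class.
COLUMNS = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]


def load_rows(handle):
    """Read header-less CSV rows into a feature list X and a label list y."""
    X, y = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
        X.append(values[:-1])      # first 8 columns are the features
        y.append(int(values[-1]))  # last column is the 0/1 class
    return X, y


# Two made-up rows standing in for the real file.
sample = io.StringIO("1,100,70,20,80,25.0,0.3,30,0\n5,160,80,30,0,35.5,0.6,50,1\n")
X, y = load_rows(sample)
print(len(X), "rows,", len(X[0]), "features, positives:", sum(y))
```

With the real file, the same function could be called as `load_rows(open(path, newline=""))`, where `path` points to wherever the CSV was saved.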

The accuracy of the different algorithms on this classification problem looks like this:

As can be seen, Logistic Regression and Linear Discriminant Analysis show better accuracy than the others.
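
A comparison of this kind can be sketched with scikit-learn's k-fold cross-validation. This is a rough illustration under several assumptions: the data is a synthetic stand-in of the same shape (768 rows, 8 features), 10 folds are used, and KNN and CART are included only as examples of "the others", since the post names just Logistic Regression and Linear Discriminant Analysis.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data: 768 rows, 8 features, 2 classes.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=7)

# Candidate algorithms; KNN and CART are assumed examples of "the others".
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
}

# The same stratified 10-fold split is reused for every model.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
    print(f"{name}: mean={results[name].mean():.3f} std={results[name].std():.3f}")
```

Reusing one `kfold` object keeps the folds identical across models, so differences in mean accuracy come from the algorithms rather than from the split.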

The same results, plotted as a box-and-whisker plot:
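
For readers unfamiliar with the plot, each box summarizes the per-fold accuracies of one algorithm by their quartiles. The sketch below computes that summary with the standard library; the fold scores are placeholder numbers, not the results from this post, and the whiskers here are simply the minimum and maximum (plotting libraries often cap them at 1.5 times the interquartile range instead).

```python
import statistics


def five_number_summary(scores):
    """Return the values a box-and-whisker plot draws for one algorithm."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),   # lower whisker (simplified)
        "q1": q1,             # bottom edge of the box
        "median": median,     # line inside the box
        "q3": q3,             # top edge of the box
        "max": max(scores),   # upper whisker (simplified)
    }


# Placeholder per-fold accuracies for a single algorithm.
fold_scores = [0.70, 0.72, 0.74, 0.76, 0.78, 0.80, 0.82]
summary = five_number_summary(fold_scores)
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```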

For final validation, I first used Logistic Regression, and this is the result:

So the accuracy stands at about 78%.
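
A final-validation step of this kind might look like the sketch below: hold out part of the data, fit on the rest, and score the predictions. The 80/20 split, the random seed, and the synthetic stand-in data are all assumptions, so the number it prints will not match the 78% above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data: 768 rows, 8 features, 2 classes.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=7)

# Keep 20% of the rows aside as a validation set (assumed split size).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_val)

# Accuracy is the share of validation rows predicted correctly, i.e. the
# diagonal of the confusion matrix divided by the total number of rows.
accuracy = accuracy_score(y_val, predictions)
matrix = confusion_matrix(y_val, predictions)
print(f"accuracy: {accuracy:.3f}")
print(matrix)
```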

When I used Linear Discriminant Analysis, the result looked like this:


As expected, the accuracy is no different from that of Logistic Regression.
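
Both models fit a linear decision boundary, which is one plausible reason they score alike. One way to check this is to measure how often the two models make the same prediction on the validation set. The sketch below does that on synthetic stand-in data with an assumed 80/20 split, so it illustrates the check rather than reproducing the result.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data: 768 rows, 8 features, 2 classes.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           random_state=7)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y)

# Fit both linear classifiers on the same training rows.
lr_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_val)
lda_pred = LinearDiscriminantAnalysis().fit(X_train, y_train).predict(X_val)

# Fraction of validation rows where the two models predict the same class.
agreement = float((lr_pred == lda_pred).mean())
print(f"LR accuracy:  {accuracy_score(y_val, lr_pred):.3f}")
print(f"LDA accuracy: {accuracy_score(y_val, lda_pred):.3f}")
print(f"agreement:    {agreement:.3f}")
```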

In summary, for this classification problem, either Logistic Regression or Linear Discriminant Analysis seems to be the better choice. The solution approach is similar to the one I followed for the Multi-Class Classification problems.

The source code is available on GitHub.

This exercise also taught me one more thing, which I will share in my next post.
