Binary Classification with Python
In the previous two examples, I worked on Multi-Class Classification problems, where the class (output) can take one of several values.
In this example, I take up a Binary Classification problem, where the output is either 1 or 0.
The approach is similar to the previous two examples, so I will only highlight the important points.
For the Binary Classification program, I used the Pima Indians Diabetes Database. This data set has 8 features:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
The output is the Class variable, with values 0 or 1. Class 1 means the patient tested positive for diabetes. There are 768 rows in this data set.
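A minimal sketch of loading the data, assuming it is stored as a headerless, comma-separated file with the 8 features followed by the class column. The short column labels and the file name `pima-indians-diabetes.csv` are my own placeholders, not part of the data set. So that it runs without the file, the demo writes a few random rows (not real patient data) to an in-memory buffer and loads those instead.

```python
import io

import numpy as np

# My own short labels for the 8 features, in file order (not official names).
FEATURES = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age",
]


def load_pima(source):
    """Load a headerless CSV with 8 feature columns plus a 0/1 class column.

    `source` may be a file path or any file-like object np.loadtxt accepts.
    Returns the feature matrix X (n_rows x 8) and the integer label vector y.
    """
    data = np.loadtxt(source, delimiter=",")
    if data.ndim != 2 or data.shape[1] != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} columns, got shape {data.shape}")
    X = data[:, :-1]              # the 8 features
    y = data[:, -1].astype(int)   # the class column (0 or 1)
    return X, y


if __name__ == "__main__":
    # Synthetic stand-in for the real file: random values, NOT real patient data.
    rng = np.random.default_rng(0)
    fake = np.column_stack([rng.uniform(0, 200, size=(5, 8)), rng.integers(0, 2, size=5)])
    buffer = io.StringIO()
    np.savetxt(buffer, fake, delimiter=",")
    buffer.seek(0)

    X, y = load_pima(buffer)
    print(X.shape, y.shape)  # (5, 8) (5,)
    # With the real file: X, y = load_pima("pima-indians-diabetes.csv")  # hypothetical path
```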
The accuracy of the different algorithms on this classification problem looks like this:
As can be seen, Logistic Regression and Linear Discriminant Analysis show better accuracy than the others.
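A comparison like this can be sketched with scikit-learn's `cross_val_score` over 10 shuffled `KFold` splits. The list of algorithms below is an assumption (a typical spot-check set, not necessarily the exact one behind the chart), and to stay runnable on its own the sketch uses synthetic data from `make_classification` with the same shape as the data set (768 rows, 8 features), so its numbers will not match the chart.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def spot_check(X, y, n_splits=10, seed=7):
    """Return {model name: array of per-fold accuracies} for several classifiers."""
    # Assumed spot-check set; not necessarily the exact set behind the chart.
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(),
        "CART": DecisionTreeClassifier(random_state=seed),
        "NB": GaussianNB(),
        "SVM": SVC(),
    }
    # Same shuffled folds for every model, so the comparison is like for like.
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=kfold, scoring="accuracy")
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in with the same shape as the Pima data (768 x 8).
    X, y = make_classification(n_samples=768, n_features=8, random_state=7)
    for name, scores in spot_check(X, y).items():
        print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")
```

Reporting the standard deviation next to the mean helps judge whether a small gap between two algorithms is meaningful or just fold-to-fold noise.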
Plotted as a box-and-whisker chart, the comparison looks like this:
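A box-and-whisker chart of per-fold accuracies can be drawn with matplotlib. This is a sketch: the function name, the output file name, and the synthetic stand-in data in the demo are my own assumptions, and only LR and LDA are scored there to keep it short.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score


def plot_accuracy_boxes(results):
    """Draw one box per algorithm from {name: per-fold accuracies}; return the figure."""
    names = list(results)
    fig, ax = plt.subplots()
    ax.boxplot([results[name] for name in names])  # boxes sit at x = 1..n
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_ylabel("Accuracy")
    ax.set_title("Algorithm comparison")
    return fig


if __name__ == "__main__":
    # Synthetic stand-in data (768 x 8); plug in the real X, y to reproduce the chart.
    X, y = make_classification(n_samples=768, n_features=8, random_state=7)
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    models = {"LR": LogisticRegression(max_iter=1000), "LDA": LinearDiscriminantAnalysis()}
    results = {name: cross_val_score(m, X, y, cv=kfold) for name, m in models.items()}
    fig = plot_accuracy_boxes(results)
    fig.savefig("algorithm_comparison.png")  # hypothetical output file name
```

Each box shows the spread of accuracies across folds, which is what makes this view more informative than a table of means alone.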
For final validation, I first used Logistic Regression. The accuracy stands at ~78%.
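A sketch of a final validation step: hold out part of the data, fit Logistic Regression on the rest, and report accuracy, the confusion matrix, and per-class precision and recall. The 80/20 split, the stratification, the `random_state`, and the synthetic stand-in data are assumptions for illustration; on synthetic data the accuracy will not be the ~78% above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def validate(model, X, y, test_size=0.2, seed=7):
    """Fit `model` on a training split and evaluate it on the held-out split."""
    # Assumed 80/20 split, stratified so both classes keep their proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return (
        accuracy_score(y_test, predictions),
        confusion_matrix(y_test, predictions),
        classification_report(y_test, predictions),
    )


if __name__ == "__main__":
    # Synthetic stand-in data with the Pima shape (768 x 8), not the real data set.
    X, y = make_classification(n_samples=768, n_features=8, random_state=7)
    accuracy, matrix, report = validate(LogisticRegression(max_iter=1000), X, y)
    print(f"Accuracy: {accuracy:.3f}")
    print(matrix)
    print(report)
```

For a medical screening problem like this one, the confusion matrix is worth reading alongside accuracy, because it shows how many positive cases were missed.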
When I used Linear Discriminant Analysis, the result looks like this:
As expected, there is no difference in accuracy compared to Logistic Regression.
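To check that the two models really agree, both can be scored on exactly the same held-out split. This sketch again uses synthetic stand-in data and an assumed 80/20 stratified split, so its numbers will differ from the results above. It also counts how many test rows the two models label differently, since equal accuracy does not by itself mean identical predictions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_on_holdout(X, y, test_size=0.2, seed=7):
    """Fit LR and LDA on one training split; return test accuracies and disagreements."""
    # Assumed 80/20 stratified split, shared by both models.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
    }
    # fit() returns the fitted estimator, so predict() can be chained.
    preds = {name: m.fit(X_train, y_train).predict(X_test) for name, m in models.items()}
    accuracies = {name: accuracy_score(y_test, p) for name, p in preds.items()}
    # Number of test rows where the two models predict different classes.
    disagreements = int((preds["LR"] != preds["LDA"]).sum())
    return accuracies, disagreements


if __name__ == "__main__":
    # Synthetic stand-in data (768 x 8), not the real data set.
    X, y = make_classification(n_samples=768, n_features=8, random_state=7)
    accuracies, disagreements = compare_on_holdout(X, y)
    print(accuracies, "rows labelled differently:", disagreements)
```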
In summary, for this classification problem, either Logistic Regression or Linear Discriminant Analysis appears to be the better choice. The solution approach is similar to the one I followed for the Multi-Class Classification problems.
The source code is available on GitHub.
This exercise taught me one more lesson, which I will share in my next post.