Breast Cancer Classification Program

I picked another classification example, this time the Breast Cancer Diagnostic Data Set. Having already worked through three classification programs, my objective here was to go deeper into each line of code. Along the way I learned how to slice an array with NumPy and got a better understanding of the correlation matrix. I won't explain every step again, since it follows the same steps as the previous classification programs.
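
As a side note on those two points, here is a minimal sketch of slicing a NumPy array into features and labels and computing a correlation matrix with pandas. It is not the actual notebook code; the file name and column layout are assumptions for illustration.

import numpy as np
import pandas as pd

# Hypothetical CSV of the diagnostic data set; the real path and column order may differ.
df = pd.read_csv("breast_cancer_data.csv")
array = df.values

# NumPy slicing: all rows, columns 2 onwards as features; column 1 as the diagnosis label.
X = array[:, 2:]
y = array[:, 1]

# Correlation matrix of the numeric feature columns.
corr = df.iloc[:, 2:].corr()
print(corr)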

I also looked at the different algorithms from a new perspective, and I'm sharing those observations in this post. In previous posts, I compared the accuracy scores of the algorithms and then chose the "best" one (the one with the highest accuracy score) to run the validation and check the final statistics.

In this example, I checked the validation output for every algorithm and compared the differences. The details below are largely self-explanatory, but two things worth highlighting are the Confusion Matrix and the Classification Report: together they show why a model's accuracy score is high or low.

The complete source code is available on GitHub.
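
Before the per-algorithm results, here is a rough sketch of the kind of loop that produces them. It is a hedged reconstruction rather than the exact notebook code: the train/test split parameters and model settings are assumptions, and the scikit-learn built-in copy of the data set is used only so the snippet runs on its own (the notebook loads the CSV instead, where the labels are B/M rather than 0/1).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Stand-in data so the sketch is self-contained; split parameters are assumptions.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

models = [
    ("KNN", KNeighborsClassifier()),
    ("LR", LogisticRegression(max_iter=10000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("DTC", DecisionTreeClassifier()),
    ("NB", GaussianNB()),
    ("SVC", SVC()),
]

for name, model in models:
    model.fit(X_train, y_train)              # train on the training split
    predictions = model.predict(X_test)      # validate on the held-out split
    print("------" + name + "------")
    print("Accuracy Score ", accuracy_score(y_test, predictions))
    print("Confusion Matrix\n", confusion_matrix(y_test, predictions))
    print("Classification Report\n", classification_report(y_test, predictions))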

KNeighborsClassifier

------KNN------
Accuracy Score  0.9385964912280702
Confusion Matrix
 [[70  2]
 [ 5 37]]
Classification Report
               precision    recall  f1-score   support

           B       0.93      0.97      0.95        72
           M       0.95      0.88      0.91        42

    accuracy                           0.94       114
   macro avg       0.94      0.93      0.93       114
weighted avg       0.94      0.94      0.94       114

LogisticRegression

------LR------
Accuracy Score  0.9473684210526315
Confusion Matrix
 [[70  2]
 [ 4 38]]
Classification Report
               precision    recall  f1-score   support

           B       0.95      0.97      0.96        72
           M       0.95      0.90      0.93        42

    accuracy                           0.95       114
   macro avg       0.95      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114

LinearDiscriminantAnalysis

------LDA------
Accuracy Score  0.9473684210526315
Confusion Matrix
 [[72  0]
 [ 6 36]]
Classification Report
               precision    recall  f1-score   support

           B       0.92      1.00      0.96        72
           M       1.00      0.86      0.92        42

    accuracy                           0.95       114
   macro avg       0.96      0.93      0.94       114
weighted avg       0.95      0.95      0.95       114

DecisionTreeClassifier

------DTC------
Accuracy Score  0.956140350877193
Confusion Matrix
 [[70  2]
 [ 3 39]]
Classification Report
               precision    recall  f1-score   support

           B       0.96      0.97      0.97        72
           M       0.95      0.93      0.94        42

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114

GaussianNB

------NB------
Accuracy Score  0.9473684210526315
Confusion Matrix
 [[70  2]
 [ 4 38]]
Classification Report
               precision    recall  f1-score   support

           B       0.95      0.97      0.96        72
           M       0.95      0.90      0.93        42

    accuracy                           0.95       114
   macro avg       0.95      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114

SVC

------SVC------
Accuracy Score  0.9035087719298246
Confusion Matrix
 [[72  0]
 [11 31]]
Classification Report
               precision    recall  f1-score   support

           B       0.87      1.00      0.93        72
           M       1.00      0.74      0.85        42

    accuracy                           0.90       114
   macro avg       0.93      0.87      0.89       114
weighted avg       0.92      0.90      0.90       114
