
(Part II) The use of various Machine Learning models for predicting Alzheimer’s disease

  • Writer: Latera Tesfaye
  • Dec 27, 2024
  • 2 min read

Model 3: kNN is a non-parametric model for the conditional class probabilities:

$$p_g(x) = P(Y = g \mid X = x)$$

Using the set of $k$ nearest neighbors of $x$ in the training data, $N_k(x)$, kNN estimates this probability as:

$$\hat{p}_{kg}(x) = \frac{1}{k} \sum_{i \in N_k(x)} I(y_i = g)$$

Essentially, the estimated probability of each class $g$ is the proportion of the $k$ neighbors of $x$ with that class. To create a classifier, we predict the most probable class:

$$\hat{C}_k(x) = \underset{g}{\arg\max} \; \hat{p}_{kg}(x)$$

Since this work focuses on a binary outcome, the classifier can be rewritten as:

$$\hat{C}_k(x) = \begin{cases} 1 & \text{if } \hat{p}_{k1}(x) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$
To prevent variables on large scales from dominating the distance calculation, we scale the predictors to mean zero and standard deviation one. To select k, we try many values and choose the one with the lowest test error rate and the smallest risk of overfitting. A kNN model was trained for each k from 1 to 20. Looking at figure 1, any k from 8 to 17 is reasonable, as the difference between test and train accuracy is small and test accuracy is highest in that range. k = 10 through 14 have the highest test accuracy, which is also close to the train accuracy, with a slight tendency to underfit.
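The preprocessing and model-selection logic described above can be sketched as follows. This is an illustrative pure-Python sketch, not the original R workflow; `standardize` and `select_k` are hypothetical helper names, and the accuracy values passed to `select_k` would come from the per-k train/test runs.

```python
import statistics

def standardize(X):
    """Scale each predictor column to mean 0 and standard deviation 1,
    so variables measured on large scales do not dominate distances."""
    cols = list(zip(*X))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in X
    ]

def select_k(train_acc, test_acc):
    """Among candidate k values, prefer the highest test accuracy and,
    on ties, the smallest train/test gap (lower overfitting risk)."""
    return max(
        test_acc,
        key=lambda k: (test_acc[k], -abs(train_acc[k] - test_acc[k])),
    )
```

The tie-break on the train/test gap mirrors the reasoning used below to pick between k values with equal test accuracy.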

Figure 1: Test vs. train accuracy

We also extended this test beyond k = 20 and found that accuracy starts to decrease after k = 18.

Figure 2: Classification error across various k values.

As shown in figure 2, the dotted orange line marks the smallest observed test classification error rate. There are two minima, at k = 10 and k = 13. Looking back at figure 1, k = 13 is less likely to underfit (based on the smaller difference between test and train accuracy), so we fit our kNN model with k = 13 (although any k from 8 to 17 produces very similar prediction accuracy). With this value, the test accuracy is estimated at 95% and the AUC at 94%.


In the above model, we used the `knn` function from the R `class` library.


Recap of all the models so far:


Table 1. Summary of all the models so far

| Model                      | Test Accuracy | AUC  | Sensitivity | Specificity | Remark                                 |
|----------------------------|---------------|------|-------------|-------------|----------------------------------------|
| Simple logistic regression | 46%           | 100% | Low         | Low         | Overfits; a bad model for this problem |
| Elastic net (glmnet)       | 97%           | 96%  | 100%        | 90%         | Slightly underfits                     |
| kNN                        | 95%           | 94%  | 99%         | 83%         | Slightly underfits                     |


Go back to Part I. Part III is coming soon.

