(Part II) The use of various Machine Learning models for predicting Alzheimer’s disease
- Latera Tesfaye
- Dec 27, 2024
- 2 min read
Model 3: kNN is framed as a non-parametric model of the class probabilities:

$$p_g(x) = P(Y = g \mid X = x)$$

Using the k nearest neighbors of $x$, denoted $\mathcal{N}_k(x)$, it estimates this probability as:

$$\hat{p}_{kg}(x) = \frac{1}{k} \sum_{i \in \mathcal{N}_k(x)} I(y_i = g)$$

Essentially, the estimated probability of each class $g$ is the proportion of the k neighbors of $x$ that belong to class $g$. To create a classifier, we then use:

$$\hat{C}_k(x) = \underset{g}{\operatorname{argmax}} \; \hat{p}_{kg}(x)$$

Since this work focuses on a binary outcome, the classifier can be rewritten as:

$$\hat{C}_k(x) = \begin{cases} 1 & \text{if } \hat{p}_{k1}(x) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$
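A toy from-scratch sketch may make the estimate concrete. Everything here is illustrative: the function names and the Euclidean distance choice are assumptions for the example, not taken from this post.

```r
# Estimate P(Y = 1 | X = x0) as the share of the k nearest training
# points (Euclidean distance) that belong to class 1.
knn_prob <- function(x0, X, y, k) {
  d <- sqrt(rowSums(sweep(X, 2, x0)^2))  # distance from x0 to every row of X
  nbrs <- order(d)[1:k]                  # indices of the k nearest neighbors
  mean(y[nbrs] == 1)                     # proportion of neighbors in class 1
}

# Binary classifier from the rule above: predict 1 when the
# estimated probability exceeds 0.5.
knn_class <- function(x0, X, y, k) as.integer(knn_prob(x0, X, y, k) > 0.5)
```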
To keep variables with large values from dominating the distance calculation, we scale the predictors to mean zero and standard deviation one. To select k, we try many values and pick the one with the lowest error rate and the smallest chance of overfitting. A kNN model was trained for each k from 1 to 20 (see the sketch below). Looking at figure 1, any k from 8 to 17 could be used, as the gap between test and train accuracy is small and the test accuracy is highest in that range. k = 10 through 14 give the highest test accuracy, which is also close to the train accuracy, with a slight tendency to underfit.
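A minimal sketch of this scaling and k-selection loop, assuming a train/test split already exists; the object names (`x_train`, `x_test`, `y_train`, `y_test`) are placeholders, not the post's actual variables:

```r
library(class)

# Scale predictors to mean 0 / sd 1, applying the training-set
# statistics to the test set to avoid leakage
x_train_sc <- scale(x_train)
x_test_sc  <- scale(x_test,
                    center = attr(x_train_sc, "scaled:center"),
                    scale  = attr(x_train_sc, "scaled:scale"))

# Train and test accuracy for each candidate k
ks  <- 1:20
acc <- sapply(ks, function(k) {
  pred_tr <- knn(x_train_sc, x_train_sc, y_train, k = k)
  pred_te <- knn(x_train_sc, x_test_sc,  y_train, k = k)
  c(train = mean(pred_tr == y_train),
    test  = mean(pred_te == y_test))
})
# acc is a 2 x 20 matrix; plotting both rows against ks reproduces
# the kind of train-vs-test comparison shown in figure 1
```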

We also extended the search beyond k = 20 and found that test accuracy started to decrease after k = 18.

As shown in figure 2, the dotted orange line marks the smallest observed test classification error rate. There are two minima, at k = 10 and k = 13, but looking back at figure 1, k = 13 is less likely to underfit (based on the gap between test and train accuracy). We therefore fit our kNN model with k = 13 (although any k from 8 to 17 would produce very similar prediction accuracy). With this value, the test accuracy is estimated at 95% and the AUC at 94%.
In the above model, we used the `knn()` function from the `class` library.
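For reference, a sketch of that final fit and the AUC computation, continuing with the same placeholder objects; `pROC` is one common way to compute the AUC and is an assumption here, not necessarily the package used in the original analysis:

```r
library(class)
library(pROC)

# Final model at k = 13; prob = TRUE attaches the winning-class
# vote proportion to the predictions
pred <- knn(x_train_sc, x_test_sc, y_train, k = 13, prob = TRUE)
mean(pred == y_test)  # test accuracy

# Convert the winning-class vote proportion into P(positive class),
# assuming y_train is a factor whose second level is the positive class
p_win <- attr(pred, "prob")
p_pos <- ifelse(pred == levels(y_train)[2], p_win, 1 - p_win)
auc(roc(y_test, p_pos))  # test-set AUC
```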
Recap of all the models so far:
Table 1. Summary of all the models so far
| Model | Test Accuracy | AUC | Sensitivity | Specificity | Remark |
|---|---|---|---|---|---|
| Simple logistic regression | 46% | 100% | Low | Low | Overfits; a poor model for this problem |
| Elastic net (glmnet) | 97% | 96% | 100% | 90% | Slightly underfits |
| kNN | 95% | 94% | 99% | 83% | Slightly underfits |
Go back to Part I. Part III is coming soon.