
(Part II) The use of various Machine Learning models for predicting Alzheimer’s disease

  • Writer: Latera Tesfaye
  • Dec 27, 2024
  • 2 min read

Model 3: kNN is a non-parametric model for the conditional class probabilities:

$$p_g(x) = P(Y = g \mid X = x)$$

Using the set of $k$ nearest neighbors of $x$ in the training data, $N_k(x)$, kNN estimates this probability as:

$$\hat{p}_{kg}(x) = \frac{1}{k} \sum_{i \in N_k(x)} I(y_i = g)$$

Essentially, the estimated probability of each class $g$ is the proportion of the $k$ neighbors of $x$ with that class. To create a classifier, we predict the most probable class:

$$\hat{C}_k(x) = \underset{g}{\arg\max} \; \hat{p}_{kg}(x)$$

Since this work focuses on a binary outcome, the classifier can be rewritten as:

$$\hat{C}_k(x) = \begin{cases} 1 & \text{if } \hat{p}_{k1}(x) > 0.5 \\ 0 & \text{otherwise} \end{cases}$$
To prevent variables on large scales from dominating the distance calculation, we scale the predictors to mean zero and standard deviation one. To select k, we try many values and choose the one with the lowest test error rate and the smallest risk of overfitting. A kNN model was trained for each k from 1 to 20. Looking at figure 1, any k from 8 to 17 is reasonable, as the difference between test and train accuracy is small and test accuracy is highest in that range. k = 10 through 14 have the highest test accuracy, which is also close to the train accuracy, with a slight tendency to underfit.
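The preprocessing and model-selection logic described above can be sketched as follows. This is an illustrative pure-Python sketch, not the original R workflow; `standardize` and `select_k` are hypothetical helper names, and the accuracy values passed to `select_k` would come from the per-k train/test runs.

```python
import statistics

def standardize(X):
    """Scale each predictor column to mean 0 and standard deviation 1,
    so variables measured on large scales do not dominate distances."""
    cols = list(zip(*X))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in X
    ]

def select_k(train_acc, test_acc):
    """Among candidate k values, prefer the highest test accuracy and,
    on ties, the smallest train/test gap (lower overfitting risk)."""
    return max(
        test_acc,
        key=lambda k: (test_acc[k], -abs(train_acc[k] - test_acc[k])),
    )
```

The tie-break on the train/test gap mirrors the reasoning used below to pick between k values with equal test accuracy.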

Figure 1: Test vs. train accuracy

We also extended this test beyond k = 20 and found that accuracy starts to decrease after k = 18.

Figure 2: Classification error across various k values.

As shown in figure 2, the dotted orange line marks the smallest observed test classification error rate. There are two minima, at k = 10 and k = 13. Looking back at figure 1, k = 13 is less likely to underfit (based on the smaller difference between test and train accuracy), so we fit our kNN model with k = 13 (although any k from 8 to 17 produces very similar prediction accuracy). With this value, the test accuracy is estimated at 95% and the AUC at 94%.


In the above model, we used the `knn` function from the R `class` library.


Recap of all the models so far:


Table 1. Summary of all the models so far

| Model                      | Test Accuracy | AUC  | Sensitivity | Specificity | Remark                                 |
|----------------------------|---------------|------|-------------|-------------|----------------------------------------|
| Simple logistic regression | 46%           | 100% | Low         | Low         | Overfits; a bad model for this problem |
| Elastic net (glmnet)       | 97%           | 96%  | 100%        | 90%         | Slightly underfits                     |
| kNN                        | 95%           | 94%  | 99%         | 83%         | Slightly underfits                     |


Go back to Part I. Part III is coming soon.

