Comparing predictive performance of k-nearest neighbors and support vector machine for predicting ischemic heart disease
DOI:
https://doi.org/10.6084/rjas.v1i2.391
Keywords:
Ischemic heart disease, prediction, support vector machine, k-nearest neighbors
Abstract
This research compares the predictive performance of k-nearest neighbors (k-NN) and support vector machine (SVM) models for predicting ischemic heart disease (IHD) from its important risk factors. Information on IHD risk factors was collected from 300 individuals, of whom 100 were recruited from the IHD group and 200 from the control group. The entire data set was randomly partitioned into training and testing sets in a 7:3 ratio. The k-NN and SVM models were fitted on the training set with 10-fold cross-validation. Both models were evaluated on accuracy, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC) on both the training and testing sets. SVM outperformed k-NN on the testing set, with higher accuracy (86.67%), sensitivity (80%), specificity (90%), and AUC (94.1%). However, no statistically significant differences were found between SVM and k-NN. In addition, both models identified blood pressure, cholesterol, physical activity, BMI, diet, family history, and type of oil as the most important risk factors for IHD. These results indicate that SVM and k-NN models can be used to develop a predictive system for IHD based on its important risk factors.
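As a rough illustration of how the reported evaluation measures relate to one another, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. The counts are an assumption, not figures reported in the abstract: they suppose that the 7:3 split was stratified, so that the testing set holds 90 individuals (30 IHD cases and 60 controls), and they are chosen because they reproduce the SVM values quoted above. The class name `ConfusionCounts` and the variable `svm_test` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (IHD = positive class)."""
    tp: int  # IHD cases predicted as IHD
    fn: int  # IHD cases predicted as control
    tn: int  # controls predicted as control
    fp: int  # controls predicted as IHD

    @property
    def accuracy(self) -> float:
        # Share of all individuals classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Share of true IHD cases that the model detects.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of true controls that the model correctly rules out.
        return self.tn / (self.tn + self.fp)


# ASSUMED counts: a stratified 30% test split of 100 cases and 200 controls
# would give 30 cases and 60 controls. These numbers are inferred for
# illustration only and are not taken from the study.
svm_test = ConfusionCounts(tp=24, fn=6, tn=54, fp=6)

print(f"accuracy    = {svm_test.accuracy:.2%}")     # 86.67%
print(f"sensitivity = {svm_test.sensitivity:.2%}")  # 80.00%
print(f"specificity = {svm_test.specificity:.2%}")  # 90.00%
```

Under these assumed counts, 78 of 90 test individuals are classified correctly, which is consistent with the reported 86.67% accuracy. The AUC cannot be recovered from the counts alone, since it depends on the ranking of predicted scores.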
License
This open-access article is distributed under a Creative Commons Attribution (CC-BY-NC-SA) license.