Feature importance using explainable artificial intelligence (XAI) and machine learning for diabetes disease classification

Muhammad Maulana Ahmad
Neny Sulistianingsih
Khasnur Hidjah

Abstract

Diabetes is one of the most significant global health problems of the modern era. The disease not only seriously affects patients' quality of life but also imposes a heavy economic and social burden on individuals and on the health care system as a whole. Early detection and effective treatment are therefore essential to reducing its prevalence and negative impact. This study designs a machine learning classification model that identifies feature importance for diabetes with the help of Explainable Artificial Intelligence (XAI) methods. The model is expected to give a clear interpretation of the most relevant features or symptoms, making it easier to determine whether a person has diabetes from a more optimally selected set of symptoms. The dataset contains 70,000 records, split into 80% training data and 20% test data. The results show that models built on features selected with LIME are more accurate than those built on features selected with SHAP. With LIME feature selection, the XGBoost algorithm achieved the highest accuracy, 98.47%, while the Decision Tree and Random Forest algorithms achieved 91.97% and 91.49%, respectively. With SHAP feature selection, XGBoost achieved 90.94%, Decision Tree 80.59%, and Random Forest 88.46%. The main finding is that LIME feature selection combined with XGBoost gives the best accuracy, 98.47%, compared with 90.94% for SHAP feature selection combined with the same classifier.
These findings also show that LIME feature selection combined with the XGBoost algorithm can improve the interpretability of the model while maintaining or even improving prediction accuracy. This approach identifies the most relevant features more efficiently and so supports more informed decision-making in the data analysis process.
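
The figures reported in the abstract can be checked with a little arithmetic. The sketch below is only an illustration and is not the authors' code: the names `split_sizes`, `REPORTED_ACCURACY`, `ranked`, and `lime_minus_shap` are hypothetical. It assumes a simple holdout split and reads the two accuracies printed as fractions (0.9094 and 0.8059) as 90.94% and 80.59%.

```python
# Figures are taken from the abstract; all names here are illustrative only.
TOTAL_RECORDS = 70_000
TRAIN_FRACTION = 0.80


def split_sizes(total, train_fraction):
    """Return (train, test) record counts for a simple holdout split."""
    train = round(total * train_fraction)
    return train, total - train


# Reported accuracy (%) per (feature-selection method, classifier).
# Assumption: values printed as 0.9094 and 0.8059 mean 90.94% and 80.59%.
REPORTED_ACCURACY = {
    ("LIME", "XGBoost"): 98.47,
    ("LIME", "Decision Tree"): 91.97,
    ("LIME", "Random Forest"): 91.49,
    ("SHAP", "XGBoost"): 90.94,
    ("SHAP", "Decision Tree"): 80.59,
    ("SHAP", "Random Forest"): 88.46,
}


def ranked(results):
    """Sort (method, classifier) pairs from highest to lowest accuracy."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def lime_minus_shap(results):
    """Percentage-point gap between LIME and SHAP for each classifier."""
    classifiers = sorted({clf for _, clf in results})
    return {
        clf: round(results[("LIME", clf)] - results[("SHAP", clf)], 2)
        for clf in classifiers
    }


if __name__ == "__main__":
    train, test = split_sizes(TOTAL_RECORDS, TRAIN_FRACTION)
    print(f"training records: {train}, test records: {test}")
    for (method, clf), acc in ranked(REPORTED_ACCURACY):
        print(f"{method:4s} + {clf:13s} {acc:6.2f}%")
    for clf, gap in lime_minus_shap(REPORTED_ACCURACY).items():
        print(f"{clf}: LIME exceeds SHAP by {gap} percentage points")
```

Under these assumptions the split gives 56,000 training and 14,000 test records, and LIME-based selection comes out ahead of SHAP-based selection for all three classifiers, with the largest gap for Decision Tree.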
