Explaining XGBoost predictions with SHAP value: a comprehensive guide to interpreting decision tree-based models

    Serap Ergün


In sectors where data and data science are central, it is often important to understand which factors affect Key Performance Indicators (KPIs) and how. Machine learning is used to model and predict the relevant KPIs, but interpretability is essential for understanding how a model arrives at its predictions: it lets users identify which features contributed to what the model learned from the data. SHAP (SHapley Additive exPlanations) values have emerged as a practical approach to this problem, providing an index of each feature's influence on the model's predictions. This paper demonstrates that, when SHAP values are used with decision tree-based models, which are widely applied to tabular data, the contribution of each feature to model learning can be estimated precisely.

Keywords: SHAP value, machine learning, decision tree-based model, feature importance
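As a concrete illustration of the workflow the abstract describes, the following minimal Python sketch trains an XGBoost regressor on toy data and computes SHAP values with shap.TreeExplainer, which calculates exact Shapley values for tree ensembles. The dataset, feature count, and hyperparameters are illustrative assumptions, not the paper's experimental setup.

import numpy as np
import shap
import xgboost

# Toy regression data: 500 samples, 4 illustrative features (an assumption,
# not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# Fit a small gradient-boosted tree ensemble (hyperparameters are arbitrary).
model = xgboost.XGBRegressor(n_estimators=100, max_depth=3)
model.fit(X, y)

# TreeExplainer computes exact SHAP values for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Local accuracy: the base value plus the per-feature SHAP values
# reconstructs each individual prediction.
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
assert np.allclose(model.predict(X), reconstructed, atol=1e-3)

# Global feature importance: mean absolute SHAP value per feature.
print(np.abs(shap_values).mean(axis=0))

The additivity check above reflects the defining property of SHAP values: for each sample, the contributions of all features sum, together with the expected model output, to that sample's prediction.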

How to Cite
Ergün, S. (2023). Explaining XGBoost predictions with SHAP value: a comprehensive guide to interpreting decision tree-based models. New Trends in Computer Sciences, 1(1), 19–31.
Published in issue: Apr 11, 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.

