Evaluating the performance of machine learning approaches in predicting Albanian Shkumbini River's waters using water quality index model

    Lule Basha; Bederiana Shyti; Lirim Bekteshi


The water quality index (WQI) method is widely used to assess the overall quality of surface water and groundwater systems. This study applies four machine learning classifiers, Gradient Boosting (XGBoost), Naive Bayes, Random Forest, and K-Nearest Neighbour, to determine which model most effectively predicts the water quality index and quality classes of the Shkumbini River in Albania. The analysis used data collected over a 4-year period at six monitoring points for nine parameters.
The predictive accuracies of XGBoost, Random Forest, K-Nearest Neighbour, and Naive Bayes were 98.61%, 94.44%, 91.22%, and 94.45%, respectively. XGBoost performed best in terms of F1 score, sensitivity, and prediction accuracy, and had the lowest errors in both the training (RMSE = 2.1, MSE = 9.8, MAE = 1.13) and evaluation (RMSE = 0.0, MSE = 0.01, MAE = 0.01) stages. The findings indicated that biochemical oxygen demand (BOD), bicarbonate (HCO3), and total phosphorus had the strongest positive impact on the Shkumbini River’s water quality. Additionally, a statistically significant, strong positive correlation (r = 0.85) was identified between BOD and WQI, underscoring the role of BOD in determining water quality in the river.
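The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the class labels, the WQI values, and the function names `classification_metrics` and `error_metrics` are hypothetical, and macro-averaging over classes is an assumption, since the abstract does not state how the multi-class sensitivity and F1 score were averaged.

```python
import math


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged sensitivity (recall) and F1 score.

    Macro-averaging is an assumption made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    recalls, f1_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": accuracy,
        "sensitivity": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


def error_metrics(y_true, y_pred):
    """MSE, RMSE and MAE between observed and predicted numeric values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Hypothetical water quality classes, for illustration only.
true_classes = ["good", "good", "poor", "poor", "poor", "bad"]
pred_classes = ["good", "poor", "poor", "poor", "poor", "bad"]
class_scores = classification_metrics(true_classes, pred_classes)

# Hypothetical WQI values, for illustration only.
true_wqi = [50.0, 60.0, 70.0]
pred_wqi = [52.0, 59.0, 70.0]
wqi_errors = error_metrics(true_wqi, pred_wqi)

print(class_scores)
print(wqi_errors)
```

By definition, RMSE is the square root of MSE, so the sketch derives it from the MSE rather than computing it separately.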

Keywords: Water Quality Index model, Shkumbini River, machine learning classifier, model accuracy

How to Cite
Basha, L., Shyti, B., & Bekteshi, L. (2024). Evaluating the performance of machine learning approaches in predicting Albanian Shkumbini River’s waters using water quality index model. Journal of Environmental Engineering and Landscape Management, 32(2), 117–127.
Published in Issue
Mar 6, 2024

This work is licensed under a Creative Commons Attribution 4.0 International License.

