Modeling the effect of pollutant gas on PM2.5 in China with computational intelligence
DOI: https://doi.org/10.3846/jeelm.2026.25791
Abstract
This study employs four computational intelligence techniques, namely gene expression programming (GEP), back-propagation neural network (BPNN), support vector regression (SVR), and linear regression (LR), to model the quantitative relationship between pollutant gases (PGs) and PM2.5 concentrations using 2021 environmental data from 12 Chinese cities. Model performance was compared using the correlation coefficient (R), root mean squared error (RMSE), and mean absolute error (MAE). Across all models, R between predicted and actual PM2.5 concentrations ranged from –0.7579 to 0.9802. SVR and LR were the most robust, with average R values of 0.8656 and 0.8671, respectively. LR also yielded the lowest average RMSE (0.12) and MAE (0.06) across the cities. GEP was able to find highly accurate explicit models, reaching a maximum R of 0.9766. A key finding from the LR models is that CO and PM10 consistently had the most significant impact on PM2.5 concentrations. The correlation formulas derived from GEP and LR can support further PM2.5 analysis. These findings offer insights into PM2.5 formation mechanisms and can inform pollution control strategies.
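As a minimal sketch of the three evaluation metrics named in the abstract (R, RMSE, and MAE), the following Python uses their standard textbook definitions. The function names and the sample values are illustrative assumptions, not the study's code or data.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Hypothetical values for illustration only (not taken from the study).
actual_pm25 = [0.10, 0.25, 0.40, 0.55]
predicted_pm25 = [0.12, 0.22, 0.43, 0.50]

print("R    =", pearson_r(actual_pm25, predicted_pm25))
print("RMSE =", rmse(actual_pm25, predicted_pm25))
print("MAE  =", mae(actual_pm25, predicted_pm25))
```

R measures how well predictions track the direction of the observations, so a model whose predictions move opposite to the observations yields a negative R, as in the lower end of the reported range. RMSE and MAE measure error magnitude and are never negative.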
Keywords:
PM2.5, pollutant gas, GEP, BP neural network, SVR, LR
License
Copyright (c) 2026 The Author(s). Published by Vilnius Gediminas Technical University.

This work is licensed under a Creative Commons Attribution 4.0 International License.