Framework for deep reinforcement learning in Webots virtual environments

DOI: https://doi.org/10.3846/ntcs.2025.23668

Abstract

Reinforcement learning (RL) algorithms, particularly deep reinforcement learning (DRL), have shown transformative potential in robotics by enabling adaptive behaviour in virtual environments. However, a comprehensive framework for efficiently testing, training, and deploying robots in these environments remains underexplored. This study introduces a standardized, open-source framework designed specifically for the Webots simulation environment. Supported by a robust methodology, the framework combines the digital twin (DT) concept with three distinct design patterns for structuring agent-environment interaction, notably including a novel pattern aimed at improving sim-to-real transferability, to enhance RL workflows. The proposed framework is validated through experimental studies on both an inverted pendulum model and a production-grade Pioneer 3-AT robotic platform. The experiments highlight the framework’s ability to bridge the gap between virtual training and real-world implementation. All resources, including the framework, methodology, and experimental configurations, are openly accessible on GitHub.
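To make the agent-environment interaction pattern concrete, the sketch below shows one plausible way such a framework can expose a Webots world to an RL library: a supervisor controller wrapped as a standard Gymnasium environment and trained with Stable-Baselines3 PPO. This is a minimal illustration under stated assumptions, not the paper's actual API: the class name InvertedPendulumEnv, the device names, and the reward shaping are hypothetical, while the Webots calls (Supervisor, step, simulationReset) and the Stable-Baselines3 calls follow their standard Python APIs. The script is meant to run inside Webots as a supervisor controller.

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from controller import Supervisor      # Webots controller API (available inside Webots)
from stable_baselines3 import PPO

MAX_FORCE = 10.0  # hypothetical actuation limit, in newtons

class InvertedPendulumEnv(gym.Env):
    """Gymnasium-style wrapper around a Webots supervisor controller (sketch)."""

    def __init__(self):
        super().__init__()
        self.robot = Supervisor()
        self.timestep = int(self.robot.getBasicTimeStep())
        # Device names are placeholders that would come from the .wbt world file.
        self.motor = self.robot.getDevice("cart_motor")
        self.pole_sensor = self.robot.getDevice("pole_position_sensor")
        self.pole_sensor.enable(self.timestep)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.robot.simulationReset()          # rewind the simulated world
        self.robot.step(self.timestep)        # advance once so sensors have values
        return self._observe(), {}

    def step(self, action):
        # Assumes a linear motor on the cart; scale the normalized action to a force.
        self.motor.setForce(float(action[0]) * MAX_FORCE)
        self.robot.step(self.timestep)        # advance the physics by one control step
        obs = self._observe()
        terminated = bool(abs(obs[0]) > 0.4)  # episode ends when the pole falls
        reward = 0.0 if terminated else 1.0   # survival reward, as in classic cart-pole
        return obs, reward, terminated, False, {}

    def _observe(self):
        # A single pole-angle reading; a real setup would add cart and velocity states.
        return np.array([self.pole_sensor.getValue()], dtype=np.float32)

env = InvertedPendulumEnv()
model = PPO("MlpPolicy", env, verbose=1)      # any SB3 algorithm works against gym.Env
model.learn(total_timesteps=50_000)

Standardizing on the gym.Env interface is what makes such a workflow portable: any Stable-Baselines3 algorithm can train against the Webots simulation unchanged, regardless of which interaction pattern structures the controller side.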

Keywords:

reinforcement learning, virtual environments, robot testing, Webots, Stable-Baselines3, simulation tools, digital twin, robotic training

How to Cite

Šareiko, A., Mažeika, D., & Laukaitis, A. (2025). Framework for deep reinforcement learning in Webots virtual environments. New Trends in Computer Sciences, 3(1), 49–63. https://doi.org/10.3846/ntcs.2025.23668

Published in Issue
June 30, 2025

