Multi-purpose password dataset generation and its application in decision making for password cracking through machine learning


This article proposes a method for generating a multi-purpose password dataset suitable for machine learning and other research related, directly or indirectly, to passwords. Existing password datasets are poorly suited to machine learning or decision-driven password cracking. Most are plain password dictionaries that contain only leaked and common passwords, with no additional information. Others are small and include only weak passwords that have previously been leaked. The literature is rich in password-cracking methods based on password datasets, but those methods focus mainly on generating more password candidates similar to the ones in the training dataset. The proposed method combines statistical analysis of leaked passwords with randomness to ensure diversity in the dataset. An experiment with the generated dataset showed a significant improvement in time when performing a dictionary attack, but not when performing a brute-force attack.

Keywords: passwords, password cracking, password dataset, password strength, machine learning

How to Cite
Vainer, M. (2023). Multi-purpose password dataset generation and its application in decision making for password cracking through machine learning. New Trends in Computer Sciences, 1(1), 1–18.
Published in Issue
Apr 11, 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.

