Ontology development using language model-based Named Entity Recognition for integrated construction information
DOI: https://doi.org/10.3846/jcem.2026.26519
Abstract
Named Entity Recognition (NER) is crucial for building knowledge bases and facilitating semantic search in the construction industry. While conventional NER models can identify general entities such as spatial and organizational information, extracting domain-specific entities such as materials and dimensions from construction-related texts, particularly Bill of Quantities (BoQ) documents and Building Information Modeling (BIM) parameters, remains challenging and requires extensive manual annotation.
Key entity categories were defined, and datasets from four BoQ and two BIM sources were annotated to establish ground truth labels. A semi-automated labelling process was introduced to streamline annotation and improve training efficiency. Experimental results demonstrate that the proposed framework reduces annotation time nearly threefold compared to manual processes. This study developed a BERT-based NER model that achieves F1 scores ranging from 0.81 to 0.97, with higher performance for well-defined construction parameters (name, material, size, thickness, diameter, length, type: 0.95–0.97) than for miscellaneous text entities (0.81).
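To make the per-category F1 figures concrete, the sketch below shows one common way to score NER output at the entity level from BIO tag sequences. It is an illustration only: the abstract does not state how the scores were computed, and the tag names (`B-MATERIAL`, `B-SIZE`) and function names are assumptions chosen to mirror the entity categories listed above.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return a set of (label, start, end) spans from a BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                # "O" or a stray "I-" tag: no span is open
                start, label = None, None
    return spans


def f1_per_label(gold_tags, pred_tags):
    """Entity-level F1 per label; a span counts only on an exact match."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for span in pred:
        counts[span[0]]["tp" if span in gold else "fp"] += 1
    for span in gold - pred:
        counts[span[0]]["fn"] += 1
    scores = {}
    for label, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[label] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Hypothetical example: the material span matches, the size span is misplaced.
gold = ["B-MATERIAL", "I-MATERIAL", "O", "B-SIZE", "O"]
pred = ["B-MATERIAL", "I-MATERIAL", "O", "O", "B-SIZE"]
scores = f1_per_label(gold, pred)
```

Exact-match scoring is strict: a prediction that is off by one token counts as both a false positive and a false negative, which is one reason loosely bounded categories tend to score lower than well-delimited parameters.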
Despite extensive research in construction NLP, existing approaches fail to address the integration challenges between heterogeneous BIM-BoQ data formats and lack domain-specific entity recognition capabilities. The extracted entities are aligned with standardized formats using semantic text similarity techniques. This ontology-based integration enhances data consistency, interoperability, and retrieval accuracy, improving semantic alignment while minimizing discrepancies from heterogeneous terminology.
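The alignment step can be pictured as matching each extracted entity against a list of standardized terms and keeping the closest one above a threshold. The sketch below is a minimal stand-in, not the study's method: it uses character-level similarity from Python's standard `difflib` module instead of a semantic (embedding-based) similarity, and the vocabulary, threshold, and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical standardized vocabulary; a real one would come from the ontology.
STANDARD_TERMS = ["reinforced concrete", "structural steel", "gypsum board"]


def normalize(text):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def align(entity, vocabulary=STANDARD_TERMS, threshold=0.6):
    """Return (best_term, score), or (None, score) if below the threshold.

    Assumption: lexical similarity stands in for semantic similarity here.
    """
    best_term, best_score = None, 0.0
    for term in vocabulary:
        score = SequenceMatcher(None, normalize(entity), normalize(term)).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_score < threshold:
        return None, best_score
    return best_term, best_score


term, score = align("Gypsum-Board")
unmatched, _ = align("xyz")
```

A lexical measure like this handles spelling and formatting variants but not synonyms, so an embedding-based similarity would be the natural replacement for `SequenceMatcher` when terminology differs across sources.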
Keywords:
Large Language Models (LLM), Natural Language Processing (NLP), Named Entity Recognition (NER), deep learning, Building Information Modeling (BIM), construction data integration, ontology development, data standardization
License
Copyright (c) 2026 The Author(s). Published by Vilnius Gediminas Technical University.

This work is licensed under a Creative Commons Attribution 4.0 International License.