Ontology development using language model-based Named Entity Recognition for integrated construction information

DOI: https://doi.org/10.3846/jcem.2026.26519

Abstract

Named Entity Recognition (NER) is crucial for building knowledge bases and facilitating semantic search in the construction industry. While conventional NER models can identify general entities such as spatial and organizational information, extracting domain-specific entities such as materials and dimensions from construction-related texts, particularly Bill of Quantities (BoQ) documents and Building Information Modeling (BIM) parameters, remains challenging and typically requires extensive manual annotation.
Key entity categories were defined, and datasets from four BoQ and two BIM sources were annotated to establish ground truth labels. A semi-automated labelling process was introduced to streamline annotation and improve training efficiency. Experimental results show that the proposed framework reduces annotation time nearly threefold compared with fully manual annotation. The BERT-based NER model developed in this study achieves F1 scores ranging from 0.81 to 0.97, with higher performance for well-defined construction parameters (name, material, size, thickness, diameter, length, type: 0.95–0.97) than for miscellaneous text entities (0.81).
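As an illustration of how per-category F1 scores like those above are commonly computed, the following is a minimal sketch of span-level F1 over BIO-tagged sequences. The BIO tagging scheme, the exact-match criterion, and the example tags are assumptions made for illustration; the abstract does not state the evaluation protocol actually used.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return a set of (label, start, end) spans from one BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def f1_per_label(gold_seqs, pred_seqs):
    """Exact-match, span-level F1 for each entity label."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        for label, *_ in g & p:   # predicted span matches a gold span exactly
            counts[label]["tp"] += 1
        for label, *_ in p - g:   # predicted span with no gold counterpart
            counts[label]["fp"] += 1
        for label, *_ in g - p:   # gold span the model missed
            counts[label]["fn"] += 1
    scores = {}
    for label, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[label] = 2 * c["tp"] / denom if denom else 0.0
    return scores


# Hypothetical example: tags for a four-token BoQ item description.
gold = [["B-material", "I-material", "O", "B-thickness"]]
pred = [["B-material", "I-material", "O", "O"]]
scores = f1_per_label(gold, pred)
print(scores)
```

In this toy case the material span is recovered exactly while the thickness span is missed, so the two labels receive very different scores. That is the kind of per-category breakdown reported in the abstract.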
Despite extensive research in construction NLP, existing approaches fail to address the integration challenges between heterogeneous BIM-BoQ data formats and lack domain-specific entity recognition capabilities. The extracted entities are aligned with standardized formats using semantic text similarity techniques. This ontology-based integration enhances data consistency, interoperability, and retrieval accuracy, improving semantic alignment while minimizing discrepancies from heterogeneous terminology.
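The alignment step can be sketched as a nearest-match lookup against a standardized vocabulary. The example below uses character trigram cosine similarity, a purely lexical stand-in chosen so that the code runs without external models; the abstract does not specify which semantic text similarity technique is used, and an embedding-based measure would be needed to match terms that share meaning but not spelling. The function names, the vocabulary entries, and the 0.3 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    s = f" {text.lower().strip()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(term, vocabulary, threshold=0.3):
    """Map an extracted entity to its closest standardized term.

    Returns (None, score) when the best score is below the threshold.
    """
    query = char_ngrams(term)
    best = max(vocabulary, key=lambda v: cosine(query, char_ngrams(v)))
    score = cosine(query, char_ngrams(best))
    return (best, score) if score >= threshold else (None, score)


# Hypothetical standardized terms and extracted entities.
vocabulary = ["concrete wall", "steel beam", "gypsum board"]
for extracted in ["Concrete walls", "xyz"]:
    print(extracted, "->", align(extracted, vocabulary))
```

The threshold keeps unrelated strings from being forced onto the nearest vocabulary entry, which matters when BoQ and BIM sources use terms that have no standardized counterpart.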

Keywords:

Large Language Models (LLM), Natural Language Processing (NLP), Named Entity Recognition (NER), deep learning, Building Information Modeling (BIM), construction data integration, ontology development, data standardization

How to Cite

Choi, G., Kwon, S., Song, J., Akbar, A., & Hong, J.-taek. (2026). Ontology development using language model-based Named Entity Recognition for integrated construction information. Journal of Civil Engineering and Management, 32(4), 548–562. https://doi.org/10.3846/jcem.2026.26519

Published in Issue
May 13, 2026
