TEXT MINING-BASED PATENT ANALYSIS OF BIM APPLICATION IN CONSTRUCTION

As a data tool applicable to the full life cycle of construction engineering and management, Building Information Modeling (BIM) has great potential to increase project productivity and performance significantly. Identifying BIM application hotspots and forecasting their trends can drive innovation in the construction field. Using patents as data resources, this study develops a framework that integrates citation network analysis and topic clustering to identify BIM application information and forecast its trends. The framework comprises three steps: (1) quantitative characteristic analysis of patent outputs; (2) Social Network Analysis (SNA)-based co-occurrence network analysis; and (3) identification of BIM topics using Latent Dirichlet Allocation (LDA). Finally, a case study demonstrates that the framework is effective in promoting the technological development and innovation of BIM. The contributions of this study are threefold: (1) an innovative text mining-based framework for BIM patent analysis in construction is developed; (2) patents are reviewed with this framework to identify the application hotspots and development trend of BIM; and (3) a signpost for the technological development and innovation of BIM is provided.


Introduction
The Engineering Fronts research project has been run jointly by the Chinese Academy of Engineering, Clarivate Analytics, and Higher Education Press since 2017 (Cai et al., 2018). It brings together experts worldwide to evaluate current frontiers in engineering research and to provide strategic responses to global challenges and sustainable development. In the field of construction engineering and management, the frontiers focus on demand-driven technology. Development fronts, which concern the development of technology, are screened from patent data aimed at solving practical technical problems. They are forward-looking, pioneering, and exploratory, and they play a major and indispensable role in the future of engineering technology. The top development fronts span construction, transportation, mechanical engineering, energy, medicine, electronics, and other fields. In particular, the Building Information Modeling-based (BIM-based) construction management system is one of the top 10 development fronts (Cai et al., 2018). BIM is often interpreted as an overarching term for a variety of activities in object-oriented Computer Aided Design (CAD) that use modern modeling tools such as Revit and Tekla. The content generated by architects, designers, and engineers has evolved from traditional 2D drawings to object-oriented 3D models embedded with detailed information (Johansson et al., 2015). As a digital replica, BIM serves as an information repository that supports a variety of applications in the design and construction stages (Miettinen & Paavola, 2014). Moreover, BIM supports real-time visualization because all data can be represented in 3D. This feature can be used to communicate ideas and share information among multiple project stakeholders (Johansson et al., 2015).
As the aforementioned studies show, BIM, as a data tool applicable to engineering design, construction, and management, has brought about major changes by precisely digitizing the virtual model of a building over the full life cycle of construction engineering and management (Azhar, 2011). For example, BIM can be applied to cost management, quality and safety management, technical management, and field management; this greatly improves communication efficiency among project participants and strengthens control over cost and progress (Volk et al., 2014; Park et al., 2013; Lu et al., 2017).
Nevertheless, experts hold different views on the development trend of BIM. Mohd and Ahmad (2013) explored BIM research in construction planning and found that BIM can be used for clash detection during the design phase before construction, as well as to optimize project schedule, cost, and quality and to enhance communication among constructors. Jia et al. (2017) investigated BIM in the design stage of residential buildings under the commonly used Design-Bid-Build (DBB) mode. However, little work has analyzed BIM's application hotspots and development trend, which hinders the development and innovation of BIM technology.
Patent analysis can help in understanding the innovation and development of a technological field over time (Griliches, 1990; Mcaleer & Slottje, 2005). For instance, patents can be used to optimize investment by avoiding poor investment decisions in unnecessary technology areas and to design research strategies that secure core patents (Kim et al., 2008; Ju & Sohn, 2015). Patents are among the most reliable sources of technological information and are therefore used as a data source for analyzing technological developments (Pilkington et al., 2009; Zhang, 2011). Patent citation information is often used to investigate the development of patented technologies in different fields. For instance, Daim et al. (2020) examined the relative positions of companies in the technological network based on patent citations in IoT cybersecurity and blockchain. However, patent data analysis has yet to be applied in the BIM field to explore BIM's development trend and application hotspots.
This study aims to identify the application hotspots of BIM and forecast its trends through a framework that uses patents as data resources and integrates two text mining technologies: citation network analysis and topic clustering. The framework comprises three steps: (1) quantitative characteristic analysis of patent outputs to provide a visual overview of BIM patents; (2) SNA-based co-occurrence network analysis to determine the citation interrelations between BIM patents; and (3) identification of BIM patent topics using LDA to further analyze the topical characteristics of critical patents. In addition, the clustering results of a patent map are used to support the effectiveness of the developed LDA model. The case study shows that the framework promotes knowledge diffusion within the BIM field for further research and helps identify critical patents that drive technological development and innovation.
The contributions of this study are threefold: (1) an innovative text mining-based framework for BIM patent analysis in construction is developed; (2) patents are reviewed with this framework to identify the application hotspots and development trend of BIM; and (3) a signpost for the technological development and innovation of BIM is provided.

Patent analysis
Patents are often regarded as an output of Research and Development, an indicator of innovation, and a timely measure of rapid technological change (Kang, 2014; Markatou & Vetsikas, 2015). Patent analysis usually refers to the analysis, processing, and combination of large amounts of fragmented information in patent specifications and patent publications. Using statistical methods, this information can be transformed into knowledge that gives an overall picture of a field and supports prediction.
With the widespread use of Information Technology (IT), patent analysis has been extensively used as a powerful tool in different fields (Kim & Bae, 2017; Boshnakoska et al., 2013). For instance, in the hydrogen energy and fuel cell field, Li et al. (2020) used patent citation analysis and text mining to monitor and forecast the development trends of nanogenerator technology. In the coherent light generator field, You et al. (2017) applied patent analysis to predict the development trends of coherent light generator technology and revealed the focal areas of Research and Development. In the biosensor field, N. J. Sheikh and O. Sheikh (2017) used patent analysis to identify trends for three types of biosensors: blood, saliva, and breath. In the medical field, Zhang et al. (2016) used patent analysis to reveal the hot research fields of Chinese medicine patents and the future development trend of traditional Chinese medicine. In the building field, Altwies and Nemet (2013) assessed patent citations in building energy control technology to improve innovation in the U.S. building sector.
In summary, patent analysis helps researchers understand the trends and research hotspots of related technologies in a target field. Analysis of patent data allows subsequent research to focus on core patents. In addition, patent analysis can identify new development directions and uncover new, competitive patented technologies, providing technical strategies for the development of the construction industry.

Social network analysis of patents
Social Network Analysis (SNA) is a useful tool for analyzing relationships in a social network; it can examine characteristics such as relationships, information exchange, and physical attributes (Hatala & Lutta, 2009). It is particularly useful for the quantitative and visual interpretation of human associations.
Researchers have investigated the applicability of social networks since the early 20th century in diverse fields, including sociology, biology, and economics (Al Hattab & Hamzeh, 2015; Han et al., 2017; Ardito et al., 2017). Yoon et al. (2011) proposed a novel method called Invention Property-Function Network Analysis (IPFN) to extract properties and functions from patents related to silicon-based thin film solar cells, and to identify the technological implications of these properties and functions using SNA. Perng and Huang (2016) used SNA to reveal the current status of patent activities, major assignees, assignee countries, and core technology fields of shading devices.
As seen from the aforementioned studies, SNA is able to help researchers understand the network relationship visually, as well as convey the results of the analysis. Most importantly, SNA can reveal hidden patterns that might not be captured by conventional qualitative measures and assist professionals to identify the emerging development trends of new technologies.

Text clustering of patents
Text clustering can be used to extract the distribution structure of unstructured textual data when no prior knowledge is available (Sohrabi et al., 2018). Given a sample set of texts, text clustering relies on a distance- or similarity-based approach to divide the texts into multiple categories, commonly using agglomerative (hierarchical) or partitioning algorithms. Agglomerative algorithms merge documents into progressively larger clusters according to the relationships identified among documents, and the resulting clusters can be organized into a dendrogram or cluster hierarchy (Sohrabi et al., 2018).
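The agglomerative procedure described above can be sketched as follows. This is a minimal illustration, not the method of any cited study: it assumes documents are already tokenized, and the choice of cosine similarity over raw word counts, the average-link merge rule, and the toy documents are all our own hypothetical assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerate(docs, n_clusters):
    """Average-link agglomerative clustering of tokenized documents.

    Starts with one cluster per document and repeatedly merges the two most
    similar clusters until n_clusters remain. Returns (clusters, merges);
    merges records the merge order, i.e. the dendrogram.
    """
    vecs = [Counter(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average-link: mean pairwise similarity between cluster members.
                sims = [cosine(vecs[a], vecs[b]) for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


docs = [
    ["bim", "model", "clash", "detection"],
    ["bim", "clash", "detection", "design"],
    ["steel", "structure", "camera", "detecting"],
    ["steel", "camera", "detecting", "image"],
]
clusters, merges = agglomerate(docs, 2)
print(clusters)  # [[0, 1], [2, 3]]
```

Stopping at a fixed number of clusters is one option; cutting the recorded dendrogram at a similarity threshold is another.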
Text clustering has been applied in several studies. Yang et al. (2009) used text clustering to forecast future tendencies in machine learning. Li and Wu (2010) used text clustering to detect and forecast research hotspots in online forums. Han et al. (2017) used patent information to obtain an overview of different technology topics in a given field and to find core patents. Sohrabi et al. (2018) applied text clustering algorithms to front-page information and identified trends in information systems journal articles in the field of Human Resources Management. Li et al. (2019a) used text clustering to identify technology evolution paths and forecast short-term development trends of perovskite solar cell technology.
This review of previous literature shows that, although text clustering and SNA have been widely used, little work has analyzed BIM patents to identify the application hotspots of BIM and forecast its trends.

Methods
This study develops a framework that uses patents as data resources and integrates two text mining technologies, citation network analysis and topic clustering, to identify the application hotspots of BIM and forecast its trends. Figure 1 outlines the research design, and the steps are discussed in detail in Sections 2.1 to 2.3.

Data collection and processing
In this study, the patent data are obtained from the Derwent Innovations Index (DII) database, covering 2004 to 2020. Patent retrieval queries consist of keywords and publication dates. The patent data is initially screened by the […]. The data collection process follows two main steps. First, because BIM is an integrated concept, keyword expansion is performed and both titles and abstracts are searched to capture a wider picture: "BIM" and "Building Information Modeling" are used as keywords to search the titles and abstracts of patents in the DII database, with publication time confined to 2004 to 2020. Second, the resulting 1510 patents are exported and manually screened. A backup data dictionary is also established from these manually screened patents.
After data collection, the BIM patents are preprocessed by tokenization and stopword removal, and the data dictionary is used to merge keyword variants in the original text so that they strictly conform to the dictionary content. The LDA model is implemented in Python 3.6 in the Anaconda environment, with gensim, nltk, and openpyxl as the main packages. Gephi 0.9.2 is then used to visualize the network analysis.
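The preprocessing steps above (dictionary-based keyword merging, tokenization, stopword removal) can be sketched in plain Python. This is a hypothetical illustration, not the study's code: the small stopword set stands in for nltk's list, whose corpus requires a separate download, and the `KEYWORD_DICT` entries are invented examples of what a data dictionary might contain.

```python
import re
from typing import List

# Hypothetical stopword list standing in for nltk's English stopwords.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "based", "is", "by", "with"}

# Hypothetical data dictionary mapping surface variants to one canonical keyword.
KEYWORD_DICT = {
    "building information modeling": "bim",
    "building information model": "bim",
    "steel structure": "steel_structure",
}


def preprocess(text: str) -> List[str]:
    """Lower-case, merge dictionary phrases, tokenize, and drop stopwords."""
    text = text.lower()
    # Replace longer phrases first so "building information modeling" is not
    # partially consumed by the shorter "building information model".
    for phrase in sorted(KEYWORD_DICT, key=len, reverse=True):
        text = text.replace(phrase, KEYWORD_DICT[phrase])
    tokens = re.findall(r"[a-z0-9_]+", text)
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("Steel structure detecting system based on Building Information Modeling"))
# ['steel_structure', 'detecting', 'system', 'bim']
```

Merging multi-word phrases before tokenizing keeps each dictionary term as a single token, so the term counts as one word in the later topic model.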

SNA-based citation network analysis
Social Network Analysis (SNA) is a scientific and systematic approach in which nodes represent actors and links represent the relationships between nodes (Perng & Huang, 2016). Depending on the attributes of the links, social networks are either directed or undirected. Links in undirected networks merely represent a general relationship between nodes, whereas arrows indicate a directional relationship between nodes in directed networks. Network measures such as density, degree centrality, betweenness centrality, and closeness centrality reveal the critical nodes and prominent features of a network. The following network measures are applied in this study:
(1) Density. For a given network or graph, the density (D) is the ratio of the number of existing links to the maximum possible number of links in a directed or undirected network, as shown in Eqns (1) and (2), respectively (Zhong et al., 2019). It is a network-level measure of the overall connectivity of a network with N nodes and indicates how complete the network is: a network with a high density is highly connected.
$$D_{\text{directed}} = \frac{L}{N(N-1)} \qquad (1)$$
$$D_{\text{undirected}} = \frac{2L}{N(N-1)} \qquad (2)$$
where N is the total number of nodes and L is the total number of links in a given network.
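Network density is the share of possible links that actually exist: L/[N(N−1)] for a directed network and 2L/[N(N−1)] for an undirected one. The short sketch below computes it. The function name and signature are our own, not part of any tool used in the study.

```python
def density(n_nodes: int, n_links: int, directed: bool) -> float:
    """Share of possible links that exist: L/(N(N-1)) if directed, 2L/(N(N-1)) if not."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2  # an undirected link joins an unordered pair
    return n_links / possible


print(density(4, 6, directed=False))  # 1.0 (complete undirected graph on 4 nodes)
print(density(4, 3, directed=True))   # 0.25
```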
(2) Degree centrality. Degree centrality (C_D) is a nodal-level measure of the number of directed or undirected links that a node has with other nodes in a given network (Zhong et al., 2019). It is calculated using Eqn (3) and indicates a node's number of connected neighbors in the network. In a social network, a node with higher degree centrality generally has more influence and importance than other nodes (Krebs, 2002).
$$C_D(P_k) = \frac{\sum_{i=1}^{N} a(P_i, P_k)}{N-1} \qquad (3)$$
where P_k is a given node k; N is the total number of nodes; and a(P_i, P_k) = 1 if node i and node k are connected and 0 otherwise.
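Degree centrality can be computed from a 0/1 adjacency matrix by summing the links that reach each node. This sketch assumes normalization by N − 1, a common convention, so that scores fall between 0 and 1. The star network is a hypothetical example.

```python
def degree_centrality(adj):
    """Normalized degree centrality C_D(P_k) = sum_i a(P_i, P_k) / (N - 1).

    adj is an N x N 0/1 matrix with adj[i][k] = 1 if node i links to node k.
    Summing column k counts the links reaching node k: for a symmetric
    (undirected) matrix this is the ordinary degree, and for a citation
    matrix (i cites k) it is the in-degree, i.e. how often k is cited.
    """
    n = len(adj)
    if n < 2:
        return [0.0] * n
    return [sum(adj[i][k] for i in range(n) if i != k) / (n - 1) for k in range(n)]


# Undirected star: node 0 is linked to nodes 1, 2 and 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
dc = degree_centrality(star)  # node 0 -> 1.0, each leaf -> 1/3
```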
(3) Eigenvector centrality. Eigenvector centrality is another important nodal measure. It measures the influence of a node in a network by assigning relative scores to all nodes, based on the idea that connections to high-scoring nodes contribute more to a node's score than equal connections to low-scoring nodes. Let G(V, E) be a graph with vertices V and edges E, and let A be its adjacency matrix, with a_ij = 1 if vertices i and j are connected by an edge and a_ij = 0 otherwise. For an undirected graph, A is symmetric, so all its eigenvalues are real, its eigenvectors are orthogonal, and it is diagonalizable (Batool & Niazi, 2014). Eqn (4) expresses eigenvector centrality x in two equivalent ways, as a sum and as a matrix equation: the centrality of a vertex is proportional to the sum of the centralities of the vertices to which it is connected, where λ is the largest eigenvalue of A and n is the number of vertices:
$$x_i = \frac{1}{\lambda}\sum_{j=1}^{n} a_{ij}\,x_j \quad\Longleftrightarrow\quad \mathbf{A}\mathbf{x} = \lambda\mathbf{x} \qquad (4)$$
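Eigenvector centrality is commonly computed by power iteration: the rule x ← Ax is applied repeatedly and the result is renormalized each time. The sketch below assumes an undirected graph given as a symmetric 0/1 matrix. It iterates on (A + I) rather than A. This shifted matrix has the same eigenvectors as A but does not oscillate on bipartite graphs, such as the hypothetical star used here. The shift is our implementation choice; the study does not specify one.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Eigenvector centrality of an undirected graph by power iteration.

    adj is a symmetric 0/1 adjacency matrix. Iterating on (A + I) keeps the
    eigenvectors of A but avoids oscillation on bipartite graphs.
    Returns (x, lam): x has unit Euclidean length and lam estimates the
    largest eigenvalue of A.
    """
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        # y = (A + I) x; positive entries, so the norm is never zero.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x (x has unit length) estimates lambda.
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(x, ax))
    return x, lam


star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
x, lam = eigenvector_centrality(star)  # hub 1/sqrt(2), leaves 1/sqrt(6), lam sqrt(3)
```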

LDA-based topic clustering
Latent Dirichlet Allocation (LDA), as used by Kim et al. (2018), is a generative probabilistic modeling approach for revealing hidden semantic structures in a collection of textual documents. It uses the "bag of words" approach, which treats each document as a word-frequency vector, transforming text data into numerical inputs compatible with many data mining algorithms (Wang & Xu, 2018). Each document is represented as a probability distribution over topics, and each topic as a probability distribution over words (Blei et al., 2017). Moreover, because the Dirichlet prior does not model correlations between topics (beyond the weak negative dependence implied by the topic proportions summing to one), the candidate topics are treated as independent of each other (Wang & Xu, 2018). To this end, LDA defines a joint distribution over the observed and hidden variables and uses it to compute the conditional (posterior) distribution of the hidden variables given the observed variables. The observed variables are the words, and the latent variables are the topics.
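The "bag of words" representation discards word order and keeps only how often each word occurs in a document. A minimal sketch follows. gensim's `Dictionary.doc2bow` produces a similar list of (token id, count) pairs, but the helper names and the toy documents here are hypothetical.

```python
from collections import Counter


def build_vocab(docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return a sparse word-frequency vector as sorted (word_id, count) pairs."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


docs = [["bim", "model", "bim"], ["steel", "model"]]
vocab = build_vocab(docs)  # {'bim': 0, 'model': 1, 'steel': 2}
bows = [to_bow(d, vocab) for d in docs]
print(bows)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```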
The concept of LDA is presented in Figure 2. LDA has two main components: the topic distribution per document and the word distribution per topic. First, the topic probabilities of each document are drawn from a Dirichlet distribution with hyperparameter α. Then, the word probabilities of each topic are drawn from a Dirichlet distribution with hyperparameter β. Gibbs sampling, a common approach, is used for approximate inference.
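To make the Gibbs sampling step concrete, here is a textbook collapsed Gibbs sampler for LDA in pure Python. It is a sketch, not the study's code. Note that gensim's built-in `LdaModel` uses online variational Bayes rather than Gibbs sampling. The hyperparameter values follow the α = K/50 and β = 0.01 setting reported in this paper. The toy corpus of integer word ids is hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha, beta, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of integer word ids < vocab_size.
    Returns (theta, phi): per-document topic distributions and per-topic
    word distributions, estimated from the final sample.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # tokens of word w in topic k
    nk = [0] * n_topics                                # total tokens in topic k
    z = []                                             # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi


# Toy corpus: word ids 0-2 and 3-5 form two hypothetical themes.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3, 4], [4, 5, 3, 5]]
K = 2
theta, phi = lda_gibbs(docs, n_topics=K, vocab_size=6, alpha=K / 50, beta=0.01)
```

The sampler repeatedly reassigns each token's topic given all the others. The final counts, smoothed by α and β, give the per-document topic distributions (theta) and the per-topic word distributions (phi) shown in Figure 2.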
In Figure 2, α is the parameter of the Dirichlet prior on the per-document topic distributions, and β is the parameter of the Dirichlet prior on the per-topic word distributions.
The perplexity (P) is used to determine the number of topics K; a better-fitting model has a relatively low perplexity. The perplexity is calculated by Eqn (5):
$$P(D) = \exp\left(-\frac{\sum_{d=1}^{M}\sum_{n=1}^{N_d}\log p(w_{d,n})}{\sum_{d=1}^{M} N_d}\right) \qquad (5)$$
where M is the number of documents in the corpus D, N_d is the number of words in document d, and w_{d,n} is the nth word in document d.
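Eqn (5) can be evaluated directly once per-document topic proportions (theta) and per-topic word distributions (phi) are available. Each word's probability is then p(w) = Σ_k θ_dk φ_kw. The sketch below is a minimal illustration; its function name and inputs are hypothetical. The sanity check uses a model that spreads probability uniformly over a 4-word vocabulary, which should have a perplexity of exactly 4, the same as a fair four-sided die.

```python
import math


def perplexity(docs, theta, phi):
    """Eqn (5): exp(-sum_d sum_n log p(w_dn) / sum_d N_d),
    with p(w_dn) = sum_k theta[d][k] * phi[k][w_dn]."""
    log_lik = 0.0
    n_words = 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
        n_words += len(doc)
    return math.exp(-log_lik / n_words)


uniform_phi = [[0.25] * 4, [0.25] * 4]
theta_demo = [[0.5, 0.5], [0.9, 0.1]]
score = perplexity([[0, 1, 2], [3, 3]], theta_demo, uniform_phi)  # approximately 4
```

To choose K, one would fit models for several candidate K values, ideally score them on held-out documents, and prefer the K with low perplexity.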
One challenge of this method, as with other unsupervised learning methods, is that no ground truth is available, so the results need to be checked manually. Following Kim et al. (2018), the empirical LDA parameters are set to α = K/50 and β = 0.01.

Result analysis
This section presents the results and corresponding analyses, including the quantitative characteristics of BIM patent outputs, the SNA-based citation network analysis of BIM patents, and the LDA-based topic clustering of BIM patents. Together, these three analyses aim to give a comprehensive picture of the selected BIM patents and to summarize insights.

Development trend of BIM patents
After data collection and preprocessing, the annual number of BIM patents is obtained, as shown in Figure 3. The number of granted BIM patents increases significantly from 2010 to 2020. This indicates that BIM is becoming an increasingly important tool in construction, owing to its powerful management of physical and functional digital representations.

National technical strength analysis of BIM patents
Figure 4 shows the distribution of BIM patents by country. China (CN) accounts for 70.43% of the total, Korea (KR) for 12.10%, and the United States of America (US) for 9.81%. A possible explanation is that China is known as a major construction country. Under the "Belt and Road" national strategic initiative, China is likely to play an important, and even dominant, role in the construction field.

Technical classification analysis of BIM patents
The statistical analysis of International Patent Classification (IPC) codes (Figure 5) shows that the three main technical fields of BIM patents are G06F (electrical digital data processing), G06Q (data processing system or method specifically for administrative, commercial, financial, management, supervisory or predictive purposes), and G06T (general image data processing or generation). Because each patent carries one or more IPC codes, the total number of IPC codes is greater than the number of patents. As shown in Figure 5, most of the granted patents are concentrated in G06F, which accounts for 48.49% of the total, followed by G06Q and then G06T. Together, G06F, G06Q, and G06T account for 84.78% of the total and are recognized as the important front directions of BIM technology. These front directions can help managers make decisions and improve the management of construction scheme design, field management, and later operation. For G06F, for example, the current application trend indicates that researchers are increasingly interested in applying data processing systems or methods to BIM information management. Accordingly, managers may need to focus on BIM standards that ensure complete, accurate, and timely data exchange and interoperability, so as to improve cooperation and communication among project participants.
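Because a patent can carry several IPC codes, shares such as those in Figure 5 are shares of IPC assignments, not of patents. The sketch below shows one way to compute them. It assumes that each patent is counted once per distinct 4-character subclass (e.g. "G06F"). The study does not state its exact counting rule, and the patent records are hypothetical.

```python
from collections import Counter


def ipc_subclass_shares(patent_ipcs):
    """Share of each 4-character IPC subclass over all subclass assignments.

    A patent with several codes is counted once per distinct subclass, so the
    total number of assignments can exceed the number of patents.
    Returns (shares, total), with shares ordered from most to least frequent.
    """
    counts = Counter()
    for codes in patent_ipcs:
        counts.update({code[:4] for code in codes})
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.most_common()}, total


# Hypothetical IPC records for four patents.
patents = [
    ["G06F 30/13", "G06Q 50/08"],
    ["G06F 30/13"],
    ["G06T 17/00", "G06F 16/29", "G06F 30/20"],
    ["E04G 21/00"],
]
shares, total = ipc_subclass_shares(patents)
top3 = sum(list(shares.values())[:3])  # cumulative share of the three leading subclasses
```

Here, 4 patents yield 6 subclass assignments. G06F takes half of them, and the top three subclasses together cover 5/6.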

SNA-based citation network analysis of BIM patents
SNA-based citation network analysis is a typical technique for measuring the flow of knowledge, information, and innovation. Citation data contain valuable information that, if analyzed well, can reveal the development trend and application hotspots of a technology (Sharma & Tripathi, 2017). The constructed patent citation network is a directed graph in which each link represents a citation between a citing patent and the patent it cites. This network shows the flow of information and knowledge, including the spread and diffusion of knowledge.
Because citations reflect the context of technological development, citation networks can be constructed by tracing patent citations. The more often a patent is cited, the more important it is. Highly cited patents represent technology hotspots and guide future technological innovation. They generally lie at the core of the network, occupying central positions that are conducive to technology diffusion.
To analyze these influential technologies, a core network of BIM technology citation relationships is built based on link weights (Figure 6). The patent citation network consists of 1941 nodes, with a graph density of 0.001 and an average degree of 1.116. The higher a patent's degree, the more frequently it is cited and the more important it appears in the network.
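As a consistency check on the reported statistics (our own arithmetic), assume the average degree k̄ is L/N for a directed network and 2L/N for an undirected one, the usual conventions. In both cases, Eqns (1) and (2) reduce to D = k̄/(N − 1). The reported density of 0.001 therefore agrees with the reported average degree after rounding. Under the directed reading, the network would contain about 2166 citation links.

```latex
\begin{aligned}
\bar{k} &= \frac{L}{N}\ \text{(directed)} \quad\text{or}\quad \bar{k} = \frac{2L}{N}\ \text{(undirected)}
\;\Longrightarrow\; D = \frac{\bar{k}}{N-1} \\
D &= \frac{1.116}{1941 - 1} = \frac{1.116}{1940} \approx 5.75 \times 10^{-4} \approx 0.001 \\
L &\approx 1.116 \times 1941 \approx 2166 \quad \text{(directed reading)}
\end{aligned}
```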
As shown in Figure 6 and Table 1, the six core patents, representing the core technical fields of BIM, are US20150248503A1, US20150057981A1, KR2010020060A, US5228038A, CN103093061A, and CN102609417A. Figure 6 shows a core network around "US20150248503A1" consisting of a set of major nodes and several citations. In Figure 6, "US20150248503A1", the patent "A method and system for creating 3D models from 2D data for building information modeling (BIM)", appears as a larger node than the others. This indicates that it is highly cited and represents an application hotspot of BIM. Moreover, the three key performance indicators in Table 1, namely connected degree, eigenvector centrality, and PageRank, further confirm the high value of the "US20150248503A1" patent. Converting computer-aided design drawings into 3D models (with software such as AUTODESK REVIT, AUTOCAD, VECTORWORKS, MICROSTATION, and ARCHICAD) can reduce the time and labor costs of on-site construction. The co-occurrence citation interrelations of the important patents are also identified. For example, "US20120310906A1" (Building Information Tracking System and Method of Use) is interrelated with the highly cited "US20150248503A1". This interrelation may indicate that the building information tracking system plays an important part in tracking the content of 3D models with mobile computing devices (e.g. its transferring unit sends captured information to a data repository, and its versioning unit versions the information in that repository). Notably, the six core technical fields extracted above are closely linked with the IPC results for BIM presented in Section 3.1.3.
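PageRank, one of the indicators in Table 1, scores a patent highly when it is cited by other well-scored patents. The power-iteration sketch below is a minimal illustration, not the implementation behind Table 1. It assumes that links run from the citing patent to the cited patent, uses the conventional damping factor of 0.85, and lets patents that cite nothing spread their score uniformly. The four-patent citation graph is hypothetical.

```python
def pagerank(edges, n_nodes, damping=0.85, n_iter=100):
    """PageRank on a directed citation graph.

    edges: (citing, cited) pairs, so rank flows from each citing patent to the
    patents it cites. Nodes that cite nothing (dangling nodes) spread their
    rank uniformly, which keeps the scores summing to 1.
    """
    out_links = [[] for _ in range(n_nodes)]
    for citing, cited in edges:
        out_links[citing].append(cited)

    rank = [1.0 / n_nodes] * n_nodes
    for _ in range(n_iter):
        dangling = sum(rank[v] for v in range(n_nodes) if not out_links[v])
        base = (1 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = [base] * n_nodes
        for v in range(n_nodes):
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical citations: patents 1, 2 and 3 cite patent 0; patent 3 also cites 2.
scores = pagerank([(1, 0), (2, 0), (3, 0), (3, 2)], n_nodes=4)
print(max(range(4), key=scores.__getitem__))  # 0
```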
In Table 1, "US20150248503A1", a typical technology in IPC G06F (electrical digital data processing), shows that generating 3D models from 2D data plays an important part in BIM-related data processing technologies. For example, with 3D models generated from 2D data, managers can intuitively understand the relationships between the variables that contribute to a collision, and constructors can then optimize the construction design using the collision navigation function of BIM 3D models.

LDA-based topic clustering of BIM patents
As a typical topic clustering approach, the LDA model uses word frequencies and word co-occurrence within documents to extract keywords effectively. It also reduces intensive manual effort because it does not require labeled data. For these reasons, LDA is often used to mine the main topics of a corpus and identify their corresponding keywords (Bastani et al., 2018). In this study, LDA-based topic clustering is used to gain further insight into the application hotspots and development trend of BIM technology.
The keywords under each topic are treated as the features hidden in that topic, and they reflect correlations within the topic.
Figure 7 shows the perplexity for different numbers of topics over all preprocessed data. Based on perplexity, the appropriate number of topics is 11. After all patents are analyzed, 10 high-quality topics are selected, as shown in Table 2, and LDA is run with 10 topics to identify the latent topics and word distributions. Table 2 presents the 10 words with the largest weights in each topic, from which the impact factors of each topic can be extracted. According to the word distributions, the topics are: "Power supply module based on BIM technology", "Steel structure analyzing and detecting management system", "Energy production apparatus management system", "Mobile terminal of quantity calculation method management system based on BIM technology", "BIM model monitoring and memory system", "Plate object display processing management system", "Scene simulation analysis based on BIM technology", "Device management and control center", "Construction method of lift operation system based on BIM technology", and "Project control management system based on Banknote identification module technology". In Table 2, the keywords in each topic's distribution represent closely related factors, and each number is the probability of the keyword within that topic. After analyzing the results generated by the LDA model, the keywords under a topic are identified as features hidden in the large-scale BIM-related technology corpus. For example, the topic "Steel structure analyzing and detecting management system" provides a more in-depth analysis of the hotspot technologies of G06Q (data processing system or method specifically for administrative, commercial, financial, management, supervisory or predictive purposes). This topic could further serve as the intellectual base and an application hotspot of G06F for future research.
Furthermore, the keywords in this distribution ("Detecting", "Camera", "Engineering", "Steel structure", "Processor") may have close co-occurrence relationships with each other (e.g. a detecting camera may be a favorable tool for obtaining detection images for positioning steel structures). Similarly, the other topics in Table 2 extracted by the LDA model contain qualitative information, such as the keyword distribution of each topic.
In summary, the ten topics reflect the application hotspots and development trend of BIM, and they may become important technologies leading the development of emerging industries in the future.
These research directions can serve as references for scholars and practitioners in the field of BIM.
To compare and verify the results of the LDA topic mining, Clarivate Analytics' Derwent Innovation, which provides authoritative patent data and powerful analytical capabilities, is used for technical support. Derwent Innovation includes patent maps, which display data as topographical maps and identify technology topics automatically. Patent maps help researchers understand technology trends, technology branches, and technology relations. This can help determine research directions, inspire new ideas, avoid patent infringement, and improve the quality of research and development. In this way, patent maps can reveal new technical fields and means, find opportunities for technological development in relatively technology-intensive fields, and thereby promote innovation and research and development activities that are eventually translated into patents.
The patent map is generated by co-word analysis, a bibliometric method. Its main principle is to count how often pairs of words appear together in the same document and then cluster the words based on these counts to reflect their affinity. The structure and evolution of the topics that these words represent are then analyzed.
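The pair-counting step of co-word analysis can be sketched as below. This is not Derwent Innovation's algorithm, which is not described here. It simply counts, for every unordered keyword pair, how many documents contain both keywords. The keyword lists are hypothetical. The resulting counts are the input that a clustering step would group into topics.

```python
from collections import Counter
from itertools import combinations


def coword_counts(docs):
    """Count how many documents contain each unordered keyword pair."""
    pairs = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair is counted once per document,
        # always in the same (alphabetical) order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


docs = [
    ["power supply module", "sensor connect", "bim"],
    ["power supply module", "bim"],
    ["reinforce steel", "scan cloud"],
]
pairs = coword_counts(docs)
print(pairs[("bim", "power supply module")])  # 2
```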
Therefore, this study generates a BIM patent map to analyze BIM technology hotspots and predict promising BIM technology flows (Figure 8). Based on cluster analysis of the database and analysis of citation data, hot topics are labeled in white and secondary topics in gray. In Figure 8, topics labeled in white (e.g. "Module connect", "Reinforce steel", "Attribute parameter", and "Scan cloud") and topics labeled in gray (e.g. "Sensor connect", "Power supply module" […]).
Comparing the results of LDA topic mining and co-word analysis shows that most topics are similar and only some differ. The topics in Figure 8 are closely linked with those in Table 2. For example, "Power supply module" matches "Power supply module based on BIM technology", "Management equipment" matches "Device management and control center", "Terminal user" matches "Mobile terminal of quantity calculation method management system based on BIM technology", "Monitor data" matches "BIM model monitoring and memory system", "Reinforce steel" matches "Steel structure analyzing and detecting management system", and "Screen lift" matches "Construction method of lift operation system based on BIM technology". Co-word analysis is essentially a co-occurrence analysis method, whereas LDA introduces probability distributions and Bayesian priors into topic analysis. Hence, a few topics differ because of the different algorithms.

Discussion and limitations
Using patents as data resources, an effective framework integrating citation network analysis and LDA-based topic clustering is developed to identify the application hotspots of BIM and forecast its trends. The results above show that current BIM-based patents are spread across several directions. In particular, technologies that integrate BIM with other technologies are attracting increasing interest from researchers. Computer vision is used to detect and track site entities in order to extract safety context in real time, and computer vision-based systems can automate site monitoring and the retrieval of contextual information (Seo et al., 2015). As computer vision matures, its integration with BIM is expected to further advance the construction industry. Ye et al. (2018) proposed the "Cup-of-Water" theory to represent the integration of BIM, IoT (Internet of Things) and blockchain, which can be used to design an efficient building maintenance system. Liu et al. (2019) proposed a framework combining blockchain with BIM for sustainable building design coordination and collaboration across multiple building life-cycle stages, addressing the challenges of intellectual property and legal liability in BIM. Integrating BIM with such technologies therefore offers unique opportunities for the development of smart buildings and smart cities.
Furthermore, Geographic Information System (GIS)-based technologies can handle the large-scale presentation and centralization of geographic data using real-world coordinates (Isikdag, 2015). In particular, they are relatively competent in 2D geometry modeling and provide multi-representation mechanisms, such as multiple layers of detailed information (Mignard & Nicolle, 2014). In contrast, BIM emphasizes object-oriented modeling of data with complete semantics. It is typically applied to model new buildings and structures and covers both the physical and functional characteristics of the subject in a 3D representation (Mignard & Nicolle, 2014). There is therefore an opportunity to integrate BIM with GIS to manage urban facilities.
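A basic step in BIM-GIS integration is placing a BIM model, usually authored in a local project coordinate system, into the real-world coordinates that GIS uses. A minimal sketch, assuming a 2D similarity transform (rotation, uniform scale and translation), conceptually similar to the map conversion (`IfcMapConversion`) defined in IFC4, is shown below. The function name and the site parameters are hypothetical; a full georeferencing workflow would also handle elevation and the choice of projected coordinate reference system.

```python
import math


def local_to_map(x, y, eastings, northings, rotation_deg, scale=1.0):
    """Map local BIM coordinates (x, y) to projected map coordinates (E, N).

    rotation_deg is the counter-clockwise angle from the map's easting axis to
    the model's local x-axis; eastings/northings locate the local origin.
    """
    theta = math.radians(rotation_deg)
    e = eastings + scale * (x * math.cos(theta) - y * math.sin(theta))
    n = northings + scale * (x * math.sin(theta) + y * math.cos(theta))
    return e, n


# Hypothetical site: local origin at E=500000 m, N=4000000 m, local x-axis rotated 90 degrees.
e, n = local_to_map(10.0, 0.0, 500000.0, 4000000.0, 90.0)
print(round(e, 6), round(n, 6))  # approximately 500000.0 4000010.0
```

Once every model element carries map coordinates, it can be overlaid on GIS layers such as parcels or utility networks.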
Last but not least, the results above show that complete, accurate and timely data have become increasingly important to BIM in recent years. According to Turk and Klinc (2017), blockchain technology could potentially advance BIM by making BIM data immutable, traceable and transparent. As a relatively new concept in the construction industry, blockchain provides a way to address the long-standing issues of lack of accountability and fragmented information sources (Sheng et al., 2020). It enables decentralized information exchange and provides a secure and stable trust environment that improves collaboration among participants (Li et al., 2019b). In our view, the combination of BIM and blockchain offers considerable potential and can be regarded as one of the future directions of development. More broadly, there are already signs that the construction industry is moving toward advanced, automated and intelligent development.
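The immutability and traceability that blockchain brings to BIM data rest on hash-linking. Each record stores the hash of its predecessor, so editing any earlier record breaks every later link. The sketch below shows only this tamper-evidence idea for a log of BIM model revisions. It is a plain hash chain, not a blockchain: it has no distribution, consensus or access control. The record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def block_hash(block):
    """SHA-256 over the record's content fields, serialized deterministically."""
    fields = {k: block[k] for k in ("index", "model_digest", "author", "prev_hash")}
    return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()


def append_revision(chain, model_bytes, author):
    """Record a new model revision, linked to the previous record by its hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    block = {
        "index": len(chain),
        "model_digest": hashlib.sha256(model_bytes).hexdigest(),  # fingerprint of the model file
        "author": author,
        "prev_hash": prev,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def verify(chain):
    """True if every record is unmodified and correctly linked to its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS
        if block["prev_hash"] != expected_prev or block["hash"] != block_hash(block):
            return False
    return True


chain = []
append_revision(chain, b"IFC model v1", "architect")
append_revision(chain, b"IFC model v2", "structural engineer")
print(verify(chain))  # True
chain[0]["author"] = "someone else"  # tamper with history
print(verify(chain))  # False
```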
The research presented in this study, however, is not without its limitations. Firstly, this study identifies application hotspots solely on the basis of LDA topic mining, which may limit the discovery of other hidden patterns in the field. In addition, the accuracy of the LDA algorithm can be difficult to justify. In this study, the agreement between the LDA topics and the clusters of the patent map supports the effectiveness of the developed LDA model. However, expert knowledge and professional assessment offer another effective way to validate the LDA results. Secondly, the patent pool may contain emerging frontiers of BIM research, but such emerging, cutting-edge patents probably represent only a small fraction of the total population, which could lead to their misidentification at present. Continuous monitoring and updating would therefore be beneficial in the foreseeable future.
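Besides expert assessment, one quantitative check often used for LDA output is topic coherence. It was not applied in this study. The sketch below computes the UMass coherence of a topic's top words, i.e. the sum over ordered word pairs of log((D(w_m, w_l) + 1) / D(w_l)), where D counts the documents containing the given words. Scores closer to zero indicate words that co-occur more consistently. The corpus and topics are hypothetical, and the top words are assumed to occur in the corpus so that D(w_l) > 0.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence for top_words ordered from most to least probable."""
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / doc_freq(top_words[l]))
    return score


# Hypothetical tokenized patent abstracts.
docs = [["steel", "reinforce"], ["steel", "reinforce"], ["sensor", "data"], ["sensor", "data"]]
print(umass_coherence(["steel", "reinforce"], docs))  # log(3/2), about 0.405
print(umass_coherence(["steel", "sensor"], docs))     # log(1/2), about -0.693
```

A topic whose top words rarely appear together scores lower, which can flag LDA topics that deserve closer expert review.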

Conclusions
Building Information Modeling (BIM) has great potential for increasing project productivity and performance. Awareness of BIM application hotspots and forecasting its trends can significantly drive innovation in the construction industry. In this study, patent information extracted from the Derwent Innovations Index (DII) database serves as a proactive source for analyzing development trends and application hotspots in the BIM field. Such patent information gives practitioners an important source that can be used to retrospectively inform BIM services and promote technological innovation in projects.
This study aims to identify the application hotspots of BIM and forecast its trends using an effective framework that integrates citation network analysis and LDA-based topic clustering. The framework comprises three steps: (1) quantitative characteristic analysis of patent outputs to provide a visual overview of BIM patents; (2) Social Network Analysis (SNA)-based co-occurrence network analysis to determine the citation interrelations between BIM patents; and (3) identification of BIM patent topics using Latent Dirichlet Allocation (LDA) to further analyze the topical characteristics of critical patents. In total, 1510 patents from 2004 to 2020 are analyzed.
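Step (3) relies on LDA. The study's implementation and hyperparameters are not given in this section, so the following is only a minimal sketch of one standard way to fit LDA, collapsed Gibbs sampling, written without external libraries. The values of `alpha`, `beta`, `iters` and the toy corpus are illustrative assumptions. The ten topics reported in this study would correspond to `k=10` on the real patent corpus.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (phi, theta): per-topic word
    distributions and per-document topic distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    v_size = len(vocab)
    ndk = [[0] * k for _ in docs]           # doc-topic counts
    nkw = [[0] * v_size for _ in range(k)]  # topic-word counts
    nk = [0] * k                            # tokens per topic
    z = []                                  # topic assignment of every token

    def update(d, t, v, delta):
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):  # random initial assignments
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, t, wid[w], 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                update(d, z[d][i], v, -1)  # remove the current assignment
                # p(z = j | rest) propto (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + v_size * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, z[d][i], v, 1)

    phi = [{w: (nkw[t][wid[w]] + beta) / (nk[t] + v_size * beta) for w in vocab} for t in range(k)]
    theta = [[(ndk[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    return phi, theta


# Hypothetical toy corpus of tokenized patent abstracts.
docs = [["steel", "reinforce", "weld"]] * 5 + [["sensor", "monitor", "data"]] * 5
phi, theta = lda_gibbs(docs, k=2)
for t, dist in enumerate(phi):
    print(t, sorted(dist, key=dist.get, reverse=True)[:3])
```

The highest-probability words of each topic in `phi` are what a topic label such as "Power supply module based on BIM technology" would be derived from, typically with manual interpretation.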
In this study, the characteristics of BIM patent outputs show that the number of patent applications has increased significantly since 2010, and that BIM is becoming an increasingly important and useful tool in construction owing to its powerful management of physical and functional digital representations. Under the current International Patent Classification (IPC), the patents fall mainly within G06F (electric digital data processing), indicating that G06F is the core technology of the application hotspots. Patent "US20150248503A1", titled "a method for creating a three-dimensional (3D) model of two-dimensional (2D) data of building information modeling (BIM) and the system", has the highest citation degree in the citation network and is regarded as an application hotspot in the BIM field. The results indicate that SNA-based network analysis can effectively identify important BIM patents (such as "US20150248503A1" and "US20150057981A1") as well as the co-occurrence citation interrelations among important patents (such as "US20120310906A1" and "US20150248503A1"). For in-depth analysis of BIM application hotspots, ten topics ("Power supply module based on BIM technology", "Steel structure analyzing and detecting management system", "Energy production apparatus management system" and others) are identified with the LDA clustering model. These topics and their corresponding keywords represent current trends in the BIM field. For example, integrating BIM with other technologies is expected to promote the development of the construction industry, and blockchain offers a way to address the long-standing issue of incomplete and inaccurate information sources when BIM is used in construction.
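The "degree of citation" used above to single out the most important patent can be computed directly from citation records. The sketch below assumes that it refers to how often a patent is cited (its in-degree) and also reports a normalized total degree, (in + out) / (n - 1), the usual degree centrality for a directed network. The patent IDs are hypothetical placeholders, not real patents or results from this study.

```python
from collections import defaultdict


def citation_degrees(citations):
    """citations: iterable of (citing, cited) patent pairs.

    Returns {patent: (in_degree, out_degree, normalized_degree)}, where the
    normalized degree is (in + out) / (n - 1) for n patents in the network.
    """
    indeg, outdeg, nodes = defaultdict(int), defaultdict(int), set()
    for citing, cited in set(citations):  # ignore duplicate citation records
        if citing == cited:
            continue  # ignore self-citations
        outdeg[citing] += 1
        indeg[cited] += 1
        nodes.update((citing, cited))
    n = len(nodes)
    return {p: (indeg[p], outdeg[p], (indeg[p] + outdeg[p]) / (n - 1) if n > 1 else 0.0)
            for p in nodes}


# Hypothetical citation records (P1..P4 are placeholder IDs, not real patents).
edges = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P4", "P2")]
degrees = citation_degrees(edges)
most_cited = max(degrees, key=lambda p: degrees[p][0])
print(most_cited, degrees[most_cited])  # P1 (3, 0, 1.0)
```

Ranking by in-degree highlights frequently cited, foundational patents. Ranking by the normalized total degree also rewards patents that cite many others, which can surface integrative, recent work.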
In addition, the agreement with the clustering results of the patent map supports the effectiveness of the developed LDA model.
In summary, the contributions of this study are threefold: (1) an innovative text mining-based framework for BIM patent analysis in construction is developed; (2) patents that have focused on identifying the application hotspots and development trend of BIM in accordance with our developed framework are reviewed; and (3) a signpost for technological development and innovation of BIM is provided.