| Accidents on construction sites are largely caused by people’s unsafe behaviors, a chronic source of construction safety problems. Identifying and reducing unsafe behavior is therefore of great significance for ensuring the safety of on-site construction. This thesis combines computer vision and natural language processing to explore a multimodal fusion method for identifying workers’ unsafe behavior in construction. The main work can be summarized as follows. First, the unsafe behavior of construction workers and the related theories were elucidated, existing methods of unsafe behavior identification were reviewed, and the application effects and existing problems of these methods were analyzed, leading to the conclusion that research on a multimodal fusion method for identifying workers’ unsafe behavior in construction is necessary. Current studies of workers’ unsafe behavior identification are mainly based on computer vision and deep learning and have obvious problems, such as weak generalization ability, lack of training data, reliance on a single data modality, poor versatility, and dependence on tedious manual operation. Second, a multimodal fusion method for identifying workers’ unsafe behavior on construction sites was proposed, which matches images against the text of unsafe behavior lists to automatically identify various kinds of unsafe behavior. The method was realized in three steps: (1) combining computer vision and deep learning to establish an object detection model based on the bottom-up attention mechanism of Faster R-CNN, which is used to automatically extract and represent the image features of workers’ unsafe behavior; (2) using natural language processing and deep learning methods to automatically extract and represent the features of safety rule text; (3) using stacked cross attention (SCA) to perform multimodal fusion and similarity calculation for the extracted
text features and image features. Finally, the effectiveness and feasibility of the multimodal fusion method were verified by experiment. The results show that the proposed method can automatically extract the semantic information of an image and match it to the corresponding entries of the safety rule text, thereby identifying the unsafe behaviors in construction site images, and that the whole process can be automated. This research puts forward a multimodal fusion method for identifying workers’ unsafe behavior in construction and realizes automatic identification of various kinds of workers’ unsafe behavior on construction sites. It promotes the development of automatic and continuous identification of workers’ unsafe behavior in construction under generic scenarios and has positive significance for the realization of intelligent construction sites under the digital construction mode in China. |
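
The third step, matching image regions to rule text by cross attention, might be sketched as follows. This is a simplified illustration and not the thesis's implementation: it assumes region features and word features already live in a shared embedding space, uses text-to-image attention with average pooling, and omits the extra normalization steps of the full stacked cross attention formulation. The function names, the `temperature` value, and the rule labels are all hypothetical.

```python
import math


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(scores, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    top = max(scores)
    exps = [math.exp(temperature * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_similarity(regions, words, temperature=9.0):
    """Score one image against one rule text.

    regions: list of region feature vectors (e.g. from an object detector).
    words:   list of word feature vectors for one safety rule.
    The temperature is an assumed, illustrative value.
    """
    dim = len(regions[0])
    word_scores = []
    for word in words:
        # Similarity of this word to every region, clipped at zero.
        sims = [max(cosine(word, region), 0.0) for region in regions]
        weights = softmax(sims, temperature)
        # Attended image vector: attention-weighted sum of region features.
        attended = [
            sum(w * region[d] for w, region in zip(weights, regions))
            for d in range(dim)
        ]
        # Relevance of the word to the image region mix it attends to.
        word_scores.append(cosine(word, attended))
    # Average pooling over words gives the image-text similarity.
    return sum(word_scores) / len(word_scores)


def best_matching_rule(regions, rules):
    """Return the rule id with the highest similarity, plus all scores."""
    scores = {
        rule_id: cross_attention_similarity(regions, words)
        for rule_id, words in rules.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real region and word embeddings.
    toy_regions = [[1.0, 0.0], [0.0, 1.0]]
    toy_rules = {
        "rule_A": [[1.0, 0.0]],    # aligned with the first region
        "rule_B": [[-1.0, -1.0]],  # points away from both regions
    }
    best, all_scores = best_matching_rule(toy_regions, toy_rules)
    print(best, all_scores)
```

In a real pipeline the region vectors would come from the detector and the word vectors from the text encoder, and the highest-scoring rule entry (possibly subject to a threshold) would be reported as the identified unsafe behavior.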