
Research On Entity Relation Extraction Of Literature In The Field Of Third-generation Semiconductor Materials

Posted on: 2022-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: B R Yang
GTID: 2518306569979319
Subject: IC Engineering
Abstract/Summary:
Third-generation semiconductor materials have important application value in optoelectronics and microelectronics because of their excellent properties. The semiconductor materials literature is the main source of knowledge in this field, and the entity relations it contains are an important form of that knowledge. With the rapid development of third-generation semiconductor materials in recent years, nearly 1,000 articles in related fields appear on IEEE alone every month. This rapid growth makes it difficult for researchers to obtain the latest research trends and results in a timely and accurate way. To enable machine-aided processing of large volumes of domain literature, researchers use natural language processing to help extract key information from it. However, relation extraction models built for general-domain or biomedical text are not well suited to the third-generation semiconductor materials field. This thesis therefore focuses on entity relation extraction from English literature in this field. Addressing the characteristics of the language used in this literature and the shortcomings of existing methods, it integrates pre-trained models with deep learning networks to extract entities such as materials, devices, and methods, together with the relations between them. The specific work is as follows:

1. To address the current lack of an English literature dataset for the third-generation semiconductor materials field, this thesis collects a large body of English literature in the field, defines entity and relation types, and manually annotates the literature. After annotation, manual verification is carried out, yielding two datasets for the named entity recognition and entity relation extraction tasks.

2. Based on the characteristics of entity recognition in third-generation semiconductor materials literature, a dynamic-fusion BERT-BiLSTM-CRF model is proposed. It combines the strong feature extraction ability of BERT with the advantages of the BiLSTM+CRF structure, and addresses the problems of long-distance correlation and context dependence in entity recognition for domain literature. In addition, the BERT model itself is improved: the output of each transformer encoder layer is assigned a dynamic weight, and the weighted outputs are fused to obtain a vector representation with richer semantic information, which further improves model performance. Experimental results on the CoNLL-2003 dataset and on the dataset constructed in this thesis show that the model achieves better results than mainstream methods.

3. Based on the characteristics of entity relations in third-generation semiconductor materials literature, an EI-BERT-CNN model integrating BERT and a CNN is proposed. The model uses BERT to generate word vectors containing global semantic information and uses the CNN to extract rich local features from sentences, addressing the problems of long-distance dependence and local correlation in relation extraction for domain literature. In addition, an entity information module integrates the information of the entity phrases themselves, which further improves model performance. Experimental results on the SemEval-2010 Task 8 dataset and on the dataset constructed in this thesis show that the model achieves better results than mainstream methods.

Finally, a knowledge extraction system for third-generation semiconductor materials literature is implemented based on the dynamic-fusion BERT-BiLSTM-CRF model and the EI-BERT-CNN model.
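The weighted fusion of encoder-layer outputs described for the entity recognition model can be illustrated with a small sketch. The abstract does not say how the dynamic weights are computed, so this example assumes one scalar score per layer, normalized with a softmax; the names `softmax`, `fuse_layers`, and `layer_scores` are hypothetical and are not taken from the thesis. Plain Python lists stand in for the tensors a real model would use.

```python
import math


def softmax(scores):
    """Turn unnormalized scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_outputs, layer_scores):
    """Fuse per-layer hidden states into one representation per token.

    layer_outputs[l][t][d] is the hidden value of encoder layer l,
    token t, dimension d. layer_scores holds one score per layer; in a
    trained model these would be learned parameters (an assumption here).
    """
    if len(layer_outputs) != len(layer_scores):
        raise ValueError("need exactly one score per encoder layer")
    weights = softmax(layer_scores)
    seq_len = len(layer_outputs[0])
    hidden_size = len(layer_outputs[0][0])
    fused = [[0.0] * hidden_size for _ in range(seq_len)]
    # Accumulate the weighted sum over layers for every token and dimension.
    for weight, layer in zip(weights, layer_outputs):
        for t in range(seq_len):
            for d in range(hidden_size):
                fused[t][d] += weight * layer[t][d]
    return fused, weights


# Toy input: 3 "encoder layers", 2 tokens, hidden size 2.
toy_layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
fused, weights = fuse_layers(toy_layers, [0.0, 0.0, 0.0])
```

With equal scores every layer receives the same weight, so the fused vector is simply the average of the layers. Raising one layer's score shifts the fused representation toward that layer, which is the effect a learned, per-layer weighting would have in this sketch.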
Keywords/Search Tags:literature in the field of third-generation semiconductor materials, BERT, deep learning, entity recognition, relation extraction