Research On Entity Relation Extraction Of Aluminum-silicon Alloy Based On Text Mining

Posted on: 2022-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: C H Yao
Full Text: PDF
GTID: 2518306524952379
Subject: Computer technology
Abstract/Summary:
Data-driven modeling is at the core of the Materials Genome Initiative (MGI), but quickly obtaining large amounts of materials data has become a critical problem. At present, materials databases are rarely shared, so it is not easy to obtain useful materials data from public resources. This thesis therefore uses text mining to obtain valid data from the Al-Si alloy literature. Natural language processing (NLP) is a commonly used text mining method, and relation extraction (RE), one of its main tasks, can effectively extract information from the literature. This thesis uses a joint entity and relation extraction model for Al-Si alloys that combines the entity recognition and relation extraction tasks, which avoids the problems of error accumulation and overlapping relations. The multi-head selection model for joint entity and relation extraction serves as the base model. By analyzing the open problems and the deficiencies of existing models, specific solutions and improvements are proposed. The main work includes the following aspects:

(1) Since the materials field currently has no public data set suitable for research on material relation extraction, this thesis constructs an Al-Si alloy relation extraction data set. Following an established construction standard for this data set, the material data are collected from the literature on Al-Si alloy spray deposition experiments. The data set was built by manual annotation and includes 13 entity types and 4 relation types, with a total of 2246 sentences, 2522 entity instances and 1510 relation instances.

(2) To address the problem that the embedding layer of the base model cannot represent polysemous words and out-of-vocabulary (unregistered) words, this thesis proposes a joint extraction model of Al-Si alloy entity relations based on dynamic word embedding. The pre-trained ELMo model is used to obtain word embeddings dynamically, which better expresses the complex semantic and grammatical information in the scientific literature on materials. An experimental comparison of different ways of applying the ELMo model to downstream tasks verifies that the pre-trained ELMo model can be applied well to the materials field, where data sets are small. The word vectors obtained this way are more accurate, which improves the overall performance of the joint model.

(3) To address the problem of information loss in the encoding layer of the base model, this thesis proposes a joint extraction model of Al-Si alloy entity relations based on the self-attention mechanism. The encoding layer is improved by adding a self-attention mechanism to the BiLSTM layer of the base model. This avoids the problem of information from the earlier part of a sentence being diluted or overwritten by information from the later part, and it enables the joint extraction model to better capture dependencies within sentences in the materials science literature. Experimental comparison verifies that the self-attention mechanism effectively improves the performance of the joint extraction model of Al-Si alloy entity relations...
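To make the idea in (3) more concrete, the following is a minimal sketch of self-attention applied on top of per-token encoder outputs. It is an illustration under assumptions, not the thesis's implementation: it uses plain scaled dot-product attention without learned query/key/value projections, the `hidden` vectors are made-up stand-ins for BiLSTM outputs, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(hidden):
    """Scaled dot-product self-attention over token vectors.

    hidden: list of token vectors (stand-ins for BiLSTM outputs).
    Assumption: queries, keys and values are the vectors themselves;
    a trained model would normally apply learned projections first.
    Returns (contexts, weights), one row per token.
    """
    dim = len(hidden[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in hidden:
        # Score the current token against every token in the sentence,
        # so early tokens are compared directly rather than only being
        # carried forward through the recurrent state.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in hidden]
        row = softmax(scores)
        # Context vector: attention-weighted sum of all token vectors.
        context = [sum(row[j] * hidden[j][t] for j in range(len(hidden)))
                   for t in range(dim)]
        contexts.append(context)
        weights.append(row)
    return contexts, weights


# Toy input: three tokens with 2-dimensional hidden states (invented values).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, weights = self_attention(hidden)
```

Each row of `weights` sums to 1, so every context vector is a weighted average of all token states in the sentence, regardless of how far apart the tokens are.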
Keywords/Search Tags: text mining, entity relationship joint extraction, Al-Si alloy entity relationship extraction dataset, ELMo model, self-attention mechanism