
Research And Application Of Iterative Modeling For Domain Entity And Relation Extraction

Posted on: 2022-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhao
GTID: 2518306764476094
Subject: Automation Technology
Abstract/Summary:
Domain entity and relation extraction consistently suffers from a shortage of annotated corpora. Corpus annotation is time-consuming and tedious, and relying entirely on manual annotation is labor-intensive. Traditional methods of expanding a corpus through semi-supervised learning or distant supervision, however, introduce labeling errors, so the resulting corpus is generally of low quality. In addition, entity and relation extraction research differs from field to field, and no general modeling method exists. To meet the needs of domain entity and relation extraction, this thesis proposes a general modeling method that alleviates the shortage of domain-annotated corpora, together with a joint entity and relation extraction model.

First, a joint entity and relation extraction model based on a full attention mechanism is proposed. Attention is currently an active research topic in artificial intelligence, and many models have improved substantially after incorporating it. Existing joint extraction models generally apply attention only to the base word embeddings and ignore it when building relation representations. The model in this thesis improves on the existing SciIE (Scientific Information Extractor) model by incorporating attention into the base word embeddings, the entity embeddings, and the relation embeddings. On the SciERC and CoNLL04 datasets, the proposed model achieves F1 scores of 68.4% and 88.2% on the entity extraction task, 4.9% and 0.5% higher than the baseline method, and 47.1% and 69.9% on the relation extraction task, 13.2% and 1.5% higher than the baseline method.

Second, a corpus construction method for domain entity and relation extraction based on manual intervention and iterative modeling is proposed. A base model is trained on a small annotated corpus, and new samples are then generated from the model's predictions. Human annotators correct these predictions, and both model performance and corpus size improve continuously through iterative modeling. The thesis summarizes four iterative modeling modes: incremental hot training, incremental cold training, moving hot training, and moving cold training. Simulated iterative modeling experiments are carried out for each mode, the differences among the four modes are compared, and the modeling results are analyzed quantitatively. On the simulated data, the iterative modeling method saves up to 39% of the labor cost compared with purely manual annotation. The thesis then applies iterative modeling in practice to data collected from the Internet, yielding a relatively small domain entity and relation corpus containing 18,395 entities and 4,802 relation pairs.

Finally, to make iterative modeling easier to carry out, the thesis develops an entity and relation iterative modeling system. The system provides label design, model management, corpus annotation, and other functions. Label design lets users define entity and relation labels, model management lets users create and train entity and relation extraction models, and corpus annotation provides visual annotation of entities and relations.
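The iterative modeling loop and its four training modes can be sketched roughly as follows. The abstract names the modes but does not define them, so this sketch rests on an assumed reading: "incremental" trains on all corpus accumulated so far while "moving" trains only on the newest corrected batch, and "hot" continues from the previous model's weights while "cold" retrains from scratch. `ToyModel`, `run_iterations`, and the `review` callback are hypothetical names introduced for illustration; the toy model only counts samples and is not the thesis's extraction model.

```python
# Assumed interpretation of the four modes: (accumulate_data, warm_start)
MODES = {
    "incremental_hot": (True, True),
    "incremental_cold": (True, False),
    "moving_hot": (False, True),
    "moving_cold": (False, False),
}


class ToyModel:
    """Hypothetical stand-in for an entity/relation extraction model."""

    def __init__(self):
        self.updates = 0  # samples this model instance has been trained on

    def fit(self, samples):
        self.updates += len(samples)

    def predict(self, text):
        return []  # placeholder: a real model would return entities/relations


def run_iterations(batches, mode, review):
    """Run one training round per batch of raw texts.

    review(text, prediction) stands for the manual-intervention step and
    returns the corrected annotation for that text.
    """
    accumulate, warm_start = MODES[mode]
    model = ToyModel()
    corpus = []   # one list of corrected samples per round
    history = []  # (training-set size, model.updates) per round
    for batch in batches:
        # Model predicts, a human corrects; in round 0 the model is untrained,
        # so this amounts to manual annotation of the seed batch.
        labeled = [review(text, model.predict(text)) for text in batch]
        corpus.append(labeled)
        if accumulate:
            train_set = [s for round_samples in corpus for s in round_samples]
        else:
            train_set = labeled
        if not warm_start:
            model = ToyModel()  # cold training: discard previous weights
        model.fit(train_set)
        history.append((len(train_set), model.updates))
    return model, corpus, history


if __name__ == "__main__":
    batches = [["a", "b"], ["c"], ["d", "e", "f"]]
    for mode in MODES:
        _, _, history = run_iterations(batches, mode, lambda t, p: (t, p))
        print(mode, history)
```

Under this reading, the modes trade training cost against how much of the earlier corpus each round's model has seen, which is presumably what the thesis's comparison of the four modes measures.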
Keywords/Search Tags:Entity and Relation Extraction, Insufficient Corpus, Attention Mechanism, Iterative Modeling, Manual Intervention