
Research on a Joint Document-Level Event Extraction Method for Chinese Text and Its Application

Posted on: 2022-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wang
Full Text: PDF
GTID: 2518306575966449
Subject: Computer technology
Abstract/Summary:
Event extraction is an important research direction in knowledge graph construction. However, as text length grows, and especially in document-level event extraction, traditional event extraction algorithms face three problems. First, event roles and event types must be matched across sentences. Second, document-level data contains many event types, which are scattered across multiple categories. Third, annotated Chinese data is scarce.

Sequence annotation can extract semantic information while preserving event sequence information, and Attention captures global information while still attending to local information, which overcomes the difficulty sequence models have with long-distance dependencies. Moreover, Attention does not depend on step-by-step sequence processing, so its computations can run in parallel, which reduces model training time. This thesis therefore proposes a new joint event extraction algorithm based on sequence annotation. The algorithm treats event type classification as a sequence annotation task, which preserves sequence information and avoids mismatches between event types and event arguments. It also uses the Attention mechanism to extract semantic information from long text and to account for the similarities and differences between sentences, which addresses the dispersion of event arguments in document-level event extraction. The main work of this thesis is as follows.

1. To address the dispersion of arguments across sentences, a joint event extraction method based on sequence annotation is proposed. First, a CNN and a BiLSTM are combined to extract global and local features, so that multiple event types can be extracted from a single document. Second, sequence annotation is introduced to perform event extraction, which matches event types to event arguments automatically. Third, an LSTM serves as the shallow parameter-sharing layer and is combined with Self-Attention to extract task-specific features for joint event extraction. Finally, a CRF layer decodes under tag constraints to produce the final tag sequence. Experimental results show that the method outperforms the compared methods from the literature and effectively extracts event information from documents. The model is also applied to dispute focus recognition in the judicial domain.

2. To address sparse event information, a joint event annotation algorithm is proposed. First, the model is divided into two parts: one uses Attention to extract semantic information at the word level, and the other does so at the sentence level. Then, sequence annotation is integrated with the joint extraction algorithm. Finally, event types and event arguments are extracted jointly. Experimental results show that the method effectively extracts event information from document-level data.

3. To address the lack of Chinese event annotation corpora and of practical application domains, this thesis constructs an annotated corpus for the judicial domain and a demonstration system for document-level event extraction. First, judicial data is downloaded from the Internet and cleaned according to its characteristics. Second, the data is annotated with the brat online annotation tool. Finally, a document-level event extraction system is built on the proposed algorithm; by displaying the extraction results and an event graph, it makes judicial event records more systematic.
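
The idea of treating event type classification as part of the sequence annotation task, so that event types and arguments are matched automatically, can be illustrated with a small sketch. It is not the thesis's implementation. It assumes a hypothetical joint tag format, `B-<EventType>.<Role>` / `I-<EventType>.<Role>` / `O`, and invented event types and roles (`Lawsuit`, `Loan`, and so on); the abstract does not specify the actual tag set. The function `decode_joint_tags` is likewise a name made up for illustration.

```python
from collections import defaultdict


def decode_joint_tags(tokens, tags):
    """Group argument spans by event type from joint BIO tags.

    Assumed (hypothetical) tag format: "B-<EventType>.<Role>",
    "I-<EventType>.<Role>", or "O". Because each tag names both the
    event type and the role, no separate type-argument matching step
    is needed after tagging.
    """
    events = defaultdict(lambda: defaultdict(list))
    span, label = [], None

    def flush():
        # Store the span collected so far under its event type and role.
        nonlocal span, label
        if span:
            event_type, role = label.split(".", 1)
            events[event_type][role].append(" ".join(span))
        span, label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            span, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            span.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open span
            # (treated as ill-formed and skipped).
            flush()
    flush()
    return {etype: dict(roles) for etype, roles in events.items()}


# A two-sentence toy document, flattened into one token sequence.
tokens = ["Zhang", "San", "sued", "Li", "Si", ".",
          "Li", "Si", "borrowed", "50000", "yuan", "."]
tags = ["B-Lawsuit.Plaintiff", "I-Lawsuit.Plaintiff", "O",
        "B-Lawsuit.Defendant", "I-Lawsuit.Defendant", "O",
        "B-Loan.Borrower", "I-Loan.Borrower", "O",
        "B-Loan.Amount", "I-Loan.Amount", "O"]

events = decode_joint_tags(tokens, tags)
print(events)
```

In this sketch the tags would come from the model's final decoding layer. Spans are joined with spaces because the toy example is in English; for Chinese text tokenized by character, joining with an empty string would be the natural choice.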
Keywords/Search Tags: sequence annotation, joint extraction, event extraction, attention