
Chinese Document-level Event Extraction Based On Deep Learning

Posted on: 2021-06-15    Degree: Master    Type: Thesis
Country: China    Candidate: L Wang    Full Text: PDF
GTID: 2518306107968859    Subject: Computer technology
Abstract/Summary:
With text information in cyberspace growing explosively, event extraction (EE), a research direction in the field of artificial intelligence, has begun to receive more attention. By applying computers to extract the time, locations, objects, actions, and other structured essential information of an event from unstructured text, the content of an event that a user is interested in can be expressed more concisely and precisely, thereby greatly improving the efficiency with which humans obtain valuable content from massive texts. Although research on EE has made tremendous progress in recent years, most existing studies focus on sentence-level EE; that is, they assume that all elements of an event appear in a single sentence. This is inconsistent with practical situations, which require combining multiple sentences of context in order to extract a complete event. On the other hand, the training results of deep learning EE models depend on both the scale and the annotation quality of the training dataset, yet it is difficult to obtain a manually annotated dataset for extracting domain-specific events.

To solve the problem of incomplete information in sentence-level EE, a Chinese document-level EE model, ATTDEE (ATTention-based Document-level Event Extraction), based on the self-attention mechanism is proposed. By jointly training an entity recognition module with multi-head self-attention, an event type detection module, and a document-level argument extraction module, ATTDEE not only reduces the error propagation of traditional pipeline EE methods but also extends the granularity of EE to the document level. In addition, ATTDEE does not rely on any explicit event triggers during event type detection, which both accommodates real application scenarios where event triggers may not exist and avoids the workload of labeling triggers in datasets. Comparative experiments on a public financial EE dataset demonstrate that ATTDEE can effectively solve document-level EE tasks.

To alleviate the difficulty of labeling EE datasets for specific domains, a method for scaling up datasets using pre-trained language models is proposed. By fine-tuning on massive unsupervised financial announcements, a BERT (Bidirectional Encoder Representations from Transformers) model can better learn the conventions of financial announcements. Then, for each document in an existing small-scale financial EE dataset, the method replaces arguments and entities with words of the same type, applies the fine-tuned BERT to rewrite the connective words (those other than arguments and entities) in each sentence, and finally generates a new training sample. To enhance the effect on Chinese corpora, the method fine-tunes BERT and rewrites connective words in units of phrases. By comparing the training results of the ATTDEE model before and after dataset expansion, it is shown that, for domain-specific EE tasks, the method can quickly generate high-quality annotated datasets, enhance data generalization, and improve the training results of EE models on small-scale datasets.
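The multi-head self-attention at the heart of ATTDEE's entity recognition module can be sketched in plain Python. This is a minimal illustration under simplifying assumptions, not the thesis's actual model: the learned query/key/value projection matrices are omitted, so each head simply attends over its own slice of the embedding, and all function names here are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted
    average of the value vectors, weighted by query-key similarity."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def multi_head_self_attention(x, num_heads):
    """Split each token embedding into `num_heads` slices, run
    self-attention per head, then concatenate the head outputs.
    (Projections are identity here for brevity.)"""
    d = len(x[0])
    assert d % num_heads == 0
    hd = d // num_heads
    heads = []
    for h in range(num_heads):
        slice_ = [v[h * hd:(h + 1) * hd] for v in x]
        heads.append(attention(slice_, slice_, slice_))
    # Concatenate the per-head outputs for each token position.
    return [sum((heads[h][i] for h in range(num_heads)), [])
            for i in range(len(x))]
```

Each token position attends over the whole document segment, which is what lets entity representations absorb cross-sentence context before argument extraction.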
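The dataset-expansion procedure (swap each argument/entity for another word of the same type, then rewrite the remaining connective words) can be sketched as follows. This is a hedged illustration: `ENTITY_POOL` and `rewrite_connective` are stand-ins for the entity dictionary and the fine-tuned BERT rewriter the thesis describes, which are not specified in the abstract.

```python
import random

# Hypothetical same-type replacement pools; a real system would build
# these from an entity dictionary over the financial corpus.
ENTITY_POOL = {
    "COMPANY": ["Company A", "Company B"],
    "DATE": ["2020-01-01", "2020-06-30"],
}

def expand_document(tokens, entity_spans, rewrite_connective, rng):
    """Generate one augmented training sample.

    tokens:          list of words in the document
    entity_spans:    {token index: entity type} for arguments/entities
    rewrite_connective: callable standing in for the fine-tuned BERT
                        that rewrites non-argument connective words
    """
    new_tokens = []
    for i, tok in enumerate(tokens):
        if i in entity_spans:
            # Step 1: replace the argument/entity with a different
            # word of the same type, keeping the event label valid.
            pool = [w for w in ENTITY_POOL[entity_spans[i]] if w != tok]
            new_tokens.append(rng.choice(pool) if pool else tok)
        else:
            # Step 2: let the language model rewrite connective words.
            new_tokens.append(rewrite_connective(tok))
    return new_tokens
```

Because the entity labels travel with the replacement words, every generated document comes pre-annotated, which is what makes the expansion cheap compared with manual labeling.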
Keywords/Search Tags:Chinese document-level event extraction, event type detection, event argument extraction, dataset expansion