Research On Key Technologies Of Open Domain Meta-event Extraction

Posted on: 2021-05-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Z Gao
Full Text: PDF
GTID: 1368330623982217
Subject: Computer Science and Technology
Abstract/Summary:
Event extraction is an active research topic in natural language processing, with wide application in public opinion monitoring, emergency alerting, and intelligence gathering. It divides into two categories: meta-event extraction and topic-event extraction. This dissertation focuses on meta-event extraction. Existing methods are mostly domain-specific and are hard to apply to open-domain corpora. To address this, the dissertation studies key technologies of open-domain meta-event extraction, concentrating on the following questions:

1. Word embedding. Word embedding is a basic tool used throughout event extraction tasks, and its techniques can also be used to compute event embeddings.
2. Open-domain meta-event embedding. Event representation is a prerequisite for event extraction: events must be represented as continuous vectors before event identification and argument extraction.
3. Construction of open-domain meta-event templates. Event extraction is guided by event templates, but existing template sets are too small for an open-domain setting. We propose constructing meta-event templates from FrameNet. Mapping frames to event templates raises two main problems: frame identification and frame-semantic role labeling of event sentences.

This dissertation explores these questions in depth. Its main work and contributions are as follows:

1. Neural language models can be difficult and time-consuming to train. To address this, a word embedding method based on Zipf's co-occurrence matrix factorization is proposed and implemented. The method uses Zipf's law of word frequency to drastically reduce the dimensions of the co-occurrence matrix, which makes the matrix easier to store and compute with. It also simplifies the counting and transformation of the co-occurrence matrix, reducing the time cost. Randomized SVD is used to factorize the resulting matrix and cut the computing overhead. Because SVD cannot capture non-linear relations among features, an autoencoder is built to apply a further non-linear transformation to the vectors. Compared with well-known neural language models such as Word2vec, GloVe, and fastText, the method achieves comparable performance and takes much less time than the Word2vec models.
2. To ease the difficulty of representing meta-events in large-scale open-domain event extraction, a meta-event embedding method based on Zipf's co-occurrence matrix factorization is proposed. Traditional methods take whole sentences as event tags and easily run into the "curse of dimensionality". The proposed method instead extracts event tuples from large-scale open-domain corpora and then applies tuple abstraction, pruning, and disambiguation to obtain typed event tags. To avoid the overly fine-grained encoding of traditional unsupervised models, the method uses Zipf's co-occurrence matrix factorization to compute event embeddings globally. The generated vectors are evaluated on nearest-neighbor and event identification tasks. The experimental results show that the method captures event similarity and relatedness globally and avoids the semantic deviation caused by overly fine-grained encoding.
3. Traditional frame identification methods consider only the context of lexical units, which makes further performance gains difficult. A novel method that also considers the definitions of lexical units is proposed. This dissertation builds three frame identification models on the BERT pre-trained network and compares them with traditional models on the frame identification task. The experimental results show that the BERT-based models significantly outperform the traditional models and that adding lexical unit definitions effectively improves model performance, which supports the effectiveness of the method.
4. Too many frame elements may degrade the performance of frame-semantic role labeling. To address this, global semantic roles for FrameNet are defined, and mappings between frame elements and global semantic roles are created. Because the BERT model alone cannot capture information about lexical units and frame types, this dissertation builds a sequence labeling model that adds a bidirectional LSTM layer and a CRF layer on top of the BERT pre-trained network. The model takes contexts, lexical units, and frame types into account and outperforms the baseline models, which supports the effectiveness of the method.
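The co-occurrence factorization described in the first contribution can be illustrated with a minimal sketch. The abstract does not say how Zipf's law is used to shrink the matrix, so the sketch assumes one plausible reading: only the most frequent words are kept as context columns. The function names (`build_cooccurrence`, `randomized_svd`, `embed`), the window size, the `log1p` scaling, and the `num_context` parameter are all hypothetical choices made for illustration, and the autoencoder step is omitted. This is not the dissertation's implementation.

```python
from collections import Counter

import numpy as np


def build_cooccurrence(sentences, num_context, window=2):
    """Count co-occurrences, keeping only frequent words as context columns.

    Assumption: the Zipf-based reduction is modeled here by truncating the
    context vocabulary to the `num_context` most frequent words.
    """
    freq = Counter(w for s in sentences for w in s)
    # Sort by descending frequency, then alphabetically for determinism.
    vocab = sorted(freq, key=lambda w: (-freq[w], w))
    row_index = {w: i for i, w in enumerate(vocab)}
    col_index = {w: j for j, w in enumerate(vocab[:num_context])}
    counts = np.zeros((len(vocab), len(col_index)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i and s[j] in col_index:
                    counts[row_index[w], col_index[s[j]]] += 1.0
    return vocab, counts


def randomized_svd(matrix, rank, oversample=5, n_iter=2, seed=0):
    """Approximate truncated SVD via a random range finder with power iterations."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(matrix.shape))
    # Project onto a random subspace and orthonormalize the result.
    omega = rng.standard_normal((matrix.shape[1], k))
    q, _ = np.linalg.qr(matrix @ omega)
    for _ in range(n_iter):
        q, _ = np.linalg.qr(matrix.T @ q)
        q, _ = np.linalg.qr(matrix @ q)
    # Solve the small problem exactly, then map back to the original space.
    b = q.T @ matrix
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]


def embed(sentences, num_context=3, dim=2):
    """Return (vocab, vectors), with one `dim`-dimensional row per word."""
    vocab, counts = build_cooccurrence(sentences, num_context)
    u, s, _ = randomized_svd(np.log1p(counts), dim)
    return vocab, u * np.sqrt(s)


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vocab, vectors = embed(corpus, num_context=3, dim=2)
```

With a real corpus, `num_context` would be far smaller than the vocabulary size, so the matrix passed to the factorization stays narrow no matter how many rare words appear as rows.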
Keywords/Search Tags: Meta-event Extraction, Co-occurrence Matrix Factorization, Word Embedding, Event Embedding, Pre-trained Model, Event Templates