Research On Event Extraction Methods Via Multi-level Representation

Posted on: 2019-05-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Qin
Full Text: PDF
GTID: 1368330590972806
Subject: Computer application technology
Abstract/Summary:
In the current information era, information extraction techniques help people obtain and understand information from huge volumes of data. Event extraction is an important sub-task of information extraction that aims to extract structured events from unstructured text. An event representation method is the way in which events are represented, for example as collections of words or as predefined event templates. The representation method influences event understandability, that is, whether the extracted events provide complete event information to users. This thesis mainly investigates multi-level event representation methods for event understandability, covering phrase-level, clause-level, document-level and multi-document-level event representation.

(1) This thesis proposes an event extraction method based on feature-rich classification filtering. Segment-based event extraction is one of the mainstream methods. Segments, most of which are named entities and commonly used phrases, are obtained by splitting social media texts. Segment-based event extraction is as efficient as the word-based method, yet yields better event understandability. To distinguish news events from meaningless topics among candidate events, this thesis proposes a feature-rich, classification-based news event filter in place of the existing statistics-based method that ranks candidates by newsworthiness. The shortcomings of the statistics-based method are that it considers only a few features and sacrifices recall for high precision. This thesis investigates how the documents related to candidate events can help news event filtering, and defines a series of features that model the statistical, social and textual information of candidate events. Experimental results show that the classification-based news event filtering method significantly improves recall while maintaining high precision, compared with the statistics-based method.

(2) This thesis proposes a frame-based event extraction method for social media, in which a frame consists of a subject, a verb and an object. Events represented by a collection of words or segments are flat representations: the representation units are independent of each other and show no structured information within units. This thesis proposes to use clause-level event representation units (frames) for event extraction. Frames are defined as triplets containing subject, verb and object phrases. The structured information among the phrases in a frame represents deep semantic information of the clause, which helps in understanding events. This thesis conducts shallow semantic analysis on social media documents and extracts frames via open information extraction methods. Experimental results show that the proposed frame-based event extraction method improves precision and obtains better event understandability.

(3) This thesis proposes an event extraction method based on document-level temporal features. Events represented by documents contain complete event information and are easier to understand than words or phrases. However, the huge volume of social media data and the high space cost of bag-of-words document representation limit the development of document-based event extraction methods. Inspired by word embedding techniques, this thesis uses low-dimensional dense vectors for document representation to reduce the time and space cost of event extraction. Because documents are sparse, current document-based event extraction methods cannot calculate document-level temporal features for distinguishing news events from meaningless topics. This thesis proposes to extend the word-level, statistics-based temporal feature to a document-level temporal feature for news event filtering. It defines the r-radius neighbors of a document as its semantically similar documents and counts them as the document's semantic frequency, which alleviates the sparseness of documents. It then calculates document-level temporal features from the documents' semantic frequencies and uses them for news event filtering together with other statistical features. Experimental results show that the proposed document-level temporal feature improves the precision of the document-level event extraction method.

(4) This thesis proposes a Chinese neural event extraction method based on a hybrid representation model. This thesis uses templates to represent events, and template elements can be extracted from multiple documents. Template-based event representation not only contains complete event information but also presents it more concisely. Because of the differences between Chinese and English, applying existing English neural network methods to the Chinese event extraction task yields poor performance. This thesis proposes a neural network-based Chinese event extraction model to address the feature engineering problem and the out-of-vocabulary problem in Chinese event extraction. First, it uses two recurrent neural networks to learn a word-level representation and a character-level representation for each word, which are concatenated to form the hybrid representation of the word. The hybrid representation alleviates the problem of representing out-of-vocabulary words in Chinese data sets. Second, it uses convolutional neural networks to learn chunk-level features with respect to the current trigger-argument pair for the argument role classification task. Last, it jointly learns the event detection and argument role classification tasks via shared parameters to reduce error propagation. Experimental results show that the proposed hybrid-representation-based Chinese event extraction model significantly improves precision.
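As an illustration of the semantic frequency idea in part (3), the following is a minimal sketch rather than the thesis's implementation. It assumes documents are already embedded as dense vectors, that similarity is measured by cosine distance, and that a document is not counted as its own neighbor; the abstract does not specify any of these choices, and the function names `cosine_distance` and `semantic_frequency` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity); an assumed metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def semantic_frequency(doc_vectors, radius):
    """For each document, count the other documents within `radius`.

    The count of r-radius neighbors plays the role of the document's
    semantic frequency. Excluding the document itself is an assumption.
    """
    counts = []
    for i, u in enumerate(doc_vectors):
        neighbors = sum(
            1
            for j, v in enumerate(doc_vectors)
            if j != i and cosine_distance(u, v) <= radius
        )
        counts.append(neighbors)
    return counts


# Toy example: the first two vectors are near-duplicates, the third is unrelated.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(semantic_frequency(docs, radius=0.1))  # [1, 1, 0]
```

This brute-force version compares every pair of documents, so it is quadratic in the number of documents; a real social media stream would need an approximate nearest-neighbor index or a time-windowed comparison.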
Keywords/Search Tags: Event extraction, social network, frame, event representation, character information