Research On Chinese-oriented Text Event Extraction

Posted on: 2020-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2428330596468999
Subject: Public Security Technology
Abstract/Summary:
Event extraction is an active research topic in natural language processing. Building structured information from unstructured event text improves the efficiency of information storage and retrieval. Event extraction has two subtasks: event identification and argument extraction. In addition, classifying events by topic helps improve extraction efficiency and reduce workload. This thesis studies these three tasks, as follows:

1. For event identification in Chinese text, a Chinese event identification method based on a BTB-LSTM network is proposed, which allows the recent NLP model BERT to be applied to event identification. Part-of-speech features, named-entity features, and semantic dependency features are extracted from the Chinese text and combined with the BERT vectors. A multi-layer Bi-LSTM network then extracts the features of candidate event sentences, and a CRF model, rather than the commonly used softmax classifier, is used to obtain the event trigger words, which in turn identifies the events.

2. For argument extraction in Chinese text, an event element extraction method based on conditional random fields and a rule-based method for filling in time and location elements are proposed. Following the idea of sequence labeling, the thesis extracts features from part of speech, entity components, and the grammatical and semantic associations between nodes and triggers, and trains a conditional random field (CRF) on the features of each argument. The trained model is then used to identify and extract event arguments from new event sentences. In addition, based on a statistical analysis of the time and place elements missing from events, and taking into account event relationships and the co-occurrence of event elements in context, filling rules are proposed and used to fill in the missing elements.

3. For event topic classification in Chinese text, an event topic classification method based on the WMF_LDA topic model is proposed. Starting from the original LDA topic model, a "small step and multiple rounds" semantic aggregation algorithm is proposed to form the WMF_LDA topic model, so that the topic distribution vector of an event text better reflects the topical content of the event. A Random Forest classifier is then trained on the topic features of Chinese texts to classify event topics.

The CEC corpus is used for the experimental verification of event identification, argument extraction, and event topic classification. Experiments show that the F values of the proposed and adopted methods on these three tasks reach 71.3%, 76.0%, and 94%, respectively.
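The abstract states that a CRF, rather than a per-token softmax classifier, is used to label trigger words. The sketch below illustrates one reason this choice can matter: a linear-chain decoder scores whole tag sequences, so it can rule out invalid label transitions that a per-token argmax would accept. This is not the thesis's model. The BIO tag names, the `viterbi` and `greedy` helpers, and all scores are hypothetical values chosen for illustration; in a real system the emission scores would come from the network and the transition scores would be learned.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for trigger labeling (an assumption, not from the thesis).
TAGS = ["O", "B-TRG", "I-TRG"]
NEG_INF = float("-inf")


def greedy(emissions: List[Dict[str, float]]) -> List[str]:
    """Per-token argmax, i.e. what an independent softmax decision would pick."""
    return [max(TAGS, key=lambda tag: e[tag]) for e in emissions]


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence of a linear-chain model.

    emissions[t][tag] is the score of `tag` at token t.
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`;
    pairs that are not listed default to 0.0.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at token t.
    best = [{tag: emissions[0][tag] for tag in TAGS}]
    back = []  # back[t-1][tag]: best predecessor of `tag` at token t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS,
                       key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[t][cur])
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for a three-token sentence.
EMISSIONS = [
    {"O": 2.0, "B-TRG": 0.5, "I-TRG": 0.0},
    {"O": 0.4, "B-TRG": 0.5, "I-TRG": 1.0},
    {"O": 1.5, "B-TRG": 0.1, "I-TRG": 0.2},
]
# Forbid "O -> I-TRG": an inside tag must follow a begin or inside tag.
TRANSITIONS = {("O", "I-TRG"): NEG_INF}

if __name__ == "__main__":
    print(greedy(EMISSIONS))                # ['O', 'I-TRG', 'O']
    print(viterbi(EMISSIONS, TRANSITIONS))  # ['O', 'B-TRG', 'O']
```

In this toy input the per-token argmax emits `I-TRG` directly after `O`, which is not a valid BIO sequence, while the sequence-level decoder returns `B-TRG` for the same token.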
Keywords/Search Tags: Chinese event extraction, Event identification, Event argument extraction, Event topic classification