Research on the Construction of a Chinese-Vietnamese Bilingual Corpus and a Method for Extracting Event Graphs

Posted on: 2018-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2358330518961951
Subject: Software engineering
Abstract/Summary:
Event extraction is one of the most important tasks in information extraction; its main goal is to extract the events contained in texts. In particular, information extraction from Vietnamese news plays an important role in handling international relations with Vietnam and in following Vietnam's regional economic development and political stability. In general, a news text is composed of multiple events. When obtaining information from news, readers need not only the individual sub-events but also the relationships between them, which are themselves important information. It is therefore important to obtain both the events and the relationships between events through event extraction. This thesis addresses the problem of bilingual news event extraction and studies in depth the construction of a Chinese-Vietnamese bilingual news corpus, the extraction of Chinese and Vietnamese news events, and the construction of bilingual event graphs. The following research work was completed:

(1) A Chinese-Vietnamese bilingual news corpus is constructed. Based on an analysis of Vietnamese news and the needs of event extraction, the corpus contents are defined, including events, event elements, temporal relations between events, event coreference relations, and alignment relations between cross-language events. A total of 508 Chinese-Vietnamese bilingual news articles are collected and annotated in XML. The corpus provides important support for future Chinese-Vietnamese bilingual event extraction and the construction of bilingual event graphs.

(2) An event extraction method based on machine learning and rules is implemented. First, the word and its part of speech, the context of the word and part of speech, and semantic features are chosen as features, and the Chinese event recognition results are introduced as a guiding feature for Vietnamese event recognition. A support vector machine (SVM) is used to train an event recognition model that identifies event trigger words. Then, event element extraction rules are defined according to the syntax and grammar of Chinese and Vietnamese, and event elements are extracted by rule matching. Finally, event element type resolution rules are defined, and event element types are resolved by rule matching. For event elements that do not conform to these rules, the element types are determined by calculating the similarity of word sense sets. The experimental results show that the proposed method can improve the performance of Vietnamese event extraction.

(3) A method is proposed for constructing Chinese-Vietnamese bilingual event graphs based on events and the relationships between them. First, an SVM model is used to extract the coreference relations and temporal relations between events. Then, the bilingual event graph is constructed by taking events as nodes and the relationships between events as edges. Finally, the PageRank algorithm is used to compute the weights of the nodes in the directed graph, and the Chinese and Vietnamese bilingual events are ranked by these weights.

(4) Based on the above research results, a prototype system for bilingual news event graph extraction is designed.
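The graph construction and ranking step in (3) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the event identifiers (`zh:e1`, `vi:e2`, and so on), the example edges, the choice to model coreference as a pair of opposite directed edges, and the damping factor of 0.85 are all assumptions made for illustration. Events are nodes, relations are directed edges, and a plain power-iteration PageRank assigns each node a weight.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical bilingual event graph (identifiers and relations are invented).
event_edges = [
    ("zh:e1", "zh:e2"),  # temporal relation: zh:e1 precedes zh:e2
    ("zh:e2", "zh:e3"),  # temporal relation
    ("vi:e1", "vi:e2"),  # temporal relation
    ("zh:e2", "vi:e2"),  # coreference, modeled as two opposite edges
    ("vi:e2", "zh:e2"),
    ("vi:e2", "zh:e3"),  # cross-language temporal relation
]

weights = pagerank(event_edges)
ranked_events = sorted(weights, key=weights.get, reverse=True)
for event in ranked_events:
    print(event, round(weights[event], 4))
```

Sorting by weight gives one possible ordering of the bilingual events. How relations are turned into edge directions strongly affects the result, so that mapping is the main design choice to revisit against the thesis itself.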
Keywords/Search Tags:Vietnamese, event extraction, event element extraction, relation extraction, event graph