A Study On The Method Of Extracting Elements From Vietnamese News Events

Posted on: 2015-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Pan
GTID: 2208330431978202
Subject: Computer technology
Abstract/Summary:
With the advance of globalization, communication between China and ASEAN has become more frequent. Vietnam borders China, and its exchanges with China in political, economic, cultural and other areas are growing. To better understand how Vietnam views events involving China in important areas such as politics and the economy, it is of significant research value to collect a corpus of Vietnamese domestic news, manage it with a corpus management system, and then analyze it and extract the important event elements. This thesis focuses on the problem of Vietnamese news event element extraction. It covers Vietnamese word segmentation and part-of-speech tagging, construction of a Vietnamese news event corpus, Vietnamese news named entity recognition, and a Vietnamese news event element extraction method based on templates and the maximum entropy model. The main work completed is as follows:
(1) A general platform for Vietnamese word segmentation and part-of-speech tagging is developed. Building on the core APIs and models provided by existing Vietnamese word segmentation and part-of-speech tagging toolkits, this thesis integrates them into a general platform, which is the foundation for the subsequent Vietnamese news event element extraction.
(2) A Vietnamese news event corpus is constructed. First, Vietnamese news events are defined and the websites that serve as sources of the news corpus are selected. Then the news collected from these sources is annotated with event type and category, word segmentation and part-of-speech tags, news entities, trigger words, and event elements. Finally, the annotated texts are stored to form the Vietnamese news event corpus.
(3) A CRF-based Vietnamese named entity recognition method is proposed. Based on the characteristics of Vietnamese words and parts of speech, the method defines a feature template for Vietnamese named entity recognition. The collected Vietnamese news event corpus is annotated with person names, place names, organization names, percentages, currencies, monetary amounts, dates and times, and so on. A named entity model is then trained with the CRF and used to recognize Vietnamese named entities.
(4) A Vietnamese news event element extraction method based on templates and the maximum entropy model is proposed. The method first analyzes the characteristics of Vietnamese news events, then recognizes the type and category of each event, then defines event extraction templates, and finally combines the templates with the maximum entropy model to extract the elements of Vietnamese news events.
(5) Based on the research above, a prototype system for Vietnamese news event element extraction is designed and implemented.
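The feature template described in item (3) can be illustrated with a minimal sketch. The abstract does not list the actual features, so everything below is an assumption made for illustration: the window size, the feature names, the `<PAD>` marker, the underscore-joined word segmentation, and the POS tags in the example sentence. The function `token_features` is hypothetical and is not taken from the thesis.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (segmented word, POS tag); both assumed to come from an upstream tagger


def token_features(sentence: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Build word/POS context features for token i (illustrative template only)."""
    word, pos = sentence[i]
    feats = {
        "bias": "1",
        "w[0]": word.lower(),
        "pos[0]": pos,
        # Capitalization is a common cue for names of people, places, organizations.
        "is_title": str(word[:1].isupper()),
        # Digits are a common cue for dates, percentages, and monetary amounts.
        "has_digit": str(any(ch.isdigit() for ch in word)),
    }
    # Add neighbouring words and POS tags inside the context window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(sentence):
            neighbour_word, neighbour_pos = sentence[j]
            feats[f"w[{offset}]"] = neighbour_word.lower()
            feats[f"pos[{offset}]"] = neighbour_pos
        else:
            feats[f"w[{offset}]"] = "<PAD>"  # position falls outside the sentence
    return feats


def sentence_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hypothetical segmented and POS-tagged sentence (tags are illustrative).
example = [("Thủ_tướng", "N"), ("thăm", "V"), ("Hà_Nội", "Np"), ("ngày", "N"), ("29/7", "M")]
features = sentence_features(example)
```

Feature dictionaries of this shape, paired with one entity label per token, are the usual input to a linear-chain CRF trainer. Which toolkit and which features the thesis actually used is not stated in the abstract.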
Keywords/Search Tags: Vietnamese, word segmentation and POS tagging, corpus construction, named entity recognition, event element extraction