A Study On The Analytical Method Of Chinese And Vietnamese Bilingual News

Posted on: 2016-12-18 | Degree: Master | Type: Thesis
Country: China | Candidate: W X Long
GTID: 2208330470470763 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Vietnam and China are closely connected, so detecting topics and generating storylines from the mass of Chinese and Vietnamese news is of great significance for strengthening economic, political and cultural ties between the two countries and for enhancing exchanges between their peoples. Extracting news topics and event storylines from dynamic data is currently a hot topic in information retrieval and a comprehensive application of its basic research. News text, whatever its language, contains specific event elements such as time, persons and places; these elements can be marked and used as supervision information to improve the accuracy of news analysis. On this basis, this thesis focuses on dynamic Chinese-Vietnamese bilingual news: it incorporates event elements into bilingual topic models to discover and track news topics, and then generates the sub-topics that arise as an event develops, which together form the storyline. Specifically, the following work was completed:

1. An online Chinese-Vietnamese bilingual topic detection model that combines event elements with the RCRP algorithm was proposed. First, event elements, including time, place and person, are extracted from newly arriving bilingual news. The extracted event elements and bilingual aligned words are then combined with the RCRP algorithm to build the online bilingual topic detection model. Finally, the model decides whether each new news item belongs to an existing topic or starts a new one. Experiments on a mixed Chinese-Vietnamese news set crawled from the internet compared the proposed model with standard LDA and K-means clustering, and the proposed bilingual topic detection model achieved better results.

2. A storyline generation model based on global/local co-occurring word pairs was proposed. The topic word distribution detected in the previous chapter is used as the global words that characterize the event as a whole, and the time, person, place and other event elements in news segments divided by a given time granularity are used as the local words. The co-occurrence regularities between global and local words then serve as supervision information and are combined with bilingual aligned words and the RCRP algorithm in a topic model, yielding sub-topics at the corresponding time granularity. These sub-topics represent the stages of the event's development, so the evolution of the event is reflected in the sub-topic distribution. Experiments on a mixed Chinese-Vietnamese news set crawled from the internet compared the proposed model with LexPageRank and Chieu's clustering algorithm, and the proposed storyline generation model achieved better results.
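The online detection step in item 1 can be pictured with a minimal sketch. It assumes, purely for illustration, that Vietnamese words are mapped into the Chinese vocabulary through a small alignment dictionary (`ALIGN`), that event elements are appended as typed tokens such as `LOC:河内` and counted twice (`event_weight`), and that each topic is scored with a Dirichlet-multinomial word likelihood. In the spirit of the recurrent Chinese restaurant process, an existing topic's prior weight is its document count in the previous plus the current epoch, a new topic gets weight `alpha`, and a topic with no documents in either epoch is no longer offered. `OnlineRCRP`, `to_shared_tokens`, the epoch mechanism and all parameter values are hypothetical, and the greedy highest-score assignment stands in for the sampling-based inference a full model would use; this is not the thesis's implementation.

```python
import math
from collections import Counter

# Hypothetical Vietnamese -> Chinese alignment dictionary (illustrative only).
ALIGN = {"kinh_tế": "经济", "hợp_tác": "合作"}

def to_shared_tokens(words, events, event_weight=2):
    """Map words into one shared (Chinese) vocabulary and append event elements.

    events: (kind, value) pairs such as ("LOC", "河内"). Each event token is
    repeated event_weight times so it counts more than an ordinary word
    (an assumed weighting scheme, not taken from the thesis).
    """
    tokens = [ALIGN.get(w, w) for w in words]
    for kind, value in events:
        tokens += [f"{kind}:{value}"] * event_weight
    return tokens

class OnlineRCRP:
    """Greedy online topic assignment with a recurrent-CRP-style prior (a sketch)."""

    def __init__(self, alpha=1.0, beta=0.01, vocab_size=1000):
        self.alpha, self.beta, self.V = alpha, beta, vocab_size  # V: assumed vocabulary size
        self.prev = Counter()  # documents per topic in the previous epoch
        self.curr = Counter()  # documents per topic in the current epoch
        self.words = {}        # topic id -> token counts accumulated so far
        self.next_id = 0

    def new_epoch(self):
        # Topics with no documents in the epoch just ended drop out of the prior.
        self.prev, self.curr = self.curr, Counter()

    def _log_lik(self, tokens, counts):
        # Dirichlet-multinomial predictive log-probability of the tokens under a topic.
        total, seen, ll = sum(counts.values()), Counter(), 0.0
        for i, w in enumerate(tokens):
            ll += math.log((counts[w] + seen[w] + self.beta) /
                           (total + i + self.V * self.beta))
            seen[w] += 1
        return ll

    def assign(self, tokens):
        """Return the topic id for a new document, creating a topic if needed."""
        live = set(self.prev) | set(self.curr)
        scores = {k: math.log(self.prev[k] + self.curr[k]) + self._log_lik(tokens, self.words[k])
                  for k in live}
        new_score = math.log(self.alpha) + self._log_lik(tokens, Counter())
        best = max(scores, key=scores.get, default=None)
        if best is None or new_score > scores[best]:
            best, self.next_id = self.next_id, self.next_id + 1
            self.words[best] = Counter()
        self.curr[best] += 1
        self.words[best].update(tokens)
        return best

model = OnlineRCRP()
print(model.assign(to_shared_tokens(["经济", "合作"], [("LOC", "河内")])))       # → 0
print(model.assign(to_shared_tokens(["kinh_tế", "hợp_tác"], [("LOC", "河内")])))  # → 0
print(model.assign(to_shared_tokens(["足球", "比赛"], [("PER", "A")])))          # → 1
```

The second call lands in topic 0 because alignment turns its Vietnamese words into the same tokens as the first. Calling `new_epoch()` at each time boundary is what lets topics fade: a topic that receives no documents during a whole epoch is dropped from the candidate set at the next boundary.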
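The global/local co-occurrence statistics behind item 2 can be sketched as follows, under simplifying assumptions: each document is a `(date, tokens, event_elements)` tuple, the global words are the top words of one detected topic, time slices are fixed windows of `days_per_slice` days, and a global/local pair is counted once per document in which both occur. The function names and the toy data are hypothetical. In the thesis these pair statistics act as supervision inside a bilingual RCRP topic model; here they are only ranked per slice to give a crude sub-topic summary, which is a simplification rather than the proposed model.

```python
from collections import Counter, defaultdict
from datetime import date

def cooccurrence_by_slice(docs, global_words, start, days_per_slice=7):
    """Count (global word, local event element) pairs in each time slice.

    docs: iterable of (date, tokens, event_elements) tuples (hypothetical format).
    global_words: top words of one detected topic, i.e. the "global" words.
    Returns {slice_index: Counter({(global_word, event_element): doc_count})}.
    """
    gset = set(global_words)
    slices = defaultdict(Counter)
    for day, tokens, events in docs:
        idx = (day - start).days // days_per_slice  # fixed-width time granularity
        # Sorted iteration keeps the counting order, and thus tie-breaking, deterministic.
        for g in sorted(gset.intersection(tokens)):
            for e in sorted(set(events)):
                slices[idx][(g, e)] += 1  # each pair counted once per document
    return dict(slices)

def storyline(slices, top_n=2):
    """Summarize each slice by its most frequent pairs, in time order."""
    return [(idx, [pair for pair, _ in slices[idx].most_common(top_n)])
            for idx in sorted(slices)]

# Toy usage with invented data (not from the thesis experiments).
docs = [
    (date(2015, 1, 1), ["贸易", "合作", "会议"], ["LOC:河内", "PER:A"]),
    (date(2015, 1, 3), ["贸易"], ["LOC:河内"]),
    (date(2015, 1, 10), ["合作"], ["LOC:北京"]),
]
slices = cooccurrence_by_slice(docs, ["贸易", "合作"], start=date(2015, 1, 1))
print(storyline(slices, top_n=1))
# → [(0, [('贸易', 'LOC:河内')]), (1, [('合作', 'LOC:北京')])]
```

Shrinking `days_per_slice` yields finer-grained sub-topics at the cost of sparser counts per slice, which mirrors the choice of time granularity described above.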
Keywords/Search Tags: Elements of news, Topic detection, Recurrent Chinese Restaurant Process algorithm, RCRP, Storyline