
Research On The Topic Discovery And Summarization Methods Of Chinese And Vietnamese Bilingual News

Posted on: 2019-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wang
Full Text: PDF
GTID: 2438330563457671
Subject: Computer technology
Abstract/Summary:
In recent years, with the advancement of the "Belt and Road" initiative, Vietnam has become more and more closely connected with China, and more and more news events involve the two countries. After such events, Chinese and Vietnamese media publish a large number of reports, written in Chinese or Vietnamese. It is necessary to grasp hot news topics and their main information in a timely and effective way. However, the amount of information on the Internet is huge, and because Chinese and Vietnamese are different languages, it is difficult to analyze the reports effectively through human reading. This thesis focuses on the task of detecting hot news topics from bilingual news texts and automatically generating a concise text that covers the main idea of each topic. The main contributions are as follows:

(1) A Chinese-Vietnamese bilingual news topic detection method based on graph clustering is proposed. It aims to automatically divide Chinese or Vietnamese texts into classes according to the news events they report. It is difficult to represent texts in different languages in the same feature space, so mining the topics shared across languages is a hard task. Because the news elements of texts on a specific topic are consistent no matter what language the texts are written in, the correlation between these elements can reflect the relevance between Chinese and Vietnamese texts. Therefore, news elements are first extracted, and each text is represented as a vector over these elements. Then the similarity of the text vectors is calculated based on Wikipedia bilingual knowledge, and the texts and their relationships are mapped into a graph model. Considering the propagation characteristics of news texts, a random walk algorithm is applied to optimize the similarity relationships. Finally, the affinity propagation algorithm is used to cluster the texts into classes, each of which represents a topic. The method achieves good results.

(2) A Chinese-Vietnamese bilingual news event summarization method based on graph ranking is proposed. Multi-language news event summarization aims to automatically obtain the important information from many related news texts written in different languages. Because the main information about the same event is similar no matter what language it is presented in, the thesis proposes a novel unified approach that summarizes important information from monolingual and Chinese-Vietnamese news document sets simultaneously. First, sentence dependency relationships are analyzed, rules are defined to segment each sentence into grammatical parts, and a dictionary is used to build a bilingual feature space. Second, a Chinese-Vietnamese sentence graph model is constructed. Finally, exploiting the property that graph nodes reinforce each other and fusing context information, the sentences are ranked by how well they represent the important information. The results show that the method is effective.

(3) A prototype system for Chinese-Vietnamese news topic detection and summarization is built. The system collects Chinese and Vietnamese news data from the Internet automatically and in real time, integrates the topic detection and summarization algorithms proposed in this thesis to analyze the data automatically, and presents the results to users.
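To make the graph-ranking idea in (2) concrete, the following is a minimal sketch, not the thesis's actual algorithm. It assumes each Chinese or Vietnamese sentence has already been mapped into a shared bilingual feature space (the feature names below are hypothetical), links sentences by cosine similarity, and scores them with a generic PageRank/TextRank-style power iteration so that well-connected sentences reinforce each other. The function names, the damping value, and the toy data are all illustrative assumptions; the context-fusion step mentioned in the abstract is not modeled.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(features, damping=0.85, iters=200, tol=1e-12):
    """Score sentences with a PageRank-style iteration on a similarity graph.

    `features` is a list of sparse vectors (dicts) in a shared feature
    space. Returns one score per sentence; higher means more central.
    """
    n = len(features)
    # Edge weights: pairwise cosine similarity, no self-loops.
    w = [[cosine(features[i], features[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            # Each neighbour i passes on its score in proportion to the
            # weight of edge i -> j; isolated nodes pass on nothing.
            inflow = sum(scores[i] * w[i][j] / out[i]
                         for i in range(n) if out[i] > 0)
            new.append((1 - damping) / n + damping * inflow)
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Toy input: hypothetical shared features for two Chinese (zh) and two
# Vietnamese (vi) sentences after dictionary mapping.
features = [
    {"summit": 1.0, "hanoi": 1.0, "trade": 1.0},  # zh sentence 0
    {"summit": 1.0, "hanoi": 1.0},                # vi sentence 1
    {"trade": 1.0, "agreement": 1.0},             # zh sentence 2
    {"weather": 1.0},                             # vi sentence 3, unrelated
]
scores = rank_sentences(features)
order = sorted(range(len(scores)), key=lambda i: -scores[i])
print(order)  # sentence 0 ranks first, the isolated sentence 3 last
```

In this sketch a summary would be formed by taking the top-ranked sentences. Because the graph is built in the shared feature space, Chinese and Vietnamese sentences compete in a single ranking, which is one plausible reading of the "unified approach" described above.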
Keywords/Search Tags:Chinese-Vietnamese bilingual, news topic detection, topic summarization, graph clustering, graph ranking