
Research On Diversity Summarization Method Of Chinese And Vietnamese Bilingual News

Posted on: 2019-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Ye
GTID: 2438330566983709
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Relations between China and Vietnam have grown closer since the Belt and Road Initiative was proposed. For important news events, the media of both countries publish large numbers of Chinese and Vietnamese news reports. Obtaining the main content of this bilingual news, and the differences between the two sides' coverage, in a timely and effective way is of great significance. This thesis studies the comparative summarization of Chinese and Vietnamese bilingual news and completes the following research work.

Learning Chinese and Vietnamese bilingual word embeddings based on Wikipedia. Two sets of monolingual word embeddings that capture semantic information well are trained on Wikipedia corpora and then projected into a shared vector space using an existing method. The projected word embeddings can be used to calculate the correlation between Chinese and Vietnamese words and can serve as a resource for bilingual text analysis. Experiments show that the bilingual word embeddings perform well.

A multi-feature fusion method for bilingual news summarization in Chinese and Vietnamese. Bilingual texts are difficult to analyze jointly, which makes generating a bilingual summary hard; a multi-feature fusion method is proposed to address this. First, according to the characteristics of news texts, the method analyzes the co-occurrence degree of news elements and the similarity between sentences, based on bilingual dictionaries and bilingual word embeddings. These two features are then integrated into an undirected graph, and the TextRank algorithm is used to rank the sentences. Next, the ranked sentences are re-ranked according to their location features. Finally, important sentences are selected and redundancy is removed to generate a summary. An experiment on the Chinese and Vietnamese bilingual news archive shows that the proposed method achieves good results.

Comparative summarization for Chinese and Vietnamese news based on bilingual topic clustering. To capture the differences between Chinese and Vietnamese news on the same event, that is, to generate a comparative summary, this thesis presents a comparative summarization method based on bilingual topic clustering. The method describes the differences between Chinese and Vietnamese bilingual news at the topic level. First, the LDA model is used to extract topics from the bilingual news. The bilingual topics are then clustered using the bilingual word embeddings and divided into common topics and unique topics. Finally, the unique topics are used to extract Chinese and Vietnamese sentences that form a comparative summary. The experimental results show that the proposed method achieves good results in summarizing the differences between Chinese and Vietnamese bilingual news.

A prototype system of comparative summarization for Chinese and Vietnamese bilingual news. A prototype system was developed that collects Chinese and Vietnamese bilingual news from the Internet, analyzes the bilingual news on the same event, generates a generic summary and a comparative summary, and shows users the final results.
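
The graph-ranking step of the multi-feature fusion method can be illustrated with a minimal sketch. The abstract does not say how the two features are combined, so the details here are assumptions made only for illustration: edge weights are a linear mix (the hypothetical parameter `alpha`) of cosine similarity between sentence vectors in the shared embedding space and Jaccard overlap of news elements, and the element sets are assumed to be already normalized to one language, for example through a bilingual dictionary. The location-based re-ranking and redundancy removal steps are left out.

```python
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse_edge_weights(
    sent_vecs: List[Sequence[float]],
    element_sets: List[Set[str]],
    alpha: float = 0.5,  # assumed mixing weight, not taken from the thesis
) -> List[List[float]]:
    """Build a symmetric weight matrix for an undirected sentence graph."""
    n = len(sent_vecs)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Feature 1: sentence similarity in the shared embedding space.
            sim = max(0.0, cosine(sent_vecs[i], sent_vecs[j]))
            # Feature 2: co-occurrence of news elements (Jaccard overlap).
            union = element_sets[i] | element_sets[j]
            shared = element_sets[i] & element_sets[j]
            co = len(shared) / len(union) if union else 0.0
            weights[i][j] = weights[j][i] = alpha * sim + (1 - alpha) * co
    return weights


def textrank(
    weights: List[List[float]], damping: float = 0.85, iters: int = 50
) -> List[float]:
    """Weighted TextRank by power iteration over the sentence graph."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score,
            # proportional to the weight of the edge between j and i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


# Toy input: three sentences with made-up 2-d vectors and element sets.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
elements = [{"hanoi", "summit"}, {"hanoi", "summit"}, {"football"}]
scores = textrank(fuse_edge_weights(vecs, elements))
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy input the second sentence is linked to both of the others, so it ranks first, while the unrelated third sentence ranks last. A real pipeline would take sentence vectors from the projected bilingual embeddings rather than hand-written values.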
Keywords/Search Tags: cross-language analysis, news texts, comparative summarization, Chinese, Vietnamese