
Research On The Automatic Summarization Of Chinese News For Micro-blog Application

Posted on: 2018-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: M S Li
GTID: 2348330515970735
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era after Web 2.0, Weibo and other social media are attracting growing attention from society and from researchers, and the number of Weibo-oriented tasks and evaluations increases year by year. Automatic text summarization uses a computer to extract the key information of a text and generate a short passage that reflects its central content. This compresses the text, speeds up the exchange of information, and improves the efficiency of information retrieval. This thesis studies Weibo-oriented automatic summarization of Chinese news. With automatic summarization, Weibo users can quickly condense the news they care about and then repost it, so the work has high practical value in reducing the time users spend manually editing and reposting news text.

Building on a study of existing methods for Chinese automatic text summarization, this thesis proposes a Weibo-oriented Chinese news summarization algorithm based on multiple features and a Ranking SVM ranking model. The specific research work is as follows:

(1) Multi-level feature extraction. Statistical features are extracted, including sentence frequency, sentence position, sentence length, similarity between sentence and title, and descriptive words, together with the semantic feature of the topic sentence of the news text. The extraction methods for the different features are analyzed comprehensively, and representation models based on the news text are explored so that the text features can be used more effectively.

(2) Sentence ranking. The sentences in the news training data are first preprocessed into a usable form and converted into a dat file, which serves as input for training the Ranking SVM model. The sentences in the test data are then scored by the trained model and sorted from high to low. In post-processing, redundancy removal, smoothing, anaphora resolution, and other readability-oriented processing are applied, yielding a relatively fluent set of summary sentences with low redundancy.

Finally, the feature extraction method and sentence ranking algorithm are applied to the 2015 Natural Language Processing and Chinese Computing (NLP&CC) evaluation data set for Weibo-oriented Chinese news automatic summarization. The ROUGE-1 value of the experimental results exceeds 50%, which indicates that the approach is feasible.
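
Step (2) converts preprocessed sentences into a dat file for Ranking SVM training, but the abstract does not give the file layout or name the toolkit. The sketch below is only an illustration. It assumes the widely used SVM-rank input format (`<target> qid:<id> <index>:<value> ...`), with one query id per news article. The three features (position, length, title similarity) follow the abstract's list, but their formulas, the character-level Jaccard similarity, and all function names are hypothetical stand-ins, not the author's definitions.

```python
from typing import List


def char_jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity.

    Assumption: a simple stand-in for the sentence/title similarity
    feature; the thesis may use word segmentation or another measure.
    """
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def sentence_features(sentence: str, index: int, total: int, title: str) -> List[float]:
    """Build a small feature vector for one sentence (hypothetical formulas)."""
    position = 1.0 - index / total          # earlier sentences score higher
    length = float(len(sentence))           # length in characters
    title_sim = char_jaccard(sentence, title)
    return [position, length, title_sim]


def to_svmrank_line(target: int, qid: int, features: List[float]) -> str:
    """Format one sentence as an SVM-rank style line (feature indices start at 1)."""
    pairs = " ".join(f"{i}:{v:.4f}" for i, v in enumerate(features, start=1))
    return f"{target} qid:{qid} {pairs}"


def document_to_dat(sentences: List[str], labels: List[int], title: str, qid: int) -> List[str]:
    """Turn one article into dat lines; labels are relevance ranks (higher is better)."""
    total = len(sentences)
    return [
        to_svmrank_line(label, qid, sentence_features(s, i, total, title))
        for i, (s, label) in enumerate(zip(sentences, labels))
    ]
```

Under these assumptions each article becomes one query group, so the ranker only compares sentences from the same article. Sorting test sentences by predicted score then gives the high-to-low ordering described in step (2).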
Keywords/Search Tags:Semantic feature, Statistical features, Ranking SVM, Chinese News automatic summarization, Polish process