Research On News Summarization Based On Multi-granularity Latent Semantic Model And Submodular Maximization

Posted on: 2018-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Liu
GTID: 2428330515989688
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and the arrival of the Web 2.0 era, social networks have become an integral part of people's lives, changing the way people acquire and disseminate information. Traditional news reports tend to be slow to appear and long, and in today's fast-paced environment their influence has gradually declined. Social networks, as simple, real-time, and portable information media, have gradually replaced traditional media as the channel for releasing and spreading news after something important happens, and Weibo is one of the most prominent. However, because of the length limit on Weibo posts, traditional news reports cannot be released directly on Weibo, and writing summaries manually is time-consuming and cannot guarantee objectivity. How to obtain, in a short time, a high-quality news summary that can be posted on Weibo is therefore an urgent problem. In this context, this thesis studies the automatic summarization of Chinese news for Weibo. The main work of this thesis includes the following points:

(1) This thesis proposes a news summarization model based on a multi-granularity latent semantic model and submodular maximization. The multi-granularity latent semantic model produces latent semantic vectors for texts; evaluation functions are then designed, and submodular maximization is used to quickly compute an approximately optimal solution, from which the summary is generated.

(2) For text representation, this thesis uses an Orthogonal Matrix Factorization model to obtain the latent semantic vectors of the text, modeling missing words and making the projection directions approximately orthogonal to reduce redundancy. This addresses the sparsity of news information and the poor performance of classical latent semantic models in computing sentence similarity. To introduce event information and the relationships between words, this thesis constructs a dictionary of dependencies. The final model is a multi-granularity latent semantic model that combines the word granularity and the dependency granularity.

(3) According to the characteristics of news writing, an objective function is designed that includes several submodular functions and a non-submodular function that evaluates sentence dissimilarities, in order to assess the relevance and diversity of the summary. With the aid of the submodular functions and taking the length limit of Weibo posts into account, a greedy algorithm is designed to obtain an approximately optimal solution.

(4) A set of experiments compares features, text representations, baselines, the multi-granularity model, and so on. The results show that the ROUGE-1 score reaches 0.519 with the word-granularity model and rises further to 0.531 with the multi-granularity model, higher than the other baseline models, demonstrating the effectiveness of the proposed algorithm.
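
The greedy selection under a length limit described in point (3) can be illustrated with a minimal sketch. It is not the thesis's objective: the abstract does not give the evaluation functions, so the sketch substitutes a single saturated-coverage function (a standard monotone submodular choice) and omits the news-writing features and the non-submodular dissimilarity term. The names `cosine`, `coverage`, `greedy_summary`, the parameter `alpha`, the 140-character default budget, and the gain-per-character selection rule are all assumptions made for illustration, and the input vectors stand in for whatever latent semantic vectors the model produces.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coverage(selected, sim, alpha=0.5):
    """Saturated coverage: how well `selected` covers every sentence.

    Each sentence's coverage is capped at `alpha` times its total similarity
    to the document, so adding near-duplicates stops paying off.
    This is an assumed stand-in for the thesis's evaluation functions.
    """
    total = 0.0
    for row in sim:
        covered = sum(row[j] for j in selected)
        total += min(covered, alpha * sum(row))
    return total


def greedy_summary(sentences, vectors, budget=140, alpha=0.5):
    """Greedily pick sentences by marginal gain per character within `budget`."""
    # Clip negative similarities so the coverage function stays monotone.
    sim = [[max(0.0, cosine(u, v)) for v in vectors] for u in vectors]
    selected, used = [], 0
    candidates = set(range(len(sentences)))
    while candidates:
        base = coverage(selected, sim, alpha)
        best, best_ratio = None, 0.0
        for j in sorted(candidates):  # sorted for deterministic tie-breaking
            cost = len(sentences[j])  # cost = character count (assumption)
            if cost == 0 or used + cost > budget:
                continue
            gain = coverage(selected + [j], sim, alpha) - base
            if gain / cost > best_ratio:
                best, best_ratio = j, gain / cost
        if best is None:  # nothing fits, or nothing adds value
            break
        selected.append(best)
        used += len(sentences[best])
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[j] for j in sorted(selected)]


if __name__ == "__main__":
    sents = ["aaaa", "bbbb", "cccccccc"]
    vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # first two are duplicates
    print(greedy_summary(sents, vecs, budget=12))
```

In the toy input the first two sentences have identical vectors, so once one of them is selected the other contributes no further coverage, and the remaining budget goes to the third, dissimilar sentence. This is the redundancy-avoiding behaviour that a diminishing-returns objective is meant to give.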
Keywords/Search Tags: submodularity, orthogonal matrix factorization, news summarization, dependency, Weibo-oriented