
Improvement And Design Of Automatic Text Abstraction Algorithm For We-media Texts

Posted on: 2019-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: W B Su
GTID: 2428330578972773
Subject: Information Science
Abstract/Summary:
Competition in content entrepreneurship is fierce, and we-media platforms are booming. The audience reach of WeChat, micro-blog, Headline, and Bai Jia Hao now far exceeds that of traditional media. We-media, which thrives on traffic, brings people convenience but also massive amounts of text, and the advertising, e-commerce, services, and other material mixed into we-media text makes it harder for users to get the information they need. How to help users quickly select and discriminate among we-media texts, obtain the information they need efficiently and with high quality, and grasp we-media trends under a given topic is an urgent problem. A text summary is a faithful reflection of a text's content that is both concise and complete. For we-media text, summarization that considers the statistical features of the text and focuses on its underlying topics can help users choose and identify the we-media content they want, and automatically generating a brief summary greatly improves reading efficiency. The main research work in this paper is as follows:

(1) A topic crawler is used to collect we-media text on the user's search topic and to build a we-media text corpus. Text is collected by topic block, which can greatly improve text coverage and page utilization.

(2) To meet topic requirements and address the low computational efficiency of we-media text similarity, this paper proposes a text similarity calculation method based on LDA. The method uses LDA to mine the latent topic layer of the text, represents words, sentences, and documents as vectors over related topics, constructs a topic space, and takes the cosine similarity of topic vectors as the text similarity. Experimental results show that the method is more accurate than the LD, tf-idf, and PLSA methods on we-media text similarity, reduces computational complexity, improves computational efficiency, and eliminates the influence of out-of-vocabulary words, thereby avoiding the use of external dictionaries.

(3) Based on the idea of graph ranking, the LDA-WSCoRank+ automatic summarization algorithm is proposed for a we-media text corpus under a single topic. It makes the following improvements to the CoRank algorithm. Words and sentences are scored jointly, so that the weight of distinctive words is not lost when ranking by sentences alone. The relations between sentences are rebuilt: the cosine similarity of topic vectors serves as the edge weight, and an empirically set threshold limits the number of edges between nodes to reduce computational complexity. Redundancy is controlled with the MMR algorithm, which removes redundancy from the graph-ranking result to increase summary coverage. The output is optimized by rearranging the selected sentences in their original order, which keeps the summary coherent and readable. Finally, the improved algorithm was verified with WeChat text as the test corpus. Compared with the TeamBest, TextRank, and CoRank algorithms, LDA-WSCoRank+ scored higher on ROUGE-1, ROUGE-2, and ROUGE-SU, indicating that the algorithm performs well on the completeness and coherence of the summary.

(4) On the basis of this research, this paper designed and implemented an automatic summarization system for we-media text and gives the detailed design and implementation process. A user experiment showed that the system can greatly improve the efficiency and accuracy with which users choose and screen articles, and can increase the benefit users get from reading.
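The pipeline described in points (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's LDA-WSCoRank+ implementation. It assumes each sentence already has a topic vector (for example, from a trained LDA model), uses a plain weighted PageRank-style iteration as a stand-in for CoRank's joint word and sentence scoring, and does not renormalize scores for isolated sentences. The function names and the values of `threshold`, `damping`, and `lam` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold):
    """Sentence graph: keep an edge only if similarity reaches the threshold."""
    n = len(vectors)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                weights[i][j] = weights[j][i] = s
    return weights


def rank(weights, damping=0.85, iterations=50):
    """Weighted PageRank-style scores (assumed stand-in, not CoRank itself)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out = [sum(row) for row in weights]  # total edge weight per node
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out[j] * scores[j] for j in range(n) if out[j]
            )
            for i in range(n)
        ]
    return scores


def mmr_select(vectors, scores, k, lam=0.7):
    """Greedy MMR: balance a sentence's score against similarity to picks."""
    chosen = []
    candidates = set(range(len(vectors)))
    while candidates and len(chosen) < k:
        def gain(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in chosen), default=0.0
            )
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=gain)
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)  # restore the original sentence order


# Hypothetical topic vectors for four sentences over three topics.
topic_vectors = [
    [0.90, 0.10, 0.00],
    [0.88, 0.12, 0.00],
    [0.10, 0.80, 0.10],
    [0.00, 0.10, 0.90],
]
graph = build_graph(topic_vectors, threshold=0.5)
summary_ids = mmr_select(topic_vectors, rank(graph), k=2)
```

Here `summary_ids` holds the indices of the selected sentences in document order. Because sentences 0 and 1 have nearly identical topic vectors, the MMR step penalizes picking both, which is the redundancy control the abstract describes.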
Keywords/Search Tags: automatic summarization, topic model, we-media text, similarity calculation, word sentence coordination rank