Research And Application Of Topic-based Automatic Summarization Of Short Text

Posted on: 2018-11-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X P Chen | Full Text: PDF
GTID: 2348330512488244 | Subject: Engineering
Abstract/Summary:
In recent years, online communication has become increasingly common with the rise of social media such as microblog platforms, web forums and question answering systems. These platforms let people communicate with each other conveniently and have yielded a huge volume of short texts, which are limited in length, free in expression and loosely structured. These same characteristics make short texts hard to read in bulk. How to take in a large amount of information in limited time and quickly grasp the trend of events has become an urgent problem. Automatic summarization, one of the important tasks of text mining, is an effective way to address it: a summary presents the main opinions of the texts clearly and concisely. This paper aims to extract summary sentences from microblog texts automatically, taking into account both the statistical features and the latent topic information of the texts. The main work of this paper includes the following two aspects:

1) The representation model for short texts: To overcome the shortcomings of the traditional text representation model and meet the needs of topic-based automatic summarization, we improve the semantic-based representation model for short texts. We use Latent Dirichlet Allocation (LDA) word vectors to model short texts and a gradient descent algorithm to calculate the weights of the word vectors. Each word vector is then multiplied by its weight, and the weighted vectors are averaged to arrive at a single text representation in the topic space. Experiments show that the improved representation model expresses topic information better than the semantic-based representation model, with an improvement of 2.5%.

2) Extraction of automatic summaries: We propose the LDA-CoRank algorithm, based on CoRank, a graph-ranking algorithm for automatic summarization. LDA-CoRank improves on CoRank in four respects:
a. Edge redefinition: every text in a topic is treated as a vertex and represented with the LDA-based representation model; the cosine similarity between two vertices then determines whether an edge is established between them.
b. Word weight redefinition: the Hybrid TF-IDF method is used to calculate the weights of key words, and the word-sentence weights are obtained by iteration.
c. Redundancy control: the Maximum Marginal Relevance (MMR) algorithm is used to control redundancy among the candidate summary sentences.
d. Summary generation: the final summary is obtained by optimizing the structure of the candidate summary sentences and reordering them.

Finally, ROUGE evaluation and a manual interactive evaluation are used to assess the quality of the summaries. Among TextRank, CoRank and LDA-CoRank, our method (LDA-CoRank) achieved the highest ROUGE score, and its F-value on ROUGE-1 improved by 5.66% compared with the CoRank algorithm. The performance of the system is also close to the upper bound. The experiments show that LDA-CoRank outperforms the other two baseline approaches.
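
Three of the steps named in the abstract (weighted averaging of word vectors, cosine-similarity edges, and MMR redundancy control) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the edge `threshold`, and the trade-off parameter `lam` are hypothetical. The word weights and relevance scores are taken as given inputs, whereas the thesis learns the weights by gradient descent and obtains sentence scores from graph ranking.

```python
import math


def weighted_average(word_vectors, weights):
    """Multiply each word vector by its weight, then average the results.

    Assumption: the weights are already known; learning them is out of scope here.
    """
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for vec, w in zip(word_vectors, weights):
        for i in range(dim):
            total[i] += w * vec[i]
    return [x / len(word_vectors) for x in total]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_edges(vectors, threshold=0.5):
    """Connect two texts when their cosine similarity reaches the threshold.

    Assumption: a fixed threshold decides edge creation; the value is illustrative.
    """
    n = len(vectors)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


def mmr_select(vectors, relevance, k, lam=0.7):
    """Greedy Maximum Marginal Relevance selection of k sentence indices.

    Each step picks the candidate maximizing
    lam * relevance - (1 - lam) * (max similarity to already selected items).
    """
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:

        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy topic-space vectors: texts 0 and 1 are near-duplicates, text 2 differs.
    texts = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
    scores = [0.9, 0.85, 0.5]  # hypothetical relevance scores
    print(build_edges(texts))
    print(mmr_select(texts, scores, k=2, lam=0.5))
```

In the toy example, the second text is nearly identical to the first, so MMR skips it in favor of the less relevant but more novel third text. A larger `lam` favors relevance, and a smaller one favors diversity.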
Keywords/Search Tags:automatic summarization, short text, topic model, graph rank