Event-Oriented Automatic Summarization Of Social Media Text

Posted on: 2018-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Guan
Full Text: PDF
GTID: 2348330512983567
Subject: Computer application technology
Abstract/Summary:
Automatic text summarization is an important branch of natural language processing. It was initially applied mainly to long texts such as scientific papers and news articles. In recent years, short-text social media platforms such as Sina Weibo and Twitter have become widely popular. Their convenience and massive volume of information have led people to use them for real-time access to many kinds of information, especially about real, trending social events. However, social media text is short, fragmented, and highly redundant, which makes it difficult for readers to identify and understand the relevant information. This has motivated automatic summarization research that targets social media text.

Most existing automatic summarization methods are extractive: they form a summary by combining key sentences taken from the original text. Because they neglect analysis of text structure and text features, the resulting summaries are hard to read and contain redundancy. With the continuing progress of deep learning, its application to automatic summarization has achieved good performance, and it offers a way to compensate for the shortcomings of extractive methods. However, current research mainly targets sentences and paragraphs and has seen little application to practical tasks.

Focusing on real social events that have generated extensive discussion, this thesis applies automatic summarization to generate a summary that covers an event comprehensively, saving users the time and effort of gathering event information. Combining the advantages of extractive and abstractive summarization, it proposes a strategy that divides event summarization into two steps. First, clustering that combines Canopy and K-means, together with timestamp information, divides the initial event texts into a number of sub-clusters, with the aim of identifying the key aspects of the event or the stages of its development. Second, inspired by the way people write summaries by hand, an improved attention-based Encoder-Decoder framework named MEOD is proposed as the summary generation model. The sub-topic texts produced by the first step are fed to the Encoder-Decoder model to generate sub-summaries, and the combination of all sub-summaries forms the final event summary.

Automatic and manual evaluation of the experimental results shows that the proposed method outperforms the comparison method, which supports its effectiveness. The social features and timestamp information used in sub-topic recognition are effective in improving the accuracy and completeness of sub-topic partitioning. The Encoder-Decoder summarization model significantly improves summary quality, especially readability. In addition, the idea of combining extractive and abstractive summarization offers a new direction for research on multi-document event summarization of short texts.
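
The sub-topic clustering step pairs Canopy with K-means. A common reason for this pairing is that a cheap Canopy pass can suggest both the number of clusters and their starting centers, which K-means then refines. The following is a minimal sketch of that general idea only, not the thesis's implementation: the function names, the threshold `t2`, and the toy 2-D points (imagined here as one content feature and one time feature per post) are all assumptions made for illustration. It also keeps only the tight Canopy threshold, since the loose threshold that defines canopy membership is not needed to produce seeds.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Simplified Canopy pass: pick a center, drop every point within
    the tight threshold t2 of it, and repeat until no points remain."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain K-means (Lloyd's algorithm) started from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical toy data: (content feature, time feature) per post.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
    (10.0, 0.0), (10.1, 0.2),
]

seeds = canopy_centers(points, t2=2.0)   # Canopy proposes K and the seeds
centers, clusters = kmeans(points, seeds)

for center, members in zip(centers, clusters):
    print(center, len(members))
```

In this sketch the number of sub-clusters is never set by hand; it falls out of the Canopy threshold, which is the usual appeal of seeding K-means this way. How the thesis encodes timestamps and social features into the clustering is not stated in the abstract, so the time feature above is only a placeholder.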
Keywords/Search Tags:automatic summarization, Encoder-Decoder, clustering algorithm, social media, events