
Research on Chinese Microblog-Based Automatic Summarization

Posted on: 2017-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: F X Li
GTID: 2348330485471360
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and computer information technology, social networking platforms have grown quickly and gradually reached every group of users, greatly improving how efficiently people communicate. Because microblogs make information quick and easy to obtain, they have become an important channel through which most users access information, and microblog platforms now gather large user bases and large volumes of text. This rapid growth of information has promoted communication and contributed to cultural and economic development. However, as microblogs have become a major source of information, the speed at which information is produced and spread far exceeds users' ability to process it. The problem is how to quickly and accurately find the events a user is interested in among a large number of microblog posts, and how to keep following subsequent reports to understand how an event unfolds. A user who wants to follow the development of a whole incident must spend a lot of time filtering out irrelevant information and many posts that repeat the same meaning, which greatly reduces the efficiency of obtaining useful information. Techniques that gather the content on a single topic from many microblog texts and summarize it automatically are therefore essential.

This thesis studies the methods and underlying theory of automatic summarization, proposes two methods for automatic summarization of Chinese microblogs, and evaluates and compares them. The main research work comprises the following parts.

First, text content is crawled from Sina Weibo and organized into classified data sets. The text data is obtained through the Sina Weibo platform API, denoised, and then labeled to produce the classified data sets. We finally selected 5625 microblog posts and divided them into 3612 training items and 1013 test items.

Second, a VSM-based and an LDA-based microblog automatic summarization method are implemented. Building on a detailed study of the vector space model (VSM) and the LDA topic model, a Chinese microblog summarization method is constructed on each of the two models, and the two methods are evaluated and compared.

Finally, a microblog summarization method that combines VSM with the LDA topic model is proposed, based on an analysis of the results of the two individual methods. When generating a summary, the weight of a sentence is measured mainly by the importance of its topic, the sentence's coverage of the keywords, word frequency, sentence length, comments and reposts, and other features. The cosine of the angle between sentence vectors is then used to measure the similarity between sentences so that redundant sentences can be compressed, which completes the Chinese microblog summary. Experiments extract the topics related to the microblogs, and the resulting summaries are compared and analyzed against those generated by the VSM-based method and the LDA-based method.

Experimental results show that the method combining VSM and the LDA topic model extracts microblog summary content more accurately than either the VSM-based method or the LDA-based method alone, enabling users to search for real-time messages.
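
The redundancy-compression step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes each sentence is already segmented into a list of tokens and already has a weight from the feature-based scoring stage, and the function names, the greedy selection strategy, and the 0.7 similarity threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sentences(sentences, scores, max_sentences=3, threshold=0.7):
    """Pick high-weight sentences, skipping near-duplicates.

    sentences: list of token lists (word segmentation is assumed done).
    scores:    one weight per sentence (assumed to come from the
               feature-based scoring stage).
    threshold: hypothetical cut-off; a candidate whose cosine similarity
               to an already chosen sentence reaches it is dropped.
    Returns the indices of the kept sentences in original order.
    """
    vectors = [Counter(tokens) for tokens in sentences]
    by_weight = sorted(range(len(sentences)),
                       key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in by_weight:
        if len(chosen) >= max_sentences:
            break
        if all(cosine(vectors[i], vectors[j]) < threshold for j in chosen):
            chosen.append(i)
    return sorted(chosen)


if __name__ == "__main__":
    posts = [
        ["地震", "救援", "进展"],
        ["地震", "救援", "进展"],   # repeats the first post
        ["天气", "晴朗"],
    ]
    weights = [0.9, 0.8, 0.5]
    print(select_sentences(posts, weights))
```

In this toy input the second post repeats the first, so only the higher-weighted copy is kept alongside the unrelated third post. A real pipeline would more likely build TF-IDF or topic-weighted vectors than raw term counts.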
Keywords/Search Tags: Chinese microblog, LDA topic model, automatic summarization, vector space model