
Research On Related Technologies Of Micro-blog Data Processing

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2308330509453179
Subject: Computer application technology
Abstract/Summary:
Micro-blogs, as a popular platform for information exchange, have attracted extensive attention. To make micro-blog data more readable and micro-blog messages more convenient to read, research on micro-blog data processing technologies has become a hot topic. Unlike traditional text, a micro-blog message is limited to 140 characters and can spread rapidly through reposts. Algorithms that process micro-blog data should therefore exploit these unique properties to compensate for the short length of individual messages.
Micro-blog message clustering is a data processing method that groups messages by theme. Placing messages on the same theme in one cluster helps readers find the messages relevant to the themes they are interested in. Micro-blog clustering has already produced some research results. For example, an improved Single-pass clustering algorithm uses the LDA model as its topic document model and adds the ideas of topic centers and batch processing to the traditional Single-pass algorithm in order to cluster micro-blog message sets. This thesis presents a micro-blog Single-pass clustering algorithm based on the repost tree. Its main idea is to introduce repost relations into the improved Single-pass algorithm as an additional factor when clustering micro-blog message sets. The experimental results show that using repost relations improves the clustering quality.
Micro-blog summarization aims to extract a summary from micro-blog data so that users can easily obtain the information they need from a massive volume of messages. Many existing micro-blog summarization methods are derived from traditional text summarization methods.
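The abstract does not spell out how repost relations enter the clustering decision for the repost-tree Single-pass clustering described above. A minimal Python sketch, under stated assumptions, could look like the following: each message carries a precomputed LDA topic vector (LDA itself is not run here), similarity is cosine similarity against a cluster's topic center, centers are refreshed only at the end of each batch, and the repost relation is modeled as a bonus when the reposted (parent) message already belongs to the candidate cluster. The weight `alpha`, the `threshold`, the `batch_size`, and all names (`Message`, `Cluster`, `single_pass_repost`) are hypothetical illustration choices, not the author's parameters.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of repost-aware Single-pass clustering; not the thesis's exact algorithm.


@dataclass
class Message:
    msg_id: str
    topic_vec: list[float]           # assumed: precomputed LDA topic distribution
    parent_id: Optional[str] = None  # id of the reposted message, if this is a repost


@dataclass
class Cluster:
    center: list[float]                               # topic center of the cluster
    members: list[str] = field(default_factory=list)  # ids of clustered messages


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_repost(messages, threshold=0.5, alpha=0.7, batch_size=100):
    """score = alpha * cosine(message, center) + (1 - alpha) * [parent is in that cluster]"""
    clusters: list[Cluster] = []
    assigned: dict[str, int] = {}  # msg_id -> cluster index
    vecs = {m.msg_id: m.topic_vec for m in messages}
    for start in range(0, len(messages), batch_size):
        grown: set[int] = set()
        for m in messages[start:start + batch_size]:
            best_idx, best_score = -1, -1.0
            for idx, c in enumerate(clusters):
                # Hypothetical repost factor: 1 if the reposted message sits in this cluster.
                repost = 1.0 if m.parent_id is not None and assigned.get(m.parent_id) == idx else 0.0
                score = alpha * cosine(m.topic_vec, c.center) + (1 - alpha) * repost
                if score > best_score:
                    best_idx, best_score = idx, score
            if best_idx >= 0 and best_score >= threshold:
                clusters[best_idx].members.append(m.msg_id)
                assigned[m.msg_id] = best_idx
                grown.add(best_idx)
            else:  # no cluster is close enough: open a new one centered on this message
                clusters.append(Cluster(center=list(m.topic_vec), members=[m.msg_id]))
                assigned[m.msg_id] = len(clusters) - 1
        # Batch processing: recompute topic centers of grown clusters only after each batch.
        for idx in grown:
            c = clusters[idx]
            c.center = [sum(vecs[mid][k] for mid in c.members) / len(c.members)
                        for k in range(len(c.center))]
    return clusters
```

With these defaults, a repost whose topic vector is equally close to two clusters is pulled into the cluster that holds the message it reposts; without the repost link it would fall into whichever tied cluster was created first.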
Within text summarization, comparative summarization has been proposed as an application whose purpose is to extract comparative summaries from descriptive documents such as news reports and product descriptions. Building on this, Twitter comparative topic summarization has been proposed as an application of comparative summarization to micro-blogs. This thesis proposes a Chinese micro-blog comparative topic summarization method based on topic sets. Unlike Twitter comparative topic summarization, this algorithm generates summaries by comparing topic sets instead of individual messages. The experimental results show that the topic-set-based method overcomes the insufficient information carried by a single message and makes the comparative topic summaries more representative.
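The abstract says summaries are produced by comparing topic sets rather than single messages, but not how that comparison is scored. The Python sketch below assumes a scoring borrowed from the general idea of comparative summarization: every pair of topic sets (one from each topic) is scored by how representative each set is of its own topic plus how similar the two sets are to each other, pairs are picked greedily so that no set is reused, and each chosen set is represented by its message nearest to the set centroid. The message vectors, the weight `lam`, and the names `comparative_summary` and `representative_text` are hypothetical.

```python
import math
from itertools import product

# Hypothetical sketch: a message is (text, vector), a topic set is a list of
# messages, and a topic is a list of topic sets. Not the thesis's exact scoring.


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Column-wise mean of equally sized vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def representative_text(topic_set):
    """Return the text of the message closest to its topic set's centroid."""
    cent = centroid([vec for _, vec in topic_set])
    return max(topic_set, key=lambda msg: cosine(msg[1], cent))[0]


def comparative_summary(topic_a, topic_b, k=2, lam=0.5):
    """Score = lam * representativeness + (1 - lam) * comparability (assumed)."""
    cent_a = centroid([vec for s in topic_a for _, vec in s])
    cent_b = centroid([vec for s in topic_b for _, vec in s])
    sets_a = [centroid([vec for _, vec in s]) for s in topic_a]
    sets_b = [centroid([vec for _, vec in s]) for s in topic_b]
    scored = []
    for i, j in product(range(len(topic_a)), range(len(topic_b))):
        rep = (cosine(sets_a[i], cent_a) + cosine(sets_b[j], cent_b)) / 2
        comp = cosine(sets_a[i], sets_b[j])
        scored.append((lam * rep + (1 - lam) * comp, i, j))
    scored.sort(key=lambda t: t[0], reverse=True)
    used_a, used_b, summary = set(), set(), []
    for _, i, j in scored:
        if i in used_a or j in used_b:
            continue  # each topic set appears in at most one comparative pair
        used_a.add(i)
        used_b.add(j)
        summary.append((representative_text(topic_a[i]), representative_text(topic_b[j])))
        if len(summary) == k:
            break
    return summary
```

Comparing set centroids rather than individual messages means each pair is judged on the pooled content of all its messages, which is the property the abstract credits for more representative comparative summaries.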
Keywords: Micro-blog, Clustering, LDA, Summarization, Topic Sets