
Microblogging Automatic Summarization Research

Posted on: 2013-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: W Meng
Full Text: PDF
GTID: 2218330374465415
Subject: Computer software and theory
Abstract/Summary:
Microblogging has opened up mobile communication as a platform, and more and more resources are uploaded to the network for users to share. The sharp increase in the amount of information speeds up communication between users. Microblogs not only act as a bridge between mobile communication networks and the Internet, but also facilitate the transfer of messages between users, and in this way have contributed greatly to social progress. However, the volume of microblog data is huge, so it is not easy for people to find the information they need. In addition, a single microblog post can attract so many comments that they are difficult to read, which poses new challenges for automatic summarization technology in the microblog domain.

In this thesis the author studies real data from Sina MicroBlog and its automatic summarization, considers eight categories of features when calculating sentence weights, examines sub-topic division in depth using k-means clustering and FarthestFirst clustering, and designs and implements BMS (the Based MicroBlog Summarization), a prototype automatic summarization system for microblogs. The system is divided into six modules: document pre-processing, noise filtering, sub-topic segmentation, feature selection, sentence extraction, and summary sentence reordering.

The main work is as follows:

(1) The k-means and FarthestFirst clustering methods are compared, and sub-topics are divided using k-means clustering.

(2) For noise processing, the thesis proposes removing small-probability events. Statistics on the number of words in microblog comments, compared against each comment's relevance to the topic, show that when a comment has fewer than 5 words its relevance to the content is almost 0. Noise is therefore handled by filtering out comments with fewer than 5 words, combined with a context-sensitive reply filtering algorithm.

(3) Sentence weights are calculated with the traditional method, adapted to the characteristics of microblogs. Feature selection takes into account the sub-topic, title, attention (follows), the numbers of shares and comments, fans, comments, sentence length, position, and labels, among others. In a related experiment, the proposed features effectively improved the summarization results.

(4) In the system evaluation, a comparison of experimental data shows that the system improves on other systems in recall, precision, and F-measure, and that the generated summaries are of higher quality.

Finally, building on the above work, the quality of microblog-oriented automatic summaries is improved by selecting these eight features.
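
The noise filtering in (2) and the sentence weighting in (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BMS implementation. The abstract gives only the 5-word threshold and the list of features. The `Comment` class, the function names, the feature names, and the linear weighted sum are all hypothetical choices made for this example. Comments are assumed to be word-segmented already, and the context-sensitive reply filter is not shown.

```python
from dataclasses import dataclass, field

# Threshold taken from the abstract: comments with fewer than 5 words are noise.
MIN_WORDS = 5


@dataclass
class Comment:
    # Word-segmented text; segmentation is assumed to happen upstream.
    tokens: list
    # Hypothetical feature name -> normalized value (e.g. "position", "length").
    features: dict = field(default_factory=dict)


def filter_short_comments(comments, min_words=MIN_WORDS):
    """Keep only comments that have at least `min_words` words."""
    return [c for c in comments if len(c.tokens) >= min_words]


def sentence_weight(comment, feature_weights):
    """Assumed scoring rule: a weighted sum of feature values.

    Features missing from a comment contribute 0.
    """
    return sum(
        weight * comment.features.get(name, 0.0)
        for name, weight in feature_weights.items()
    )


def extract_summary(comments, feature_weights, k=2):
    """Filter out short comments, then return the k highest-weighted ones."""
    kept = filter_short_comments(comments)
    ranked = sorted(
        kept,
        key=lambda c: sentence_weight(c, feature_weights),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch the filter runs before scoring, so a short comment never reaches the summary even if its feature values are high. The weights in `feature_weights` would need to be tuned on real data.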
Keywords/Search Tags: Automatic summarization, Microblog, K-means, Feature selection, Sentence weight calculation