
Research And Implementation Of Event Summarization For Microblog

Posted on: 2017-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Zhang
GTID: 2308330485985326
Subject: Computer Science and Technology
Abstract/Summary:
With the popularity of microblogging services, the volume of user-generated microblog data grows rapidly. A search engine can return microblog messages relevant to the event keywords a user enters, but because microblog messages are massive in number and fragmented in content, users can still easily drown in the results. Helping users grasp the key points of an event has therefore become an urgent problem. To address this information overload, this thesis proposes an event summarization approach for microblogs that generates a timeline summary of an event by exploiting two properties of microblogs: their short-text form and their social nature. Focusing on the key technologies of microblog summarization, the main contributions of this thesis are as follows.

Firstly, a sub-event detection algorithm based on a keyword co-occurrence graph is proposed. The algorithm builds a graph from the co-occurrence of keywords within messages. A community detection algorithm is then run on this graph to find sub-events, and the keywords in each community are extracted as the features of that sub-event. Messages are then clustered into sub-events according to these keyword features. Experimental results show that the proposed algorithm detects sub-events effectively and provides high-quality input for event summarization.

Secondly, a microblog event summarization algorithm is proposed. The algorithm first identifies important timestamps through burst estimation, then detects sub-events from the messages posted at those timestamps. It then ranks the sub-events by importance and selects the highest-ranked messages to form the summary.
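The sub-event detection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each message is assumed to be already reduced to a list of keywords, and weighted label propagation stands in for the community detection algorithm, which the abstract does not name. All function names and the `min_weight` edge threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(messages, min_weight=1):
    """Keyword co-occurrence graph: nodes are keywords, and an edge's weight is
    the number of messages in which both keywords appear."""
    graph = defaultdict(Counter)
    for keywords in messages:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    # Hypothetical noise filter: drop edges seen in fewer than min_weight messages.
    return {node: {nbr: w for nbr, w in nbrs.items() if w >= min_weight}
            for node, nbrs in graph.items()}


def label_propagation(graph, max_iter=100):
    """Weighted label propagation (an assumed stand-in community detector).
    Each keyword adopts the label with the largest total edge weight among its
    neighbours; sorted visiting order and smallest-label tie-breaking keep the
    result deterministic."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated keyword keeps its own label
            scores = Counter()
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            best = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lbl in labels.items():
        communities[lbl].add(node)
    return list(communities.values())


def assign_messages(messages, communities):
    """Cluster each message into the sub-event whose keyword set it overlaps
    most; -1 means the message shares no keyword with any sub-event."""
    result = []
    for keywords in messages:
        overlaps = [len(set(keywords) & c) for c in communities]
        best = max(range(len(communities)), key=overlaps.__getitem__, default=-1)
        result.append(best if best >= 0 and overlaps[best] > 0 else -1)
    return result


def detect_sub_events(messages, min_weight=1):
    """Returns (sub-event keyword sets, per-message sub-event index)."""
    graph = build_cooccurrence_graph(messages, min_weight)
    communities = sorted(label_propagation(graph), key=sorted)
    return communities, assign_messages(messages, communities)


msgs = [
    ["earthquake", "magnitude", "epicenter"],
    ["earthquake", "magnitude"],
    ["magnitude", "epicenter"],
    ["rescue", "volunteers", "donation"],
    ["rescue", "donation"],
    ["volunteers", "donation"],
]
communities, labels = detect_sub_events(msgs)
print([sorted(c) for c in communities])
# [['donation', 'rescue', 'volunteers'], ['earthquake', 'epicenter', 'magnitude']]
print(labels)  # [1, 1, 1, 0, 0, 0]
```

Keywords that never co-occur end up in different communities, and each community's keywords then act as the feature set used to cluster messages. Label propagation was chosen here only because it is short and needs no extra library; a modularity-based method would slot into the same place.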
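The burst estimate that selects important timestamps is not specified in the abstract. One simple possibility, assumed here purely for illustration, is to count messages in fixed-length time windows and flag windows whose count exceeds the mean by `k` standard deviations. The function name `bursty_windows`, the window length, and `k` are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def bursty_windows(timestamps, window=3600, k=2.0):
    """Start times of windows whose message count exceeds mean + k * std of
    all per-window counts. The z-score test, window length and k are assumed
    stand-ins for the thesis's unspecified burst estimate."""
    if not timestamps:
        return []
    start = min(timestamps)
    # Bucket integer timestamps (e.g. Unix seconds) into fixed windows.
    counts = Counter((t - start) // window for t in timestamps)
    series = [counts.get(i, 0) for i in range(max(counts) + 1)]
    threshold = mean(series) + k * pstdev(series)
    return [start + i * window for i, c in enumerate(series) if c > threshold]


# One message every 10 s, plus ten extra messages at t = 55.
ts = [i * 10 for i in range(10)] + [55] * 10
print(bursty_windows(ts, window=10))  # [50]
```

Messages falling inside the returned windows would then be the input to sub-event detection. A global mean is the simplest baseline; because a burst also inflates that mean, a baseline computed over the preceding windows is a common refinement.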
In this summarization process, the algorithm exploits the social nature of microblog messages through two proposed models: one scores the importance of each sub-event, and the other scores each message by combining the social attention it receives with how well it generalizes its sub-event. Experimental results show that the proposed algorithm achieves good results.

Thirdly, an event summarization system that employs the proposed algorithms is implemented. The timeline summaries produced by this system are informative and meaningful to users.
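The abstract does not give the form of the two scoring models, so the following Python sketch shows only one plausible shape for them. Everything specific here is an assumption: the message fields (`keywords`, `reposts`, `comments`, `time`), log-damped engagement as the "social attention" signal, keyword coverage as a proxy for "sub-event generalization", and the weight `alpha`.

```python
import math


def social(msg):
    # Assumed social-attention signal: log-damped repost + comment count.
    return math.log1p(msg["reposts"] + msg["comments"])


def sub_event_importance(msgs):
    # Hypothetical model 1: each message contributes 1 plus its social signal,
    # so large sub-events that draw heavy engagement rank highest.
    return sum(1.0 + social(m) for m in msgs)


def message_score(msg, sub_keywords, max_social, alpha=0.5):
    # Hypothetical model 2: blend normalised social attention with
    # "generalization", approximated as the fraction of the sub-event's
    # keywords that the message mentions.
    attention = social(msg) / max_social if max_social > 0 else 0.0
    coverage = len(set(msg["keywords"]) & sub_keywords) / len(sub_keywords)
    return alpha * attention + (1 - alpha) * coverage


def timeline_summary(sub_events, top_k=3, alpha=0.5):
    """sub_events: list of (keyword_set, messages). Returns the best-scoring
    message of each of the top_k most important sub-events, ordered by time."""
    ranked = sorted(sub_events, key=lambda se: sub_event_importance(se[1]), reverse=True)
    picks = []
    for keywords, msgs in ranked[:top_k]:
        if not msgs or not keywords:
            continue  # nothing to pick, or no keywords to measure coverage
        max_social = max(social(m) for m in msgs)
        picks.append(max(msgs, key=lambda m: message_score(m, keywords, max_social, alpha)))
    return sorted(picks, key=lambda m: m["time"])


quake = ({"earthquake", "magnitude", "epicenter"}, [
    {"id": "q1", "time": 100, "keywords": ["earthquake", "magnitude", "epicenter"],
     "reposts": 5, "comments": 5},
    {"id": "q2", "time": 90, "keywords": ["earthquake"], "reposts": 200, "comments": 50},
])
rescue = ({"rescue", "donation"}, [
    {"id": "r1", "time": 300, "keywords": ["rescue", "donation"], "reposts": 30, "comments": 10},
])
print([m["id"] for m in timeline_summary([quake, rescue])])  # ['q1', 'r1']
```

With `alpha=1.0` the message model reduces to picking the most-engaged message (here `q2`); lowering `alpha` favors messages that cover more of the sub-event's keywords, which is one way to read the combination of social attention and sub-event generalization described above.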
Keywords/Search Tags: event summarization, sub-event detection, document summarization, community detection, natural language processing