Research On Chinese Micro-blog Hot Topics Detection

Posted on: 2015-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2298330431477086
Subject: Computer application technology
Abstract/Summary:
With the rapid development of mobile Internet technology, microblogging, a new social network platform, has risen rapidly and become a new way for users to communicate. As a carrier, microblogging helps users quickly and easily express their views, exchange information, interact emotionally and share resources. The timeliness and randomness of the microblogging platform allow information to spread and proliferate rapidly and to exert a strong influence in the real world. Microblog texts contain a great deal of valuable information, such as political viewpoints and unexpected events. Extracting and retrieving hot topics from microblog texts helps users quickly understand the real-time hot information in society, and has important practical significance for monitoring online public opinion and searching real-time information. However, microblog texts have the characteristics of big data, so they cannot be identified and filtered manually. Therefore, automatic detection of hot topics in microblog text, together with the search for means to filter related information, has become a research hotspot in the field of information retrieval.

The thesis first introduces the background, research status and related technologies of hot topic detection, and analyzes the information features and transmission characteristics of Chinese microblogging. To address the problem of information filtering in topic detection, a method based on user role orientation is put forward. This method calculates user attention from the number of followers and friends, calculates microblogging influence from the number of retransmissions and comments, and then comprehensively evaluates user influence from user attention and microblogging influence. User role orientation thus realizes coarse filtering of information before hot topic detection.
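The user influence evaluation described above can be illustrated with a minimal sketch. The abstract names the inputs (followers, friends, retransmissions, comments) but not the formulas, so the log scaling, the follower-share factor, the weight `alpha`, the threshold and all function names below are assumptions made for illustration, not the thesis's actual definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class UserStats:
    followers: int   # accounts following this user
    friends: int     # accounts this user follows
    reposts: int     # total retransmissions of the user's posts
    comments: int    # total comments on the user's posts
    posts: int       # number of posts the totals were collected over


def user_attention(u: UserStats) -> float:
    """Assumed form: log-scaled follower count, damped by the follower share."""
    total = u.followers + u.friends
    if total == 0:
        return 0.0
    return math.log1p(u.followers) * (u.followers / total)


def microblog_influence(u: UserStats) -> float:
    """Assumed form: log-scaled average of reposts plus comments per post."""
    if u.posts == 0:
        return 0.0
    return math.log1p((u.reposts + u.comments) / u.posts)


def user_influence(u: UserStats, alpha: float = 0.5) -> float:
    """Weighted combination of the two scores; alpha is a hypothetical weight."""
    return alpha * user_attention(u) + (1 - alpha) * microblog_influence(u)


def coarse_filter(users: dict, threshold: float) -> list:
    """Keep only the users whose influence reaches the (assumed) threshold."""
    return [name for name, u in users.items() if user_influence(u) >= threshold]
```

With a score of this kind, an account with many followers and heavily reposted content ranks far above an account that follows thousands of users but attracts no interaction, which is the sort of separation a coarse filter needs before topic detection.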
Secondly, topics in the microblogging information are preliminarily detected using an improved Single-Pass incremental clustering algorithm. Lastly, the heat of each microblogging topic is assessed and ranked according to factors that influence topic heat, such as the number of retransmissions and comments, so that the hot topics within a certain period of time are found. The thesis optimizes the text preprocessing and text feature selection methods used in Chinese microblogging topic detection, and calculates feature weights using the Term Frequency-Inverse Document Frequency (TF-IDF) function combined with semantic similarity. Based on the above methods, related experiments were carried out on SINA Microblogging corpora. The evaluation measures defined for Topic Detection and Tracking (TDT) evaluation, including recall ratio, miss ratio, fallout ratio and misdetection overhead, are used as evaluation indexes to analyze and compare the experimental results. The results show that the user role orientation method proposed in this thesis effectively divides microblogging users into categories, which provides a basis for information filtering before topic detection. Using the evaluation methods based on user attention and microblogging influence, the miss ratio and fallout ratio of hot topic detection decrease by 12.09% and 2.37%, respectively. The efficiency and accuracy of the topic detection technique in this thesis are better than those of traditional topic detection techniques, which demonstrates the effectiveness of the proposed method.
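As a rough illustration of the clustering step, the sketch below implements the plain Single-Pass incremental clustering algorithm over TF-IDF vectors with cosine similarity. It does not reproduce the thesis's improvements or its semantic-similarity weighting, which the abstract does not specify. The smoothed IDF formula, the threshold value, the running-mean centroid update and the assumption that posts are already segmented into tokens are all illustrative choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from pre-tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            t: (c / len(doc)) * (math.log((n + 1) / (df[t] + 1)) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.3):
    """Assign each vector, in arrival order, to the most similar cluster,
    or open a new cluster when no similarity reaches the threshold."""
    clusters = []  # each cluster: {"centroid": dict, "members": [indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            size = len(best["members"])
            centroid = best["centroid"]
            # Update the centroid as the running mean of its members.
            for t in set(centroid) | set(vec):
                centroid[t] = (centroid.get(t, 0.0) * (size - 1) + vec.get(t, 0.0)) / size
        else:
            clusters.append({"centroid": dict(vec), "members": [i]})
    return clusters
```

Because each post is compared only against the existing cluster centroids and is never revisited, the algorithm handles a stream of posts in one pass; the result does depend on arrival order and on the chosen threshold.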
Keywords/Search Tags: Chinese microblogging, topic detection, user role, semantic similarity, Single-Pass clustering