Research On Burst Topic Detection Based On KL Distance

Posted on: 2016-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wei
GTID: 2208330470480935
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and its growing penetration, the network has overtaken traditional mass media as the channel through which the public learns about major breaking news, hot events and important new ideas, and expresses its own views and attitudes on them. How to obtain news effectively and in a timely manner, analyze it, and discover burst events and hot events has therefore become an important research focus in the field of information retrieval.

Topic detection is a key technology for solving this kind of problem. It discovers topics from the network news stream in a timely manner and then follows each discovered topic in real time, recording the entire course of the topic. This record makes it convenient for people to understand a topic in detail, and it helps the relevant government officials keep track of emergencies and the development of hot events in a timely manner, so that they can control and guide how those events develop.

Firstly, this paper studies the key technologies of topic detection. Then, it uses a keyword-based method to detect burst topics on micro-blogs. Finally, it uses double filtering based on KL divergence to realize topic link detection.

1) Study of micro-blog burst word extraction and micro-blog burst topic detection: a method of micro-blog burst topic detection based on burst words is put forward. More and more people comment on events through micro-blogs, so research on burst topics in micro-blogs has important significance and value.
Micro-blog text is short, large in volume and non-standard in wording, and the appearance of a burst topic is often accompanied by a large number of burst words. Based on these characteristics, this paper proposes micro-blog burst topic detection based on burst words. Taking full account of the characteristics of burst-topic keywords in micro-blog text data, it first screens words by word frequency to obtain candidate burst words. It then combines the influence of the user who published each micro-blog with the term frequency-inverse document frequency (TF-IDF) of each word to obtain the word's weight, and extracts the higher-weighted words as the burst words of topics. Finally, an improved Single-Pass clustering algorithm clusters the burst words to complete micro-blog burst topic detection.

2) Study of topic similarity calculation and the dynamic threshold method: double-filtering topic link detection based on KL distance is proposed. Topic link detection suffers from problems of threshold setting and false alarms. To better solve both problems, this paper proposes a topic link detection method based on KL distance and named entities. In the first filter, the threshold value is calculated by KL distance; because news is time-sensitive, a dynamic threshold method is proposed. Candidate news reports are obtained by comparing the similarity value with the dynamic threshold value. Then, because named entities play an important role in distinguishing between similar topics, named entities are extracted, and the final topic detection is completed by comparing the numbers of identical and similar named entities. Experiments show that the dynamic threshold method is efficient and greatly improves the detection effect.
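
To make the first filter concrete, the following is a minimal Python sketch of comparing a news report with a topic by KL distance and a time-dependent threshold. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the formulas, so the symmetrized KL distance, the additive smoothing, and the function names `word_distribution`, `kl_distance`, `dynamic_threshold` and `is_candidate`, as well as the decay rule and its parameters, are all hypothetical.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary.

    Smoothing (alpha > 0) keeps every probability non-zero, so the
    logarithm in the KL divergence is always defined.
    """
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_distance(tokens_a, tokens_b, alpha=1.0):
    """Symmetrized KL divergence (an assumed reading of 'KL distance')."""
    vocabulary = set(tokens_a) | set(tokens_b)
    p = word_distribution(tokens_a, vocabulary, alpha)
    q = word_distribution(tokens_b, vocabulary, alpha)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def dynamic_threshold(base, hours_apart, decay=0.1):
    """Hypothetical rule: the allowed distance shrinks as the time gap grows."""
    return base / (1.0 + decay * hours_apart)


def is_candidate(report_tokens, topic_tokens, hours_apart, base=0.2):
    """First filter: keep the report if it is close enough to the topic."""
    distance = kl_distance(report_tokens, topic_tokens)
    return distance <= dynamic_threshold(base, hours_apart)


report = "earthquake rescue teams arrive city".split()
topic = "earthquake hits city rescue teams".split()
print(is_candidate(report, topic, hours_apart=2))
```

In this sketch a smaller distance means greater similarity, so a report passes the first filter when its distance is at or below the threshold. Reports that pass would then go to the second, named-entity filter described above.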
Keywords/Search Tags: Burst topic detection, topic link tracking, burst word, KL divergence