
Research And Design On Hot Topic Detection And Tracking System In Internet

Posted on: 2014-05-30  Degree: Master  Type: Thesis
Country: China  Candidate: C L Guo  Full Text: PDF
GTID: 2268330401966209  Subject: Computer technology
Abstract/Summary:
With the development of information technology, the Internet has become an important platform through which governments, businesses, and individual users obtain information. Because information on the Internet is scattered, related content is often distributed across the whole network, and it is difficult to find hot issues and track related information in time by manual effort alone. Hot topic detection and tracking therefore helps the relevant government departments grasp current information and make efficient, well-founded decisions, which promotes social progress and stability, while ordinary users can keep abreast of developments in the topics that interest them. A topic detection and tracking system is designed for this purpose: it aims to find hot topics in large amounts of Internet data and to follow subsequent reports on the issues people care about.

This thesis focuses on the detection, analysis, and tracking of hot topics and improves on existing techniques. The main contributions are:

1. An online topic detection method based on an improved Single-Pass clustering algorithm. The thesis studies the topic detection process and several of its key issues, and improves both the text similarity calculation and the clustering strategy. To address the efficiency of the Single-Pass algorithm, an inverted index and the BM25 algorithm are used, which greatly increases running speed at the cost of a small loss of accuracy.
2. A topic heat evaluation method. The method assumes that every text in a topic starts with the same heat, which then decays according to Newton's law of cooling. Combined with the time at which each text appeared in the topic and the quantity of information, this gives a heat value by which topics are ranked so that they are easy for people to read.
3. An improved topic tracking algorithm based on adaptive KNN. The χ² (chi-square) method is used to extract the features of the tracked topic, and the adaptive KNN method is then used to track it. Experiments show that the algorithm achieves good results.
4. A system design and implementation. Building on the related technologies, the thesis designs the framework of the topic detection system, gives the concrete implementation of the crawler system, and describes the function of each module. Actual operation shows that the scheme is highly feasible.

Finally, the thesis implements the online topic detection system and tests and analyzes the feasibility and effectiveness of the algorithms.
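
The abstract does not give the details of the improved Single-Pass method, so the following is only a minimal sketch of the general idea it names: each incoming document is compared with existing topics, an inverted index limits the comparison to topics that share a term, and BM25 supplies the similarity score. The class name `SinglePassClusterer`, the threshold of 0.1, the BM25 parameters `k1` and `b`, and the scaling of the score by its upper bound are all assumptions made for illustration, not values taken from the thesis.

```python
import math
from collections import Counter, defaultdict


class SinglePassClusterer:
    """Illustrative Single-Pass clustering with an inverted index and BM25."""

    def __init__(self, threshold=0.1, k1=1.5, b=0.75):
        self.threshold = threshold      # assumed cut-off, not from the thesis
        self.k1, self.b = k1, b         # conventional BM25 defaults
        self.clusters = []              # per-topic term frequencies (Counter)
        self.members = []               # per-topic list of document ids
        self.index = defaultdict(set)   # inverted index: term -> topic ids

    def _idf(self, term):
        # Topics play the role of "documents" in the BM25 statistics.
        n = len(self.clusters)
        df = len(self.index.get(term, ()))
        return math.log(1.0 + (n - df + 0.5) / (df + 0.5))

    def _bm25(self, terms, cid, avg_len):
        tf = self.clusters[cid]
        length = sum(tf.values())
        score = 0.0
        for term in terms:
            f = tf.get(term, 0)
            if f:
                denom = f + self.k1 * (1 - self.b + self.b * length / avg_len)
                score += self._idf(term) * f * (self.k1 + 1) / denom
        return score

    def add(self, doc_id, tokens):
        """Assign one document to a topic and return the topic id."""
        terms = set(tokens)
        # Only topics sharing at least one term can score above zero.
        candidates = set()
        for term in terms:
            candidates |= self.index.get(term, set())

        best_cid, best_sim = None, 0.0
        if candidates:
            avg_len = sum(sum(c.values()) for c in self.clusters) / len(self.clusters)
            # Upper bound on the BM25 score, used to scale scores into [0, 1).
            max_score = sum(self._idf(t) * (self.k1 + 1) for t in terms)
            for cid in candidates:
                sim = self._bm25(terms, cid, avg_len) / max_score
                if sim > best_sim:
                    best_cid, best_sim = cid, sim

        if best_cid is None or best_sim < self.threshold:
            best_cid = len(self.clusters)       # open a new topic
            self.clusters.append(Counter())
            self.members.append([])

        self.clusters[best_cid].update(tokens)
        self.members[best_cid].append(doc_id)
        for term in terms:
            self.index[term].add(best_cid)
        return best_cid
```

Calling `add(doc_id, tokens)` once per document, in arrival order, is what makes the procedure single-pass: no document is revisited after it has been assigned. The speed gain comes from scoring only the candidate topics found through the index instead of every topic seen so far.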
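
The topic heat idea can be illustrated with a small worked sketch. Newton's law of cooling with an ambient temperature of zero gives exponential decay, H(t) = H0 · exp(-k · t), so one plausible reading of the abstract is that each text contributes the same initial heat H0, cools with its age, and the topic heat is the sum over its texts. The function names, the initial heat of 1.0, the cooling rate of 0.1 per hour, and the use of a plain sum are assumptions for illustration; the thesis may combine time and quantity differently.

```python
import math


def text_heat(age_hours, initial_heat=1.0, cooling_rate=0.1):
    """Heat of one text after `age_hours`, cooling towards zero."""
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    return initial_heat * math.exp(-cooling_rate * age_hours)


def topic_heat(post_times, now, initial_heat=1.0, cooling_rate=0.1):
    """Sum the cooled heat of every text in a topic.

    `post_times` and `now` are timestamps in hours (assumed unit).
    """
    return sum(
        text_heat(now - t, initial_heat, cooling_rate) for t in post_times
    )


def rank_topics(topics, now, initial_heat=1.0, cooling_rate=0.1):
    """Return topic names ordered from hottest to coldest.

    `topics` maps a topic name to the list of its texts' timestamps.
    """
    return sorted(
        topics,
        key=lambda name: topic_heat(topics[name], now, initial_heat, cooling_rate),
        reverse=True,
    )
```

Under these assumptions a topic with a few recent texts can outrank a topic with more, older texts, which matches the intent of ranking by current interest instead of by raw volume.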
Keywords/Search Tags:Text similarity, Topic detection, Feature selection, Topic tracking