
Research On Hot Topic Detection System In Internet

Posted on: 2016-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: L X Pan
GTID: 2348330479453419
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, people increasingly get hot news from the network. Network data comes from a wide range of sources, spreads quickly, and is chaotic in content, so finding topics of interest in the massive volume of network data is difficult. A method is therefore needed to process network data automatically and discover hot topics. Network hot topic detection uses crawling, text processing, topic detection, and hot topic recognition to automatically crawl pages from the network and present the hot topics of the Internet to the general public, and it has gradually become an active research area.

Through the study of topic detection algorithms, a double-layer clustering model based on a density clustering algorithm and the single-pass clustering algorithm is designed and used for network topic detection. Because the volume of network data is huge, the DBSCAN algorithm is used to cluster the data collected on each crawl into micro topic classes, and the single-pass algorithm is then used to cluster the micro classes incrementally into topic classes. The single-pass algorithm used in the double-layer model is slow, so we improved it. The original algorithm must calculate the similarity of each report with all reports in a class, which is inefficient. Drawing on the concept of the center of mass, we use a centroid vector to represent each micro topic class and each topic class. The similarity between a micro topic class and a topic class is then calculated from their centroid vectors alone, which reduces the computational complexity.

For hot topic recognition, a method for measuring topic heat is designed. Combining media focus and user attention, we quantify the factors that influence the heat of a topic and derive topic heat formulas. The formulas are used to compute a heat value for each topic, and topics are then ranked by heat value.

Based on this research, a network hot topic detection system is designed and implemented using crawler, topic detection, and hot topic recognition technology. Comparing the traditional single-pass algorithm with the double-layer model used in our system verifies the feasibility of the double-layer model. The system is then used to process network data and obtain hot topics, and its results are compared with the hot topics provided by typical websites to verify the effectiveness of the system.
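The centroid-based single-pass step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the micro topic classes have already been produced (for example by DBSCAN) and are given as a centroid vector plus a report count. The function names `cosine` and `single_pass`, the cosine similarity measure, the threshold of 0.8, and the size-weighted centroid update are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass(micro_clusters, topics, threshold=0.8):
    """Incrementally merge micro topic classes into topic classes.

    micro_clusters: iterable of (centroid, size) pairs.
    topics: list of [centroid, size] entries, updated in place and returned.
    threshold: hypothetical similarity cut-off, not a value from the thesis.
    """
    for centroid, size in micro_clusters:
        # Compare against one centroid per topic instead of every report in it.
        best_idx, best_sim = -1, 0.0
        for i, (topic_centroid, _) in enumerate(topics):
            sim = cosine(centroid, topic_centroid)
            if sim > best_sim:
                best_idx, best_sim = i, sim

        if best_idx >= 0 and best_sim >= threshold:
            # Assumed update rule: size-weighted mean of the two centroids.
            topic_centroid, topic_size = topics[best_idx]
            total = topic_size + size
            merged = [
                (a * topic_size + b * size) / total
                for a, b in zip(topic_centroid, centroid)
            ]
            topics[best_idx] = [merged, total]
        else:
            # No sufficiently similar topic: start a new topic class.
            topics.append([list(centroid), size])
    return topics


# Usage: three micro topic classes from one crawl, starting with no topics.
crawl = [([1.0, 0.0], 2), ([0.9, 0.1], 1), ([0.0, 1.0], 3)]
topics = single_pass(crawl, [])
```

With one centroid per class, each incoming micro class needs one similarity computation per existing topic rather than one per report, which is the source of the speed-up the abstract describes.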
Keywords/Search Tags: Web Crawler, Text Clustering, Topic Detection, Hot Topic Recognition