
The Implementation and Study of Web-Based Topic Search

Posted on: 2015-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Yan
Full Text: PDF
GTID: 2298330467463943
Subject: Electronic and communication engineering
Abstract/Summary:
With the advent of the Internet age, the Internet is developing rapidly and network information resources have become increasingly diverse. Users currently obtain the network information they need primarily through search engines. However, because Web page information is growing explosively and is updated in real time, how to extract information on a specific topic from massive network information has become the focus of this study. Building on the design and implementation of the NEEP information collection system, this thesis studies Web-based topic search oriented to NEEP information collection. It concentrates on the key technologies of topic search and describes the topic web crawler search algorithm, topic correlation judgment, and Web document classification algorithms used in the design and implementation of the system. The main work and innovations are as follows:

1. Design and implementation of the search algorithm of the topic crawler: In the topic crawler designed for the NEEP information collection system, a genetic algorithm selects the optimum from a global perspective to control the crawling direction of the topic crawler and to avoid falling into a local optimum. At the same time, a non-greedy strategy is used to collect the Web pages linked by URLs. Combining the two yields the proposed non-greedy search genetic algorithm, which ensures both the accuracy of the crawling direction and the relevance of the collected pages.

2. Design and implementation of topic correlation judgment: A vector space model is established to judge the topic correlation of Web content. The topic crawler computes the topic correlation of the hyperlinks in a Web page, their anchor texts, and their URLs.

3. Design and implementation of Web document classification algorithms: An improved K-means algorithm first clusters the complete data set of the initial data. The similarity between each record in the missing-data set and each cluster is then computed, and the record is added to the corresponding cluster. The data are then classified by the Naive Bayes algorithm. Experimental tests show that the performance of the Naive Bayes classification model based on the improved K-means algorithm is clearly improved.

Building on the above research, the thesis elaborates the design and implementation of the topic web crawler process in the NEEP information collection system. Performance tests of the topic crawler verify the effectiveness of the search algorithm designed for the system.
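
The vector space model judgment in item 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: the tokenizer, the raw term-frequency weighting, the function names, and the 0.5/0.3/0.2 weights for page text, anchor text, and URL are all assumptions chosen for illustration. Each text is turned into a sparse term vector, and cosine similarity against a topic vector gives a relevance score in the range 0 to 1.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_relevance(topic_terms, page_text, anchor_text, url,
                   weights=(0.5, 0.3, 0.2)):
    """Weighted topic relevance of a hyperlink.

    The three weights (page text, anchor text, URL) are hypothetical;
    the abstract does not say how the three signals are combined.
    """
    topic = term_vector(topic_terms)
    parts = (page_text, anchor_text, url)
    return sum(w * cosine_similarity(topic, term_vector(p))
               for w, p in zip(weights, parts))


if __name__ == "__main__":
    topic = "entrance exam admission"
    relevant = link_relevance(
        topic,
        "graduate entrance exam admission notice",
        "exam admission",
        "http://example.edu/admission/exam",
    )
    unrelated = link_relevance(
        topic,
        "football match results",
        "sports news",
        "http://example.com/sports",
    )
    print(relevant > unrelated)
```

A crawler could compare such a score against a threshold to decide whether a link is worth following. A real system would more likely use TF-IDF weights and a tokenizer suited to the page language.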
Keywords/Search Tags: topic web crawler, search algorithm, topic search, topic correlation