The Implementation and Study of Web-Based Topic Search

Posted on:2015-12-05

Degree:Master

Type:Thesis

Country:China

Candidate:X Yan

Full Text:PDF

GTID:2298330467463943

Subject:Electronic and communication engineering

Abstract/Summary:

With the advent of the Internet age, the Internet has developed rapidly and network information resources have become increasingly diverse. Users currently obtain the network information they need primarily through search engines. However, because Web page information grows explosively and is updated in real time, how to retrieve information on a specific topic from this massive volume of network information has become the focus of this study.

Building on the design and implementation of the NEEP information collection system, this thesis studies Web-based topic search for NEEP information collection. It concentrates on the key technologies of topic search and introduces the topic web crawler search algorithm, topic correlation judgment, and Web document classification algorithms used in the design and implementation of the system. The main work and innovations are as follows:

1. Design and implementation of the topic crawler search algorithm: In the topic crawler for the NEEP information collection system, a genetic algorithm selects the best candidates globally to control the crawling direction of the crawler and avoid falling into a local optimum. At the same time, a non-greedy strategy is used to collect the web pages linked by URLs. Combining the two yields the proposed non-greedy search genetic algorithm, which is intended to keep the crawling direction accurate and the collected pages relevant.

2. Design and implementation of topic correlation judgment: A vector space model is established to judge the topic correlation of web content. The topic crawler computes the topic correlation of the hyperlinks in a Web page, the anchor texts, and the URLs.

3. Design and implementation of Web document classification algorithms: An improved K-means algorithm clusters the complete data sets of the initial data. The similarity between the data sets with missing values and each cluster is then computed, and those records are added to the corresponding cluster. The data are then classified with the Naive Bayes algorithm. Experimental tests show that the performance of the Naive Bayes classification model based on the improved K-means algorithm is clearly improved.

Based on the above research, the thesis elaborates the design and implementation of the topic web crawler process in the NEEP information collection system. Performance tests of the topic crawler verify the effectiveness of the topic crawler search algorithm designed for the system.
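As an illustration of the vector space model used for topic correlation judgment, the following is a minimal sketch, not the thesis's implementation. It assumes plain term-frequency weights and cosine similarity; the abstract does not state the weighting scheme, and the function names, topic description, and anchor texts are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into word tokens, and count each term."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Counter returns 0 for terms that are absent, so only shared terms contribute.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no measurable relevance
    return dot / (norm_a * norm_b)


def topic_relevance(topic_text, candidate_text):
    """Score a candidate text (page body, anchor text, or URL) against a topic."""
    return cosine_similarity(term_vector(topic_text), term_vector(candidate_text))


if __name__ == "__main__":
    # Hypothetical topic description and anchor texts, for illustration only.
    topic = "topic web crawler search algorithm"
    anchors = [
        "genetic search algorithm for a topic crawler",
        "holiday recipes and cooking tips",
    ]
    # Rank candidate links so the most topic-relevant ones are crawled first.
    for anchor in sorted(anchors, key=lambda a: topic_relevance(topic, a), reverse=True):
        print(f"{topic_relevance(topic, anchor):.3f}  {anchor}")
```

A crawler could apply the same score to page content, anchor text, and URL tokens, then follow only links whose score exceeds a chosen threshold. The threshold and the way the three scores are combined are design choices the abstract does not specify.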

Keywords/Search Tags:

topic web crawler, search algorithm, topic search, topic correlation

Related items

1	Research On The Topic Crawler Algorithm Based On Vector Space Model
2	Research On The Key Technology Of Focused Crawler
3	Optimization And Implement Of The Topic Web Crawler Correlation Algorithms
4	The Design Of Specific Topic Web Crawler And Its Transmission Group
5	Research And Implementation Of Topic Web Crawler Oriented To Web Mining
6	Research And Design Of Topic Crawler Through Tunnels Algorithm
7	The Design And Implementation Of Topic Web Crawler About Mining Equipment
8	Research On Topic Search And Its Key Algorithm
9	The Research Of Topic Crawler Search Strategy Based On Genetic Algorithm
10	Research On Techniques Of Domain-Specific Topic Searching