Research On Web-partition Technique In Distributed Information Collection System

Posted on:2011-05-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Wei

Full Text:PDF

GTID:2178330338979773

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of computer networks, search engines, as Internet-based, application-layer information retrieval systems, are also developing rapidly. As the Web expands, current search engines are constrained by the geographical distribution of the Web and by network infrastructure: because of their cluster scheduling policy, they suffer from coverage and update-rate bottlenecks and place a high load on the whole network. Distributed search engines deployed across a wide area network adapt well to the needs of Web information management and are more efficient than traditional search engines while reducing network load. This thesis aims to improve the response rate and download rate of the distributed information collection system of a distributed search engine, and focuses on the crawler group scheduling problem. We propose a Web partition strategy based on the support vector machine, which makes better use of the relative positions of crawlers in a wide-area distributed system to obtain a high-quality Web partition.

First, we propose a definition of network distance for the distributed information collection system of a distributed search engine; reducing the total network distance of this system is the main direction of the research. Second, we propose a Web partition strategy based on the support vector machine to achieve this goal, and compare its performance experimentally with that of some classic Web partition schemes. In the Web partition stage, we use improved network distance prediction to obtain feature vectors, and train the support vector machine on these feature vectors. We then perform the Web partition using the trained support vector machine. Finally, we draw conclusions from the experimental results and an analysis of the resulting Web partition. Experiments and analysis were carried out for the definition of network distance in the information collection system, the improved network distance prediction algorithm, and the Web partition system. They show that the Web partition scheme based on the distributed information collection system performs relatively well and can therefore produce a high-quality Web partition. ...
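The pipeline described in the abstract (predict network distances, use them as feature vectors, train a classifier, then partition) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the two crawlers, their 2-D coordinates, Euclidean distance standing in for predicted network distance, and a linear hinge-loss classifier with no bias, regularization, or kernel standing in for a full support vector machine.

```python
import math

# Hypothetical 2-D "network coordinates" for two crawlers (assumption).
CRAWLERS = {"A": (0.0, 0.0), "B": (1.0, 0.0)}


def features(host):
    """Feature vector: stand-in predicted distance from a host to each crawler."""
    return [math.dist(host, CRAWLERS["A"]), math.dist(host, CRAWLERS["B"])]


def train_hinge_classifier(X, y, epochs=200, lr=1.0):
    """Subgradient descent on the hinge loss max(0, 1 - y * w.x).

    Simplified stand-in for an SVM: linear, no bias, no regularization.
    Labels y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # point violates the margin: take a step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                updated = True
        if not updated:  # every point has margin >= 1, so stop early
            break
    return w


def assign(w, host):
    """Assign a host to crawler A or B from the sign of the decision score."""
    score = sum(wj * xj for wj, xj in zip(w, features(host)))
    return "A" if score > 0 else "B"


def total_distance(hosts, owners):
    """Total host-to-assigned-crawler distance of a partition."""
    return sum(math.dist(h, CRAWLERS[o]) for h, o in zip(hosts, owners))


# Synthetic hosts on a grid, leaving a gap between the two crawlers.
hosts = [(i / 10, j / 10) for i in (0, 1, 2, 3, 7, 8, 9, 10) for j in (0, 2, 4)]
X = [features(h) for h in hosts]
# Training label: +1 if crawler A is nearer, -1 if crawler B is nearer.
y = [1 if d_a < d_b else -1 for d_a, d_b in X]

w = train_hinge_classifier(X, y)
svm_owners = [assign(w, h) for h in hosts]
# Baseline that ignores distance: alternate hosts between the crawlers.
round_robin = ["A" if k % 2 == 0 else "B" for k in range(len(hosts))]

print("classifier partition:", total_distance(hosts, svm_owners))
print("round-robin partition:", total_distance(hosts, round_robin))
```

A new host is assigned with `assign(w, host)`, for example `assign(w, (0.15, 0.0))`. The round-robin baseline is included only to show how the total-distance objective separates a distance-aware partition from one that ignores crawler positions; it is not one of the classic schemes the thesis evaluates.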

Keywords/Search Tags:

distributed search engine, distributed information collection, Web partition, support vector machine, network distance system

Related items

1	Distributed SVM Algorithm With K-means
2	Research Of A Distributed Web Crawler Search Engine Based On Web Information Collection
3	Design And Implementation Of Meta-search Engine System Based On Distributed Architecture
4	Distributed Based On The Search Engine Irst Improvements
5	A Study About The Heterogeneities In Meta-engine Systems
6	Research And Implementation Of Distributed Network Search Engine
7	The Design Of Social Network Message Push Based Distributed Search Engine
8	Research On Activity Analysis And Semantic Retrieval In Distributed Intelligent Visual Surveillance System
9	Adaptive Scheduling Using Support Vector Machine on Heterogeneous Distributed Systems
10	Research On Key Technologies Of Distributed Web Crawling