
Research on Web-Partition Technique in Distributed Information Collection System

Posted on: 2011-05-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Wei
GTID: 2178330338979773 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer networks, search engines, an application-layer information retrieval technology built on the Internet, are also developing rapidly. Because of the geographical distribution of the Web, the limits of network infrastructure, and other factors, and as the Web keeps expanding rapidly, current search engines suffer from coverage and update-rate bottlenecks caused by their cluster scheduling policies, and they place a high load on the whole network. Distributed search engines deployed over a wide area network adapt well to the needs of Web information management: compared with traditional search engines, they achieve higher efficiency while generating less network load.

To improve the response rate and download rate of the distributed information collection system within a distributed search engine, this thesis focuses on the crawler-group scheduling problem. We propose a Web partition strategy based on a support vector machine (SVM), which makes better use of the relative positions of the crawlers in a wide-area distributed system to obtain high-quality Web partitions.

The research proceeds as follows. First, we define network distance for the distributed information collection system of a distributed search engine; reducing the total network load of the distributed data acquisition system is the main direction of this work. Second, to achieve this goal, we propose the SVM-based Web partition strategy and compare its performance experimentally with that of several classic Web partition schemes. In the Web partition stage, we use the improved sensor and network distance prediction method to predict network distances and obtain feature vectors, train the SVM with these feature vectors, and then partition the Web with the trained SVM. Finally, we draw conclusions from the experimental results and an analysis of the resulting Web partitions. Experiments and analysis cover the definition of network distance in the information collection system, the improved network distance prediction algorithm, and the Web partition system. They show that the proposed Web partition scheme for the distributed information collection system performs relatively well and can produce high-quality Web partitions. ...
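The abstract does not give the thesis's exact network-distance definition, its improved prediction algorithm, or its SVM formulation, so the following Python sketch only illustrates the general pipeline under stated assumptions. Network distance is predicted as the Euclidean distance between hypothetical 2-D network coordinates (a generic coordinate-based prediction scheme, not the thesis's algorithm); each host's feature vector is its predicted distance to every crawler site; a linear SVM is trained with Pegasos-style sub-gradient descent; and the resulting partition is compared with a host-name hash partition (CRC32 here, standing in for a classic scheme) by total predicted network distance. Only two crawler sites are used for brevity; with more sites, several binary SVMs would be combined (for example one-vs-one). All coordinates, host names, labels, and helper names are hypothetical.

```python
import math
import zlib
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def predicted_distance(a: Coord, b: Coord) -> float:
    # Assumption: network distance is predicted as the Euclidean distance
    # between hypothetical 2-D network coordinates of the two nodes.
    return math.dist(a, b)


def features(host: Coord, crawlers: Sequence[Coord]) -> List[float]:
    # Feature vector of a host: its predicted distance to every crawler site.
    return [predicted_distance(host, c) for c in crawlers]


def column_means(xs: List[List[float]]) -> List[float]:
    # Per-feature means of the training set, used to centre the features.
    return [sum(col) / len(xs) for col in zip(*xs)]


def train_linear_svm(xs: List[List[float]], ys: List[int],
                     lam: float = 0.01, epochs: int = 50) -> List[float]:
    # Pegasos-style sub-gradient descent on the primal SVM objective
    #   lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)),  labels y in {+1, -1}.
    # No bias term, so inputs should be centred. Cyclic (not random) sample
    # order keeps this sketch deterministic.
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]    # regulariser shrink
            if margin < 1.0:                            # hinge loss active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_partition(hosts: Dict[str, Coord], crawlers: Sequence[Coord],
                  w: List[float], means: List[float]) -> Dict[str, int]:
    # Non-negative score -> crawler 0, otherwise crawler 1 (two-crawler case).
    plan = {}
    for name, coord in hosts.items():
        x = [f - m for f, m in zip(features(coord, crawlers), means)]
        plan[name] = 0 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 1
    return plan


def hash_partition(hosts: Dict[str, Coord], n_crawlers: int) -> Dict[str, int]:
    # Baseline: assign each host by a hash (CRC32) of its host name.
    return {name: zlib.crc32(name.encode()) % n_crawlers for name in hosts}


def total_distance(plan: Dict[str, int], hosts: Dict[str, Coord],
                   crawlers: Sequence[Coord]) -> float:
    # Objective to minimise: summed predicted distance host -> assigned crawler.
    return sum(predicted_distance(hosts[h], crawlers[c]) for h, c in plan.items())


# Hypothetical example: crawler sites A and B, labelled training hosts
# (+1 = best served by A, -1 = by B), and two new hosts to partition.
CRAWLERS = [(0.0, 0.0), (10.0, 0.0)]
TRAIN = {"a1": (1.0, 1.0), "a2": (2.0, -1.0), "a3": (3.0, 2.0),
         "b1": (9.0, 1.0), "b2": (8.0, -1.0), "b3": (7.0, 2.0)}
LABELS = {"a1": 1, "a2": 1, "a3": 1, "b1": -1, "b2": -1, "b3": -1}
NEW = {"www.example-a.org": (1.0, -2.0), "www.example-b.org": (9.0, 2.0)}

raw = [features(TRAIN[h], CRAWLERS) for h in TRAIN]
MEANS = column_means(raw)
W = train_linear_svm([[f - m for f, m in zip(x, MEANS)] for x in raw],
                     [LABELS[h] for h in TRAIN])
svm_plan = svm_partition(NEW, CRAWLERS, W, MEANS)
hash_plan = hash_partition(NEW, len(CRAWLERS))
print(svm_plan)  # → {'www.example-a.org': 0, 'www.example-b.org': 1}
print(total_distance(svm_plan, NEW, CRAWLERS),
      total_distance(hash_plan, NEW, CRAWLERS))
```

On this toy data the trained SVM reproduces the nearest-crawler assignment, so its total predicted distance can never exceed that of the hash baseline. The bias-free formulation with mean-centred features keeps the sketch short; a real system would instead use the thesis's own distance predictor, training labels, and SVM setup.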
Keywords/Search Tags: distributed search engine, distributed information collection, Web partition, support vector machine, network distance system