Searching Strategy Research For Intelligent Web Crawler

Posted on:2005-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2168360155462527

Subject:Computer application technology

Abstract/Summary:

In recent years, a central question in search engine research has been how to retrieve more of the Web pages that match users' interests. This thesis studies searching strategies for topic-specific web crawlers, aiming to increase search efficiency by improving the crawler's self-adaptability.

We first review current research on web crawlers. After comparing the advantages and disadvantages of several existing searching strategies, we conclude that the key to increasing search efficiency lies in improving the crawler's self-adaptability and the accuracy with which it predicts the importance of links.

To improve the crawler's self-adaptability, we propose an algorithm based on combined link rewards, which combines a link's immediate reward and its future reward to evaluate the link's importance. We also use the changes in rewards to estimate how relevant the candidate page set is to the topic; on that basis the crawler dynamically adjusts the balance between the two rewards, arriving at the searching strategy best suited to the current search state. Our experiments show that this algorithm performs better than several traditional algorithms.

To predict link value more accurately and to resolve the topic-drift problem of traditional PageRank, we propose an improved PageRank algorithm based on topical segments. The algorithm segments a Web page into blocks and passes the page's PageRank to the outlinks in each block in proportion to the block's relevance to the given topic. It also treats visited outlinks as feedback for revising the block's relevance. Experiments with a web crawler show that the new algorithm performs better.

Finally, we propose a web searching strategy based on a genetic algorithm, introducing genetic algorithms into web crawling. It treats the various combinations of web information (parent pages, sibling pages, link text, and URL tokens) as gene sequences. Through genetic operations such as crossover and mutation, the way this information is combined changes dynamically with the actual web resources, yielding the best searching strategy. Our experiments show that the new...
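
The block-based PageRank step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `distribute_pagerank`, the dictionary layout of a block, the even split among outlinks inside one block, and the omission of a damping factor are all assumptions made for illustration. The abstract states only that a page's PageRank is passed to the outlinks of each block in proportion to the block's relevance to the topic.

```python
from typing import Dict, List


def distribute_pagerank(page_rank: float, blocks: List[dict]) -> Dict[str, float]:
    """Split one page's PageRank over its outlinks, weighted by block relevance.

    Each block is a dict with two hypothetical keys:
      "relevance": non-negative topic relevance score of the block
      "outlinks":  list of URLs found inside the block
    """
    # Only blocks that actually contain outlinks can receive a share.
    linked = [b for b in blocks if b["outlinks"]]
    total_relevance = sum(b["relevance"] for b in linked)

    shares: Dict[str, float] = {}
    if total_relevance == 0:
        return shares  # nothing topical to pass rank to

    for block in linked:
        # The block's share is proportional to its relevance to the topic.
        block_share = page_rank * block["relevance"] / total_relevance
        # Assumption: outlinks within one block share the block's rank evenly.
        per_link = block_share / len(block["outlinks"])
        for url in block["outlinks"]:
            shares[url] = shares.get(url, 0.0) + per_link
    return shares


example_blocks = [
    {"relevance": 0.75, "outlinks": ["a.html", "b.html"]},
    {"relevance": 0.25, "outlinks": ["c.html"]},
    {"relevance": 0.50, "outlinks": []},  # no outlinks, so it is ignored
]
example_shares = distribute_pagerank(1.0, example_blocks)
```

In this sketch a highly relevant block passes more rank to each of its links than an off-topic block does, which is the mechanism the abstract credits with reducing topic drift. The feedback step, in which visited outlinks revise a block's relevance, is not shown because the abstract does not give its update rule.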

Keywords/Search Tags:

Web spider, Specific search engine, Searching strategy, Pagerank, Genetic algorithm

Related items

1	Design And Realize Of Spider In Vertical Search Engine
2	Design And Implementation Of A Spider For Topic-Specific Search Engine
3	Based On Web Spider Search Strategy To Consolidate Learning
4	The Theme Of The Search Engine Web Spider Search Strategy Study
5	The Strategy Of Topic-specific Web Crawler Based On Semantics Similarity
6	Research And Realization On Correlation Techniques Of Topic Search-specific Engine
7	Research And Realization On Correlation Techniques Of Topic Search-Specific Engine
8	Professional Search Engine Distributed Robot Design
9	Network-based Professional Search Engine Spiders Search Strategy
10	Research And Design On The Search Engine Based On The Enhanced Similarity Pagerank Algorithm