Web Spider Search Strategies Based on Reinforcement Learning

Posted on: 2004-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
GTID: 2208360122467139
Subject: Computer application technology
Abstract/Summary:
Traditional Web search engines are severely challenged by the exponential growth of information resources on the Web that has accompanied the widespread use of Internet technologies. This may account for the emergence of topic-specific search engines. Research on efficient search strategies plays an essential role in the application and progress of topic-specific search engines. This paper focuses on finding efficient search strategies by applying reinforcement learning to Web spiders.

The concept of reinforcement learning and recent research on Web spider search strategies are introduced first. Based on an analysis of the characteristics, advantages, and disadvantages of each spider, the paper identifies three keys to improving the efficiency of Web spiders: improving the accuracy with which the value of hyperlinks is evaluated, reducing time consumption and space complexity, and making the spiders adaptive.

To reduce time consumption and space complexity, the paper aims to improve the learning efficiency of Web spiders based on reinforcement learning. A new reinforcement learning algorithm based on hidden biasing information learning is proposed. Its main idea is to exploit useful state features to explore more effectively, which improves learning efficiency. Experiments on the Box Pushing Task show that the new algorithm performs better. The application of the new algorithm to Web spiders is then discussed, and a new spider learning algorithm based on hidden biasing information learning is proposed. This spider learning algorithm is validated by experiments on Web site learning.
The results again show that the new algorithm performs better.

To improve the accuracy of hyperlink evaluation, the paper combines two methods of evaluating the value of hyperlinks: one based on the immediate reward value and one based on the future reward value. On this basis, a novel Web spider model is proposed. A new function, called the value belief function, is introduced to solve the problem of assigning belief between the immediate reward value and the future reward value. A heuristic search algorithm based on degrading the belief in the future reward value is then proposed. Its main idea is to enhance exploration by assigning a high belief to the future reward in the early stage of the search, and then to shift gradually toward exploitation by lowering that belief in the later stage. Search experiments on a real Web site show that the new algorithm performs better than traditional algorithms.

This paper also discusses the trade-off between exploitation and exploration in search strategies. To avoid being trapped in locally optimal areas in the early stage of the search, a new search algorithm based on the idea of simulated annealing is proposed. Its main idea is to enhance exploration by following some links of relatively low value in the early stage of the search, and then to focus gradually on exploitation by following links of relatively high value in the later stage.
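The abstract does not give the selection rule or the cooling schedule, so the following is only a minimal sketch of how simulated-annealing-style link selection is commonly written. The Boltzmann (softmax) weighting, the geometric cooling, and the names `select_link` and `cooled_temperature` are all assumptions made for illustration, not the author's algorithm.

```python
import math
import random


def select_link(link_values, temperature, rng=random):
    """Return the index of the link to follow next.

    Assumed rule: Boltzmann sampling over estimated link values. A high
    temperature makes low-value links almost as likely as high-value ones
    (exploration); a low temperature concentrates the choice on the
    highest-value link (exploitation).
    """
    if not link_values:
        raise ValueError("no candidate links")
    if temperature <= 0:
        # Zero temperature degenerates to a purely greedy choice.
        return max(range(len(link_values)), key=link_values.__getitem__)
    top = max(link_values)
    # Subtracting the maximum keeps every exponent <= 0, avoiding overflow.
    weights = [math.exp((v - top) / temperature) for v in link_values]
    return rng.choices(range(len(link_values)), weights=weights, k=1)[0]


def cooled_temperature(step, t0=1.0, rate=0.95):
    """Assumed geometric cooling schedule: t0 * rate**step."""
    return t0 * rate ** step
```

A crawler loop would call `cooled_temperature(step)` once per fetched page and pass the result to `select_link`, so that the early stage of the search still follows some low-value links and the later stage follows mostly high-value ones.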
Search experiments on a real Web site show that the simulated-annealing-based algorithm performs better, on the whole, not only than traditional algorithms but also than the heuristic search algorithm based on degrading the belief in the future reward value.

Finally, a prototype Web spider for computer science papers is designed by combining the algorithms above.
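For the heuristic based on degrading the belief in the future reward value, the abstract names a value belief function but does not state its form. The sketch below assumes the simplest reading: a convex combination of the two estimates, with a belief that decays linearly over the search. The functions `future_belief` and `link_value`, and the default bounds 0.9 and 0.1, are hypothetical.

```python
def future_belief(step, total_steps, start=0.9, end=0.1):
    """Belief assigned to the future reward estimate at a given step.

    Assumed schedule: linear decay from `start` at step 0 to `end` at the
    last step, so early steps trust the future reward more.
    """
    if total_steps <= 1:
        return end
    fraction = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * fraction


def link_value(immediate, future, belief):
    """Assumed value belief function: a belief-weighted mix of both estimates."""
    return (1.0 - belief) * immediate + belief * future
```

Under these assumptions, a link with a low immediate reward but a high future reward ranks first early in the search and drops behind immediately rewarding links as the belief decays.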
Keywords/Search Tags: Web spider, topic-specific search engine, reinforcement learning, simulated annealing