Web Spider Search Strategies Based On Reinforcement Learning

Posted on:2004-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Li

GTID:2208360122467139

Subject:Computer application technology

Abstract/Summary:

Traditional Web search engines are severely challenged by the exponential growth of information resources on the Web that has followed the widespread adoption of Internet technologies. This may account for the emergence of topic-specific search engines. Finding efficient search strategies is essential to the application and progress of such engines. This thesis applies reinforcement learning to Web spiders and focuses on finding efficient search strategies.

The concept of reinforcement learning and recent research on Web spider search strategies are introduced first. Based on an analysis of the characteristics, advantages, and disadvantages of each spider, the thesis identifies three keys to improving the efficiency of Web spiders: evaluating the value of hyperlinks more accurately, reducing time and space complexity, and making the spiders adaptive.

To reduce time and space complexity, the thesis aims to improve the learning efficiency of Web spiders based on reinforcement learning. A new reinforcement learning algorithm based on hidden biasing information learning is proposed. Its main idea is to use informative state features to explore more effectively, which improves learning efficiency. The algorithm is validated experimentally on the Box Pushing Task, and the results show improved performance. Its application to Web spiders is then discussed, and a new spider learning algorithm based on hidden biasing information learning is proposed. This algorithm is validated by experiments on Web site learning, and the results again show improved performance.

To evaluate the value of hyperlinks more accurately, the thesis combines two evaluation methods: one based on the immediate reward value and one based on the future reward value. On this basis, a novel Web spider model is proposed. A new function, the value belief function, is introduced to solve the problem of assigning belief between the immediate reward value and the future reward value. A heuristic search algorithm based on degrading the belief in the future reward value is then proposed. Its main idea is to enhance exploration by assigning a high belief value to the future reward in the early stage of the search, and to shift gradually toward exploitation by degrading that belief value in the later stage. Search experiments on a real Web site show that the new algorithm performs better than traditional algorithms.

The thesis also discusses the trade-off between exploitation and exploration in search strategies. To avoid becoming trapped in locally optimal areas in the early stage of the search, a new search algorithm based on the idea of simulated annealing is proposed. Its main idea is to enhance exploration by following some links of relatively low value in the early stage of the search, and to focus gradually on exploitation by following links of relatively high value in the later stage. Search experiments on a real Web site show that, on the whole, this algorithm outperforms not only the traditional algorithms but also the heuristic search algorithm based on degrading the belief in the future reward value.

Finally, a prototype Web spider for computer-related papers is designed by combining the algorithms above.
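
The two exploration-to-exploitation schedules described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the linear belief decay, the weighted-sum blend of immediate and future reward, the geometric cooling schedule, the Boltzmann-style sampling, and every function name and default parameter below are assumptions chosen only for illustration.

```python
import math
import random


def future_belief(step, total_steps, start=0.9, end=0.1):
    """Belief placed on the future-reward estimate at a given search step.

    Assumption: a linear decay from `start` to `end`; the thesis does not
    state the actual form of its value belief function.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def link_score(immediate, future, belief):
    """Blend the two hyperlink value estimates (hypothetical weighted sum).

    `belief` is the weight on the future reward; the remainder goes to the
    immediate reward.
    """
    return (1.0 - belief) * immediate + belief * future


def temperature(step, total_steps, t_start=1.0, t_end=0.05):
    """Geometric cooling schedule from t_start down to t_end (assumed)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def pick_link(scores, temp, rng=random):
    """Sample a link index with probability proportional to exp(score / temp).

    A high temperature gives low-value links a real chance of being
    followed; a low temperature makes the choice nearly greedy.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temp) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]
```

Under these assumptions, a link with a weak immediate reward but a strong future reward estimate ranks first early in the crawl, when `future_belief` is high, and loses that rank as the belief decays. Likewise, `pick_link` follows low-scoring links fairly often while `temperature` is high and almost never once it has cooled.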

Keywords/Search Tags:

Web spider, topic-specific search engine, reinforcement learning, simulated annealing

Related items

1	Research And Realization On Correlation Techniques Of Topic Search-specific Engine
2	Research And Realization On Correlation Techniques Of Topic Search-Specific Engine
3	Design And Implementation Of A Spider For Topic-Specific Search Engine
4	The Research And Implementation On The Spider Of The Vertical Search Engines Based On The Reinforcement Learning
5	The Strategy Of Topic-specific Web Crawler Based On Semantics Similarity
6	Research And Achievement Of The Search Strategic For The Topic Search Engine Spider
7	Research And Realization Of Search Engine On Topic-Specific Based On Ontology
8	The Theme Of The Search Engine Web Spider Search Strategy Study
9	The Crawler Research Of Agent-based Topic-Specific Search Engine
10	Research On Auto-Classification Topic-Specific Search Engine