
The Improvement Of The Performance And Application Of Focused Crawlers Based On Hidden Markov Model

Posted on: 2012-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Qi
GTID: 2178330335964334
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the internet, general-purpose web crawlers are increasingly unable to extract information from web pages effectively as they crawl such a vast network. Topical crawlers are increasingly seen as a way to remove the scalability limitations of universal search engines. The context available to such crawlers can guide the navigation of links, with the goal of efficiently locating highly relevant target pages. By analyzing and comparing the harvest rates of several crawlers, this thesis uses harvest rate as the performance index for evaluating focused crawler algorithms and for analyzing how far existing algorithms fall short of the best achievable performance.

In recent years the Hidden Markov Model (HMM) has found an increasingly wide range of applications, and earlier researchers have used it to guide the crawling process of topical crawlers with some success. Their results show that applying the HMM to topic-specific information collection is feasible. On this basis, the thesis analyzes the existing HMM-based topical crawler in detail and compares its performance with that of several popular topical crawlers, finding that the HMM-based crawler has a number of shortcomings. The thesis therefore proposes several methods for improving the performance of the HMM-based topical crawler, where performance mainly means harvest rate. In practice, the harvest rate of the crawler improved greatly.

This thesis focuses on improving HMM-based topical crawlers and argues, through both theory and practice, that such crawlers have important theoretical value as well as broad application prospects.
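The abstract uses harvest rate as its performance index without defining it. A minimal sketch follows, assuming the common definition (the fraction of crawled pages judged relevant to the topic). The function names and the boolean relevance flags are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Sequence


def harvest_rate(relevant_flags: Sequence[bool]) -> float:
    """Fraction of crawled pages judged relevant (assumed definition)."""
    if not relevant_flags:
        return 0.0  # no pages crawled yet
    return sum(relevant_flags) / len(relevant_flags)


def running_harvest_rate(relevant_flags: Sequence[bool]) -> List[float]:
    """Harvest rate after each crawled page, useful for plotting crawler progress."""
    rates: List[float] = []
    relevant_so_far = 0
    for pages_crawled, is_relevant in enumerate(relevant_flags, start=1):
        relevant_so_far += int(is_relevant)
        rates.append(relevant_so_far / pages_crawled)
    return rates


# Hypothetical crawl log: True means the page was judged on-topic.
crawl_log = [True, False, True, True]
overall = harvest_rate(crawl_log)
per_step = running_harvest_rate(crawl_log)
```

Comparing two crawlers on the same seed set then amounts to comparing their running harvest rate curves over the same number of downloaded pages.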
Keywords/Search Tags: Focused Crawler, Learning Crawler, Hidden Markov Model, World Wide Web