Improvement And Realization Of Vertical Search Arithmetics In Web Spider

Posted on: 2009-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhang
Full Text: PDF
GTID: 2178360272979660
Subject: Computer application technology
Abstract/Summary:
The web spider is a key component of a vertical search engine, and its search algorithm is the engine's core technology. A question that has drawn much attention in recent years is which strategy a spider's search algorithm should be based on in order to raise search recall. This thesis improves vertical search algorithms and implements the improved algorithm.

The thesis studies general search algorithms and several vertical search strategies, analyses the advantages and disadvantages of current heuristic search algorithms, improves the vertical search algorithm, designs and implements a web spider based on the improved algorithm, and runs a vertical search test on mobile phone industry websites.

The vertical search algorithm is improved in three respects. First, a new method of estimating the value of a link is proposed, based on an analysis of a large number of web page source files, and an empirical formula is given. Second, threshold estimation is combined with an ε-greedy policy, which yields a better ordering of the selected links and raises search recall. Third, each URL is mapped to two numbers by the MD5 algorithm, which reduces the number of comparisons needed to decide whether two arbitrary URLs are the same to one or two.

The design of the web spider covers use case design and class design. Its implementation consists of program initialization, page crawling, and program termination. The test on mobile phone industry websites shows that the new method of estimating link value improves the accuracy of link value prediction, that combining threshold estimation with the ε-greedy policy raises search recall, and that mapping a URL to two numbers with the MD5 algorithm increases search efficiency.
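
The second and third improvements can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how the 128-bit MD5 digest is split or how the threshold interacts with the ε-greedy policy, so the sketch assumes two 64-bit halves, and assumes that with probability ε the spider explores a random link while otherwise it follows the highest-valued link only if its estimate reaches the threshold. The names `url_fingerprint`, `same_url`, and `pick_link` are hypothetical, and link values are taken as given rather than computed by the thesis's empirical formula.

```python
import hashlib
import random


def url_fingerprint(url):
    """Map a URL to two 64-bit integers (assumed split of the MD5 digest)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes
    high = int.from_bytes(digest[:8], "big")
    low = int.from_bytes(digest[8:], "big")
    return high, low


def same_url(fp_a, fp_b):
    """Compare two fingerprints with one or two integer comparisons.

    The second comparison runs only when the first numbers match.
    """
    return fp_a[0] == fp_b[0] and fp_a[1] == fp_b[1]


def pick_link(frontier, epsilon, threshold, rng=random):
    """Choose the next link from a list of (url, estimated_value) pairs.

    Assumed policy: explore a random link with probability epsilon;
    otherwise exploit the best link, but only if its estimated value
    reaches the threshold. Returns None when nothing qualifies.
    """
    if not frontier:
        return None
    if rng.random() < epsilon:
        return rng.choice(frontier)  # exploration step
    best = max(frontier, key=lambda item: item[1])
    return best if best[1] >= threshold else None
```

Under these assumptions, exploration occasionally follows links whose estimated value is low, which is one way such a policy could raise recall, and the fingerprint lets a visited-URL check compare integers instead of full strings.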
Keywords/Search Tags: web spider, link value, ε-greedy policy, threshold estimation, MD5 algorithm