
Research And Implementation On Focused Crawler With Search Strategy

Posted on: 2019-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Tian
Full Text: PDF
GTID: 2348330545458506
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, vertical search engines play an important role in people's lives by providing more specialized web search and information retrieval services. The focused crawler is the key component of a vertical search engine, and its search strategy directly influences the quality of search results. Existing focused crawlers still have room for improvement in coverage, efficiency, and the accuracy of topic relevance judgment. To improve the search coverage and information synchronization of focused crawlers, this thesis designs a search strategy for a focused crawler. The main research contents are as follows.

Information on the Internet is dynamic, so it is difficult to fully guarantee synchronization between the focused crawler and Internet content. To achieve the best search results, this thesis presents a more reasonable assessment model for web crawling priority. The model measures the importance of a page along three dimensions: topic relevance, link analysis, and update frequency. It directs the focused crawler to download and update the more important pages first, keeping the crawler synchronized with dynamic Internet content as far as possible.

To improve the focused crawler's search coverage and the accuracy of topic relevance judgment, this thesis designs a more specialized search strategy. The strategy adds web page structure classification and text extraction, which makes the topic relevance judgment more targeted. It also sets a search depth limit for topic-irrelevant pages, which extends the coverage of the topic crawler. Setting a fetch interval and a fetch priority for each web page makes the focused crawler work more efficiently. The strategy further makes the focused crawler more comprehensive by detecting near-duplicates and cheating.

To improve the search efficiency and scalability of the focused crawler, this thesis implements the search strategy on Hadoop and HBase.
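The abstract names the three dimensions of the priority assessment model but does not give its formula. As a rough illustration only, the sketch below assumes each dimension is already normalized to [0, 1] and combines them with a weighted sum. The linear form, the weights, and the names `PageSignals`, `priority`, and `Frontier` are all hypothetical and are not taken from the thesis.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSignals:
    """Per-page inputs to the priority score (all assumed normalized to [0, 1])."""
    url: str
    topic_relevance: float   # how well the page matches the crawl topic
    link_score: float        # result of some link analysis, e.g. a normalized rank
    update_frequency: float  # how often the page has been observed to change


# Hypothetical weights; the thesis does not state how the dimensions are combined.
WEIGHTS = (0.5, 0.3, 0.2)


def priority(page: PageSignals, weights=WEIGHTS) -> float:
    """Weighted sum of the three dimensions (an assumed combination rule)."""
    w_topic, w_link, w_update = weights
    return (w_topic * page.topic_relevance
            + w_link * page.link_score
            + w_update * page.update_frequency)


class Frontier:
    """Crawl frontier that always yields the highest-priority URL next."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, page: PageSignals) -> None:
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority(page), self._counter, page.url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With these assumed weights, a page that is highly relevant to the topic is fetched before one that merely changes often or is well linked. A real crawler would tune the weights and recompute scores as pages are revisited.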
Keywords/Search Tags: Vertical search engine, Focused crawler, Priority assessment, Search strategy