Research And Design On The Search Strategy Of Focused Crawler Based On Genetic Algorithm

Posted on: 2014-05-27  Degree: Master  Type: Thesis
Country: China  Candidate: R R Zhang  Full Text: PDF
GTID: 2428330488993183  Subject: Communication and Information System
Abstract/Summary:
As the amount of information on the Internet grows rapidly, vast information resources have become available, yet it is becoming more and more difficult to locate the information one needs. The traditional search service provided by general-purpose search engines is increasingly unable to satisfy the demands of Internet users. Under these circumstances, vertical search engines for specific fields have become a research hotspot. This thesis studies the focused crawler, which serves as the data collection module of a vertical search engine and is one of its most vital components: the quality of the focused crawler directly affects the performance of the vertical search engine. Both text-based topic relevance and the importance reflected by link relationships are used to predict whether an unvisited link is relevant to the specific topic. A genetic algorithm is also introduced to guide the focused crawler in the right direction. Genetic operators, including a selection operator, a crossover operator and a mutation operator, are designed to improve the performance of the focused crawler. In addition, the structural features of HTML documents are taken into account when calculating the relevance of a web page. Finally, in the Security Data Collector project, an extensible focused crawler oriented to the information security field has been designed and implemented. Using this crawler, the Best-First search algorithm, the Shark-Search algorithm and the search strategy proposed in this thesis are compared and analyzed. The experimental results suggest that the time efficiency of the proposed search strategy declines slightly, but its precision increases markedly. Compared with the Best-First and Shark-Search algorithms, the proposed search strategy can avoid the topic drift problem and premature convergence to a local optimum, and the global search performance of the focused crawler improves considerably.
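
The abstract does not give the scoring formula or the operator definitions, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the thesis's method. It assumes the priority of an unvisited link is a weighted sum of a text relevance score and a link importance score (both in [0, 1]), that selection is roulette-wheel sampling over the crawl frontier, and that mutation swaps a selected link for a random one from a separate pool. The names `link_score`, `select`, `mutate`, the weight `alpha` and the example URLs are all hypothetical, and the crossover operator is omitted because the abstract does not describe it.

```python
import random


def link_score(text_relevance, link_importance, alpha=0.7):
    """Blend text-based relevance with link-based importance.

    Both inputs are assumed to lie in [0, 1]; alpha is an assumed weight.
    """
    return alpha * text_relevance + (1.0 - alpha) * link_importance


def select(frontier, k, rng):
    """Roulette-wheel selection: draw k links (with replacement),
    each with probability proportional to its score."""
    urls = list(frontier)
    weights = [frontier[u] for u in urls]
    if sum(weights) == 0:
        # All scores are zero: fall back to a uniform sample without replacement.
        return rng.sample(urls, min(k, len(urls)))
    return rng.choices(urls, weights=weights, k=k)


def mutate(selected, random_pool, rate, rng):
    """With probability `rate`, replace a selected link by a random one
    from `random_pool`, so the crawl can leave a local optimum."""
    return [rng.choice(random_pool) if rng.random() < rate else u
            for u in selected]


rng = random.Random(0)  # fixed seed so repeated runs behave the same
frontier = {
    "http://example.com/a": link_score(0.9, 0.4),
    "http://example.com/b": link_score(0.2, 0.8),
    "http://example.com/c": link_score(0.0, 0.0),
}
pool = ["http://example.org/seed"]
batch = mutate(select(frontier, 2, rng), pool, 0.1, rng)
```

Here `batch` holds the links to fetch next. Sampling in proportion to score, rather than always taking the top-scored link as Best-First search does, keeps some lower-scored links in play, and the mutation step occasionally injects links from outside the current neighbourhood. This is one plausible reading of how a genetic approach could reduce premature convergence to a local optimum.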
Keywords/Search Tags: Topic Crawler, Genetic Algorithm, Search Strategy, Vertical Search Engine