
Designing Focused Crawler Based On Improved Genetic Algorithm

Posted on: 2019-02-03; Degree: Master; Type: Thesis
Country: China; Candidate: W Yan
GTID: 2428330590992325; Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of the Internet, how to obtain required information quickly and accurately from massive network resources has become a key problem. General search engines provide retrieval services by collecting and indexing webpages, but retrieval based on keyword matching often fails to recognize and match the user's real query intention. Vertical search engines narrow the collection scope to provide specialized, customized retrieval services for users in specific fields and backgrounds, and they are a research hotspot in the search field. As the webpage collection module of a vertical search engine, a focused crawler keeps only topic-related webpages on its search path. This thesis focuses on the webpage analysis method and the search strategy, and discusses how to improve the performance of a focused crawler. The best-first search strategy is often applied but easily falls into local optima. To address this problem, this thesis proposes a focused crawler based on an improved genetic algorithm. In this method, the fitness function takes both the topic correlation and the importance of a webpage into account to measure its comprehensive value. Topic correlation is analyzed with the vector space model, and topic importance is calculated with an improved PageRank algorithm. The selection operation picks out webpages with high fitness values. The crossover operation proceeds according to the topic importance of links, and the mutation operation uses search engines to retrieve composite keywords. Lastly, we implement the focused crawler based on the improved genetic algorithm. Experimental results show that, compared with existing genetic algorithms, the search strategy based on the improved genetic algorithm raises the precision and recall of the focused crawler and enlarges its search scope. The resulting crawler is more consistent with users' topic retrieval requirements.
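The fitness and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the fitness formula or the PageRank improvement, so the code below assumes a linear blend with a hypothetical weight `alpha`, uses raw term-frequency vectors for the vector space model, and substitutes standard PageRank for the improved variant. All function names are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(text, topic):
    """Vector space model: cosine between raw term-frequency vectors."""
    a, b = Counter(text.lower().split()), Counter(topic.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (NOT the thesis's improved variant)."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def fitness(relevance, importance, alpha=0.7):
    """Assumed linear blend; alpha is a hypothetical weight, not from the thesis."""
    return alpha * relevance + (1.0 - alpha) * importance


def score_pages(texts, links, topic, alpha=0.7):
    """Fitness per page; PageRank is scaled by its maximum so both terms lie in [0, 1]."""
    rank = pagerank(links)
    top = max(rank.values(), default=1.0)
    return {
        p: fitness(cosine_similarity(texts[p], topic), rank.get(p, 0.0) / top, alpha)
        for p in texts
    }


def select(pages, scores, k):
    """Selection step: keep the k pages with the highest fitness."""
    return sorted(pages, key=lambda p: scores[p], reverse=True)[:k]
```

Given page texts and a link graph, `score_pages(texts, links, "focused crawler")` followed by `select(list(texts), scores, k)` would yield the pages carried into the next generation under these assumptions. Crossover and mutation are omitted because the abstract does not describe them in enough detail to sketch faithfully.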
Keywords/Search Tags: focused crawler, vector space model, improved PageRank algorithm, genetic algorithm