Research On Focused Crawler Technology Based On Domain Ontology And Multi-objective Ant Colony Optimization Algorithm

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Dong
Full Text: PDF
GTID: 2428330647452829
Subject: Software engineering
Abstract/Summary:
With the growing volume of network resources and the accelerating pace of information updates, personalized search adapted to specific fields and specific needs urgently requires the support of focused crawler technology. To improve the search quality of focused crawlers, this thesis studies the construction of topic models, the calculation of topic relevance, and the search strategy of focused crawlers. We construct domain ontologies semi-automatically through ontology learning technology, and introduce a multi-objective ant colony optimization algorithm (MOACO) to improve the search performance of focused crawlers. The detailed research contents and methods are as follows:

(1) For the construction of the topic model, a domain ontology construction method based on ontology learning technology is proposed. First, formal concept analysis (FCA) is used to obtain classes and their superordinate and subordinate relationships from literature resources, which forms the skeleton of the ontology. Then, Latent Dirichlet Allocation (LDA) is used to mine topic-related concepts from network resources, and the Apriori algorithm is applied to the generated topic set to mine the relationships between concepts and enrich the hierarchy of the ontology skeleton. Finally, the ontology is adjusted manually to produce the domain ontology. Following this method, the thesis constructs a typhoon domain ontology, a rainstorm domain ontology and a cold wave domain ontology, and visualizes them with the Protégé software.

(2) For the topic relevance calculation, methods for computing concept semantic similarity, web text topic relevance and hyperlink topic relevance are given, based on the domain ontology topic model. With "typhoon", "rainstorm" and "cold wave" as the respective topics, domain ontologies built with ontology learning technology were compared experimentally with domain ontologies built with the FCA method. The experimental results verify the feasibility and effectiveness of the domain ontology construction method based on ontology learning technology proposed in this thesis.

(3) For the focused crawler search strategy, a focused crawler technology (FC_OMOACO) based on domain ontology and a multi-objective ant colony optimization algorithm is proposed. The thesis builds a multi-objective optimization model that considers both link structure and web page text content, selects a group of Pareto-optimal links using the fast non-dominated sorting method and the nearest farthest candidate solution (NFCS) method, optimizes the diversity of hyperlink selection, and guides the crawler's search direction. The ant colony algorithm is introduced into the focused crawler, and its heuristic search and positive feedback mechanisms are used to improve the crawler's global search ability and to help keep the search from becoming trapped in local optima. Finally, with "typhoon disaster", "rainstorm disaster" and "cold wave disaster" as the respective topics, FC_OMOACO is compared experimentally with four other focused crawler methods from the literature. The results show that the focused crawler technology proposed in this thesis is a more effective crawler method.
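
The Pareto-based link selection described in (3) can be illustrated with a minimal sketch. The abstract does not give the objective functions or the details of NFCS, so everything specific here is an assumption: each candidate link carries two hypothetical scores (text relevance and link-structure relevance, higher is better), the link names are invented, and the function uses a plain front-peeling loop rather than the bookkeeping of the fast non-dominated sorting method, although it yields the same fronts. NFCS and the ant colony step are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical per-link scores: (text relevance, link-structure relevance).
# Higher is better on both objectives; the real objectives are not given in the abstract.
Scores = Dict[str, Tuple[float, float]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(scores: Scores) -> List[List[str]]:
    """Split links into successive Pareto fronts; front 0 is the non-dominated set."""
    remaining = dict(scores)
    fronts: List[List[str]] = []
    while remaining:
        # A link belongs to the current front if no other remaining link dominates it.
        front = sorted(
            link
            for link, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != link)
        )
        fronts.append(front)
        for link in front:
            del remaining[link]
    return fronts


# Illustrative data only; not taken from the thesis.
candidates: Scores = {
    "link_a": (0.9, 0.2),
    "link_b": (0.5, 0.5),
    "link_c": (0.2, 0.9),
    "link_d": (0.4, 0.4),
    "link_e": (0.1, 0.1),
}

fronts = non_dominated_fronts(candidates)
print(fronts[0])  # ['link_a', 'link_b', 'link_c']
```

In this toy data, link_d is dominated by link_b and link_e is dominated by every other link, so only the first three links form the Pareto-optimal group from which a crawler could choose its next hyperlinks.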
Keywords/Search Tags:Focused crawler, Ontology learning, Multi-objective optimization, Multi-objective ant colony algorithm, Domain ontology