
Research and Implementation of a Focused-Crawler Relevance Algorithm in Search Engines

Posted on: 2017-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: F F Wang
GTID: 2348330536476759
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid spread of the Internet in recent years, a growing number of people rely on search engines to find information online. As the volume of web resources keeps growing, a professional search engine can retrieve web information about a specific area or topic accurately and efficiently. The focused crawler is the key component of a professional search engine: it discards web pages unrelated to the topic so that search results stay highly relevant, which largely meets users' demand for accurate results. Research into efficient topic-relevance algorithms for professional search engines is therefore significant. The work in this thesis is as follows:

1. After surveying the current state of focused crawlers and related technology in China and abroad, this thesis analyzes the system structure of a focused crawler, introduces the principles and construction methods of ontologies, and discusses the limitations of the traditional vector space model based on web keywords and the advantages of a vector space model based on ontology concepts.
2. Because the traditional vector space model treats web keywords as independent and neglects their semantic relations, this thesis replaces it with a vector space model based on ontology concepts and implements the topic-relevance algorithm on that model.
3. In building the ontology-based vector space model, an ontology-based weight calculation method that takes ontology concepts into account is used to obtain the final weights, and an SVM is used as the classifier to decide topic relevance.
4. The focused crawler system is implemented on the Nutch search engine platform, and experiments are run separately on the topic-relevance algorithm with the traditional vector space model and with the improved model. The results indicate that the improved method has certain advantages in both the accuracy of retrieving topic-relevant pages and the precision of the search results, and they verify its feasibility, effectiveness and application value. The method can be used with different ontologies and has broad application value.
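
The contrast between the keyword-based and the ontology-concept vector space models can be illustrated with a small sketch. It is not the thesis's implementation: the toy ontology, its concept weights, and the helper names (`ONTOLOGY`, `keyword_vector`, `concept_vector`, `cosine`) are all assumptions made for illustration, and plain cosine similarity stands in for the SVM classifier that the thesis uses for the final relevance decision.

```python
import math
from collections import Counter

# Hypothetical toy ontology: surface term -> (concept, concept weight).
# Both the mapping and the weights are illustrative assumptions.
ONTOLOGY = {
    "crawler": ("WebCrawler", 1.0),
    "spider": ("WebCrawler", 1.0),
    "ontology": ("Ontology", 0.8),
    "taxonomy": ("Ontology", 0.8),
    "svm": ("Classifier", 0.6),
}


def keyword_vector(text):
    """Traditional model: every distinct token is an independent dimension."""
    return dict(Counter(text.lower().split()))


def concept_vector(text):
    """Ontology model: tokens are mapped to concepts, and each occurrence
    contributes the (assumed) weight of its concept."""
    vector = Counter()
    for token in text.lower().split():
        if token in ONTOLOGY:
            concept, weight = ONTOLOGY[token]
            vector[concept] += weight
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


topic = "crawler ontology"
page = "spider taxonomy"

# The two texts share no keywords but map to the same concepts.
keyword_score = cosine(keyword_vector(topic), keyword_vector(page))
concept_score = cosine(concept_vector(topic), concept_vector(page))
```

In this toy case the keyword model sees no overlap between the topic and the page, while the concept model treats them as pointing in the same direction, which is the semantic relation the traditional model neglects. In a fuller pipeline the concept vectors would be the feature input to a trained classifier and would not be compared directly.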
Keywords/Search Tags: focused crawler, ontology model, weight calculation, support vector machine