
Research On The Topical Crawler For The Cultural Fields

Posted on: 2017-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Qiu
Full Text: PDF
GTID: 2348330566456654
Subject: Control engineering
Abstract/Summary:
In the Internet era, web pages are being generated rapidly and in large numbers. The results returned by traditional search engines often contain noise pages, so they cannot return precisely what users want. Vertical search engines, which aim at precise search on a given topic, have therefore developed rapidly, and the topical crawler, the core of a vertical search engine, has become a research hotspot. A topical crawler not only performs the functions of a traditional crawler (fetching, analyzing, and storing web pages) but can also determine whether page content is related to the given topics, and can even predict which hyperlinks lead to pages related to those topics.

This thesis studies a topical crawler for the cultural field. The basic idea is to combine the topical crawler with Ontology from the semantic web: the Ontology of the cultural field is used to update the original feature vector of a web page, and the similarity and relationships between keywords are identified, in order to improve the precision, intelligence, and semantic awareness of the topical crawler.

The research work is organized as follows. First, a survey of the state of the art in topical crawler research shows that existing topical crawler algorithms have known open issues. Algorithms based on keyword vectors and classifiers are simple, but they cannot identify synonyms and they ignore the semantic associations between keywords, so their accuracy is poor. Semantics-based topical crawler algorithms can effectively improve intelligence and accuracy, but they are very complex and difficult to implement. Second, in order to build a suitable cultural Ontology, the theory and applications of crawlers and Ontology are studied and the existing Ontology building methods are analyzed. Based on the concepts of the cultural field and on expert advice, a building method suited to the cultural domain is derived, and a cultural domain Ontology is built with this method using the tool Protege. Third, after studying the existing algorithms, the cultural domain Ontology is introduced for topic description, so that page feature words can be updated; the basic principle is “outline comes first and details after”. An authority degree is also introduced to distinguish links by the degree to which relevance is transmitted through them. On this basis, a page correlation algorithm and a link correlation algorithm based on the cultural domain Ontology are built. Finally, a topical crawler system based on the cultural domain Ontology, together with its function modules, is designed, and experiments are carried out to validate the proposed algorithms.

Experimental results show that the proposed Ontology-based topical crawler describes cultural themes with richer semantics and comparatively higher accuracy, and that it achieves better precision than the traditional topical crawler. The crawling rate is acceptable. In conclusion, the topical crawler proposed in this thesis has value both in theory and in practical applications.
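The idea of updating a page's feature vector with a domain Ontology before scoring its relevance can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the toy `ONTOLOGY` table, its terms and weights, and the function names `update_features`, `cosine`, and `page_relevance` are all assumptions made for illustration, and cosine similarity is used here only as a common stand-in for a page correlation measure.

```python
import math
from collections import Counter

# Hypothetical toy ontology: surface term -> (canonical concept, association strength).
# A real cultural domain Ontology (e.g. one built in Protege) would be far richer.
ONTOLOGY = {
    "jingju": ("opera", 1.0),        # treated as a synonym of the concept "opera"
    "peking opera": ("opera", 0.9),  # treated as a closely related term
}

def update_features(vector, ontology):
    """Fold terms known to the ontology into their concept; keep other terms as-is."""
    updated = {}
    for term, weight in vector.items():
        concept, strength = ontology.get(term, (term, 1.0))
        updated[concept] = updated.get(concept, 0.0) + weight * strength
    return updated

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def page_relevance(page_terms, topic, ontology=None):
    """Score a page against a topic vector, optionally updating features first."""
    page_vec = {t: float(c) for t, c in Counter(page_terms).items()}
    if ontology is not None:
        page_vec = update_features(page_vec, ontology)
    return cosine(page_vec, topic)

topic = {"opera": 1.0}
page = ["jingju", "ticket"]
plain_score = page_relevance(page, topic)               # keyword matching only
ontology_score = page_relevance(page, topic, ONTOLOGY)  # with ontology update
```

In this toy case the plain keyword vector shares no term with the topic, while the updated vector matches through the concept "opera", which is the kind of synonym handling that keyword-vector crawlers are described as lacking.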
Keywords/Search Tags:topical crawler, culture, ontology, relevance, precision ratio