Research on a Topic Crawler Based on an Educational Information Resource Ontology

Posted on: 2015-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
GTID: 2208330452452286
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of web resources, traditional keyword-matching search engines cannot satisfy users' professional, personalized query requests. At the same time, because of the architecture of the World Wide Web, traditional web crawlers based on keyword matching achieve much lower recall and precision when fetching web pages. How to acquire professional, personalized data from web resources has become a main research direction for research institutions and scholars. The topic crawler emerged in response to this dilemma. A topic crawler is a web crawler that acquires web resources according to a predetermined set of topics. Building on a study of the relevant theory of topic crawlers, the paper makes full use of the advantages of ontology in semantic expression and proposes an ontology-based topic crawler model.

First, the paper constructs an educational information resource ontology and extends its particular attributes; this ontology describes the specific topic of the topic crawler. Second, by analyzing the structure of a large number of web pages, we apply relevant algorithms to the title text, URL text, anchor text and other information to extract the link concept sets, and to the title text, page description and keywords to extract the page content concept sets. Third, the paper proposes a link correlation algorithm and a content correlation algorithm, both based on the domain ontology. While crawling web pages, we analyze the similarity between the link concepts and the domain ontology concepts to filter out irrelevant URLs, and we analyze the similarity between the page content concepts and the domain ontology concepts to filter out irrelevant web pages. Finally, the paper implements a prototype topic crawler system based on the educational information resource ontology, and the results verify that the crawler system improves the recall and precision of web page collection.
Keywords/Search Tags: domain ontology, concept, topic crawler, semantic similarity, theme correlation
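The link-filtering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual correlation algorithms, so everything here is an assumption made for illustration: the toy `ONTOLOGY` with its concept weights, the token-matching `extract_concepts` helper, the weighted-coverage `relevance` score, and the `threshold` value are all hypothetical stand-ins for the ontology-based similarity measure.

```python
import re

# Hypothetical toy ontology: concept -> weight. These concepts and
# weights are illustrative only and do not come from the thesis.
ONTOLOGY = {
    "education": 1.0,
    "course": 0.8,
    "teaching": 0.8,
    "courseware": 0.6,
    "exam": 0.5,
}


def extract_concepts(*texts):
    """Return the ontology concepts that occur as word tokens in the texts."""
    tokens = set()
    for text in texts:
        tokens.update(re.findall(r"[a-z]+", text.lower()))
    return {token for token in tokens if token in ONTOLOGY}


def relevance(concepts):
    """Weighted share of the ontology covered by the concepts, in [0, 1].

    This simple coverage score is an assumed stand-in for the
    semantic similarity measure used in the thesis.
    """
    total = sum(ONTOLOGY.values())
    return sum(ONTOLOGY[c] for c in concepts) / total


def is_relevant(url, anchor_text, threshold=0.2):
    """Keep a link only if its URL and anchor text score above the threshold."""
    return relevance(extract_concepts(url, anchor_text)) >= threshold


if __name__ == "__main__":
    links = [
        ("http://example.edu/course/list", "Teaching courseware"),
        ("http://example.com/sports/news", "Football scores"),
    ]
    for url, anchor in links:
        print(url, is_relevant(url, anchor))
```

The same scoring function could be applied to concepts extracted from a page's title, description and keywords to filter page content, which mirrors the two-stage link and content filtering that the abstract outlines.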