
Research And Implementation Of Topic Crawler Based On Domain Ontology

Posted on: 2011-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: B X Lin
Full Text: PDF
GTID: 2178360305960950
Subject: Computer application technology
Abstract/Summary:
With the explosion of information on the Web, traditional keyword-based search engines can no longer meet users' demands for recall and precision. At the same time, as technology develops, users increasingly expect search to be both intelligent and specialized. How to build a more intelligent and specialized search engine has therefore become a real challenge.

Vertical search engines were created to meet users' specialized needs. A vertical search engine uses a topic crawler to crawl web pages in a specific domain and stores those pages in a web database, which the vertical search then queries. For intelligence, researchers have found that applying ontologies to information retrieval is a promising approach. Ontology is an advanced knowledge representation technique. It provides a clear concept structure, supports logical reasoning, generalizes semantic relationships among concepts, and can serve as the knowledge base for semantic search. The application of ontology technology at the core of vertical search has therefore become a hot spot in the field of information retrieval.

This thesis first presents a thorough review of research on topic crawlers both at home and abroad, and then describes the study and implementation of a topic crawler based on domain ontology.
The main work presented in this thesis is as follows:
(1) Introduced the framework of a topic crawler based on domain ontology, together with its related modules.
(2) Proposed a contextual topic description method, based on domain ontology, that is used to guide the topic crawler.
(3) Improved the algorithms for concept-semantic similarity and concept-semantic relevance, and gave a method that combines them to compute the concept vector in the concept hierarchy tree.
(4) Proposed algorithms for page content relevance and link relevance based on this contextual topic description.
(5) Created an ontology for the educational technology domain using an ontology construction method and the Protege tool.
(6) Implemented a topic crawler for the educational technology domain using bot.jar.

With the improved concept-semantic similarity and concept-semantic relevance in the concept hierarchy tree, our results show that the combined method better distinguishes the relationships between concepts. The proposed contextual topic description method based on domain ontology effectively guides the topic crawler and reflects the semantic nature of the crawler. The proposed page content and link relevance algorithms based on domain ontology significantly improve crawling accuracy.
Keywords/Search Tags:Topic Crawler, Domain Ontology, Topic Description, Semantic Similar
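
The abstract does not give the thesis's formulas, so the following is only an illustrative sketch of the general idea behind items (3) and (4): scoring concepts by their position in a concept hierarchy tree and using those scores to rate a page. It substitutes the well-known Wu-Palmer similarity for the thesis's improved measure, and the toy hierarchy, the function names (`wu_palmer`, `topic_weights`, `page_relevance`), and the substring-matching scheme are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy concept hierarchy (child -> parent).
# It is NOT the ontology built in the thesis.
PARENT: Dict[str, Optional[str]] = {
    "educational technology": None,
    "instructional design": "educational technology",
    "learning media": "educational technology",
    "courseware": "learning media",
    "educational video": "learning media",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth in the tree, counting the root as depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)).

    Stand-in for the thesis's improved similarity, which the abstract
    does not specify.
    """
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the
    # lowest common subsumer (the tree has a single root, so one exists).
    lcs = next(node for node in path_to_root(b) if node in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def topic_weights(core: str) -> Dict[str, float]:
    """Weight every concept by its similarity to the core topic concept."""
    return {concept: wu_palmer(core, concept) for concept in PARENT}


def page_relevance(text: str, weights: Dict[str, float]) -> float:
    """Share of the total concept weight whose concept names occur in the text."""
    lowered = text.lower()
    total = sum(weights.values())
    matched = sum(w for concept, w in weights.items() if concept in lowered)
    return matched / total if total else 0.0


weights = topic_weights("learning media")
on_topic = page_relevance("Courseware and educational video for class", weights)
off_topic = page_relevance("Football scores", weights)
```

In this sketch a crawler could follow links from pages whose score exceeds a chosen threshold; the same scoring could be applied to anchor text to approximate link relevance, though the thesis's actual link relevance algorithm may differ.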