Research On Semantic Crawler Algorithm And System Realization Based On Ontology

Posted on: 2011-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Dong
GTID: 2178360305954097
Subject: Circuits and Systems
Abstract/Summary:
The Semantic Web is an extension of the World Wide Web. Documents on the Semantic Web carry semantic information, which allows data to be analysed and processed from a different perspective. The Semantic Web has developed rapidly in recent years, and the number of semantic documents is growing with it. Processing and parsing this information well is one way to improve service quality. A semantic search engine is an application that uses semantic information to manage documents and answer user queries. A semantic vertical search engine separates theme-related documents from the rest and answers a user's search with accurate, processed and indexed semantic information about a particular theme.
A semantic focused crawler is an important part of the overall structure of a semantic search engine. It is responsible for retrieving semantic resources from the web; it also processes each resource, classifies it, extracts its metadata and stores it. A central problem for a semantic focused crawler is how to retrieve related documents efficiently from a very large number of web pages, much like finding a needle in a haystack. The crawler needs to judge document content, compute the similarity between the theme and each document, separate related results from unrelated ones and store them in a database. Beyond functionality, efficiency also matters, because the crawler is an online application.
This paper therefore proposes solutions that address both the accuracy and the efficiency of the semantic crawler. To solve the problem of judging document content, it proposes a method for measuring the similarity between a semantic document and the theme. The method represents both the domain ontology and the document as graphs and measures the similarity between them.
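The abstract does not state which graph similarity measure is used. As a minimal sketch of the general idea only, the example below assumes both the domain ontology and a document are undirected concept graphs and scores them by a weighted mix of Jaccard overlap on concepts (nodes) and on relations (edges). The names `graph_similarity` and `node_weight`, and the choice of Jaccard overlap, are illustrative assumptions and not the measure defined in the thesis.

```python
from typing import Dict, Set, Tuple

# Assumed representation: concept -> set of directly related concepts (undirected).
Graph = Dict[str, Set[str]]


def nodes(graph: Graph) -> Set[str]:
    """All concepts, whether they appear as a key or only as a neighbour."""
    found = set(graph)
    for neighbours in graph.values():
        found |= neighbours
    return found


def edges(graph: Graph) -> Set[Tuple[str, ...]]:
    """Undirected edge set, each edge stored as a sorted pair."""
    return {tuple(sorted((u, v))) for u, nbrs in graph.items() for v in nbrs}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap, taken to be 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def graph_similarity(ontology: Graph, document: Graph,
                     node_weight: float = 0.5) -> float:
    """Hypothetical score: weighted mix of concept and relation overlap."""
    node_sim = jaccard(nodes(ontology), nodes(document))
    edge_sim = jaccard(edges(ontology), edges(document))
    return node_weight * node_sim + (1.0 - node_weight) * edge_sim


# Toy graphs, invented for illustration.
ontology = {
    "crawler": {"search engine", "ontology"},
    "search engine": {"crawler"},
    "ontology": {"crawler"},
}
document = {
    "crawler": {"ontology"},
    "ontology": {"crawler"},
    "football": set(),
}
score = graph_similarity(ontology, document)
```

A crawler built along these lines would keep a document when its score exceeds some threshold; the threshold value would have to be chosen experimentally.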
To address the efficiency of the semantic crawler, this paper proposes an algorithm that combines Q-learning with a Bayes classifier (QBLP for short). QBLP takes document similarity as the input to the Q-learning reward function, and this function adjusts the prior and conditional probabilities of the Bayes classifier, which becomes more accurate as the crawl proceeds and so improves the efficiency of resource retrieval.
The main contributions and improvements are as follows.
(1) Clustering based on maximum probability density. We propose a clustering method based on maximum probability density to eliminate semantic variation in the representation of semantic documents. Maximum probability density is a similarity criterion: the probability of mapping a keyword to a particular concept. Once the clusters have been formed, the concepts within a cluster are connected to one another as a graph that represents the semantic document.
(2) Dynamic path adaptation and prediction. Semantic documents are scattered widely across the web, so content analysis and prediction of the crawling path are crucial for a focused crawler. As stated above, we represent each document as a graph. Furthermore, we use the resulting similarity as an input to the Q-learning module and propose QBLP as the crawl path prediction algorithm. Q-learning supplies the Bayes classifier with accurate knowledge for making predictions. By accumulating features of both documents and links, Q-learning adjusts the crawler's path to improve the crawler's efficiency. This function was tested in a series of experiments.
This work is set against the background of semantic search engines. Its main aim is to collect theme-related semantic resources and make them available for querying by a search engine. Searching and filtering semantic documents is of great significance for information integration and intelligent search.
Experiments were carried out to test the algorithm and the crawler system, and the results show that the crawler can efficiently collect useful resources, providing a solid data foundation for the search engine.
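The abstract gives no formulas for QBLP, so the following is only a hedged sketch of how a Q-learning reward could feed a Bayes link classifier. It assumes that links are described by tokens (for example anchor-text words), that the reward is the similarity of the fetched page to the theme, and that the updated Q-value, clipped to [0, 1], serves as a soft label that shifts the prior and conditional probability estimates of a naive Bayes classifier. The class `QBLPSketch`, its parameters and the update rule are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict
from typing import Iterable, List


class QBLPSketch:
    """Hypothetical coupling of tabular Q-learning and naive Bayes."""

    CLASSES = ("relevant", "irrelevant")

    def __init__(self, alpha: float = 0.5, gamma: float = 0.5,
                 smoothing: float = 1.0) -> None:
        self.alpha, self.gamma, self.smoothing = alpha, gamma, smoothing
        self.q = {}                                      # token -> Q-value
        self.class_mass = {c: 1.0 for c in self.CLASSES}  # soft class counts
        self.token_mass = {c: defaultdict(float) for c in self.CLASSES}
        self.vocab = set()

    def q_value(self, tokens: Iterable[str]) -> float:
        """Mean Q-value of a link's tokens (0.0 for unseen tokens)."""
        tokens = list(tokens)
        if not tokens:
            return 0.0
        return sum(self.q.get(t, 0.0) for t in tokens) / len(tokens)

    def update(self, link_tokens: Iterable[str], similarity: float,
               next_links: List[List[str]]) -> None:
        """Learn from one fetched page; reward = its similarity to the theme."""
        tokens = list(link_tokens)
        best_next = max((self.q_value(n) for n in next_links), default=0.0)
        target = similarity + self.gamma * best_next      # Q-learning target
        for t in tokens:
            old = self.q.get(t, 0.0)
            self.q[t] = old + self.alpha * (target - old)
        # The clipped Q-value acts as a soft "relevant" label for naive Bayes.
        p_rel = min(1.0, max(0.0, self.q_value(tokens)))
        for cls, weight in (("relevant", p_rel), ("irrelevant", 1.0 - p_rel)):
            self.class_mass[cls] += weight                # adjusts the prior
            for t in tokens:
                self.token_mass[cls][t] += weight         # adjusts P(token | class)
        self.vocab.update(tokens)

    def p_relevant(self, link_tokens: Iterable[str]) -> float:
        """Naive Bayes posterior that a link leads to a theme-related page."""
        tokens = list(link_tokens)
        total_class = sum(self.class_mass.values())
        vocab_size = max(len(self.vocab), 1)
        log_score = {}
        for cls in self.CLASSES:
            total = sum(self.token_mass[cls].values())
            score = math.log(self.class_mass[cls] / total_class)
            for t in tokens:
                count = self.token_mass[cls].get(t, 0.0)
                score += math.log((count + self.smoothing)
                                  / (total + self.smoothing * vocab_size))
            log_score[cls] = score
        top = max(log_score.values())
        exp = {c: math.exp(s - top) for c, s in log_score.items()}
        return exp["relevant"] / sum(exp.values())


# Toy training data, invented for illustration.
agent = QBLPSketch()
for _ in range(5):
    agent.update(["semantic", "ontology"], similarity=0.9, next_links=[])
    agent.update(["sports", "news"], similarity=0.1, next_links=[])

# Rank the crawl frontier so that the most promising links are fetched first.
frontier = [["sports"], ["ontology"]]
ranked = sorted(frontier, key=agent.p_relevant, reverse=True)
```

In this sketch the classifier, rather than the raw Q-table, orders the frontier, which mirrors the abstract's statement that Q-learning supplies knowledge to the Bayes classifier for prediction.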
Keywords: Semantic Web, focused crawler, ontology