Research On Semantic Crawler Algorithm And System Realization Based On Ontology

Posted on:2011-10-28

Degree:Master

Type:Thesis

Country:China

Candidate:C Dong

Full Text:PDF

GTID:2178360305954097

Subject:Circuits and Systems

Abstract/Summary:

The Semantic Web is an extension of the World Wide Web. Documents in the Semantic Web carry semantic information, which makes it possible to analyse and process data from a different perspective. The Semantic Web has developed rapidly in recent years, and the number of semantic documents has grown with it. Processing and parsing this information well is one way to improve service quality. A semantic search engine is an application that uses semantic information to manage documents and answer user queries. A semantic vertical search engine separates theme-related documents from the rest and answers a user's search with accurate semantic information on a given theme, already processed and indexed.

The semantic focused crawler is an important part of the overall structure of a semantic search engine. It is responsible for retrieving semantic resources from the web; it also processes each resource by classifying it, extracting its metadata, and storing it. A central problem for a semantic focused crawler is how to retrieve related documents efficiently from a very large number of web pages, much like finding a needle in a haystack. The crawler must judge document content, compute the similarity between the theme and each document, separate related results from unrelated ones, and store them in a database. Because the crawler is an online application, its efficiency matters as much as its functionality.

This thesis therefore proposes solutions that address both the accuracy and the efficiency of the semantic crawler. To solve the problem of judging document content, it introduces a method for measuring the similarity between a semantic document and the theme. The method represents both the domain ontology and the document as graphs and measures the similarity between them.
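The abstract does not state which graph similarity measure the thesis uses. As a rough illustration only, the sketch below compares an ontology graph and a document graph by a weighted Jaccard overlap of their concepts and relations. The measure, the `Graph` representation, and the names `graph_similarity` and `node_weight` are all assumptions introduced here, not the author's method.

```python
from typing import Dict, Set, Tuple

# Assumed representation: concept -> set of directly related concepts.
Graph = Dict[str, Set[str]]


def nodes(graph: Graph) -> Set[str]:
    """All concepts that appear as a key or as a neighbour."""
    found = set(graph)
    for neighbours in graph.values():
        found |= neighbours
    return found


def edges(graph: Graph) -> Set[Tuple[str, ...]]:
    """Undirected edges, stored as sorted concept pairs."""
    return {tuple(sorted((a, b))) for a, nbrs in graph.items() for b in nbrs}


def jaccard(x: set, y: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def graph_similarity(ontology: Graph, document: Graph,
                     node_weight: float = 0.5) -> float:
    """Stand-in measure: weighted mix of concept overlap and relation overlap."""
    node_sim = jaccard(nodes(ontology), nodes(document))
    edge_sim = jaccard(edges(ontology), edges(document))
    return node_weight * node_sim + (1.0 - node_weight) * edge_sim


if __name__ == "__main__":
    ontology = {"crawler": {"search engine", "ontology"}, "ontology": {"concept"}}
    document = {"crawler": {"ontology"}, "ontology": {"concept"}, "recipe": set()}
    print(round(graph_similarity(ontology, document), 3))
```

A score near 1 means the document graph covers most of the theme's concepts and relations; a crawler could compare the score against a threshold to decide whether to store the document.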
To improve the efficiency of the semantic crawler, this thesis proposes an algorithm that combines Q-learning with a Bayes classifier, abbreviated QBLP. QBLP uses document similarity as the input to the Q-learning reward function; the reward in turn adjusts the prior and conditional probabilities of the Bayes classifier, which becomes more accurate as the crawl proceeds and so improves the efficiency of resource retrieval.

The main contributions are as follows.

(1) Clustering based on maximum probability density. A clustering method based on maximum probability is proposed to eliminate semantic variation in the representation of semantic documents. Maximum probability density serves as the similarity criterion: it is the probability of mapping a keyword to a particular concept. After clustering, the concepts in each cluster are connected to one another to form a graph that represents the semantic document.

(2) Dynamic path adaptation and prediction. Semantic documents are scattered across the web, so analysing content and predicting the crawling path are crucial for a focused crawler. As stated above, documents are represented as graphs. The result is then used as the input to the Q-learning module, and QBLP is proposed as the crawler's path prediction algorithm. Q-learning supplies the Bayes classifier with accurate knowledge on which to base its predictions. By accumulating features of both documents and links, Q-learning adjusts the crawler's path to improve crawling efficiency. This function is tested in a series of experiments.

This work is set in the context of semantic search engines. Its main aim is to collect theme-related semantic resources and make them available for the search engine to query. Searching for and filtering semantic documents is of great significance for information integration and intelligent search.
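The abstract does not give QBLP's update rules, so the following is only a loose sketch of the feedback loop it describes: the similarity of a fetched page acts as a reward in [0, 1] and softly updates the prior and conditional counts of a naive Bayes link classifier. The Q-learning value update itself is not modelled here, and the class name `RewardWeightedBayes`, the token features, and the Laplace smoothing are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable


class RewardWeightedBayes:
    """Naive Bayes over link tokens, updated with fractional (reward) counts."""

    CLASSES = ("relevant", "irrelevant")

    def __init__(self, smoothing: float = 1.0) -> None:
        self.smoothing = smoothing
        self.class_mass: Dict[str, float] = {c: 0.0 for c in self.CLASSES}
        self.token_mass = {c: defaultdict(float) for c in self.CLASSES}
        self.vocabulary = set()

    def update(self, tokens: Iterable[str], reward: float) -> None:
        """Feed back the similarity of the page reached through a link.

        A reward of 0.9 adds 0.9 of an observation to "relevant" and 0.1 to
        "irrelevant", shifting both the prior and the conditional counts.
        """
        tokens = list(tokens)
        reward = min(max(reward, 0.0), 1.0)
        for cls, weight in (("relevant", reward), ("irrelevant", 1.0 - reward)):
            self.class_mass[cls] += weight
            for token in tokens:
                self.token_mass[cls][token] += weight
        self.vocabulary.update(tokens)

    def p_relevant(self, tokens: Iterable[str]) -> float:
        """Posterior probability that a link with these tokens is relevant."""
        tokens = list(tokens)
        total = sum(self.class_mass.values())
        vocab_size = max(len(self.vocabulary), 1)
        log_scores = {}
        for cls in self.CLASSES:
            # Smoothed prior over the two classes.
            prior = (self.class_mass[cls] + self.smoothing) / (
                total + len(self.CLASSES) * self.smoothing)
            token_total = sum(self.token_mass[cls].values())
            score = math.log(prior)
            for token in tokens:
                # Smoothed conditional probability of the token given the class.
                count = self.token_mass[cls].get(token, 0.0)
                score += math.log((count + self.smoothing) / (
                    token_total + self.smoothing * vocab_size))
            log_scores[cls] = score
        # Normalise in log space for numerical stability.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        return exp_scores["relevant"] / sum(exp_scores.values())


if __name__ == "__main__":
    clf = RewardWeightedBayes()
    clf.update(["ontology", "rdf"], reward=0.9)   # page was close to the theme
    clf.update(["sports", "news"], reward=0.1)    # page was off-theme
    print(clf.p_relevant(["ontology"]), clf.p_relevant(["sports"]))
```

In a crawler, `p_relevant` could be used to order the links waiting to be fetched, so that links resembling those that previously led to high-similarity pages are followed first.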
Experiments were carried out to test the algorithm and the crawler system, and the results show that the crawler can efficiently collect useful resources, providing a solid data foundation for the search engine.
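The clustering step in contribution (1) can be pictured with a small sketch. It assumes a hypothetical table of probabilities P(concept | keyword), maps each keyword to its most probable concept, groups keywords by that concept, and then links the concepts found in a document into a graph. The table, the function names, and the choice to connect every pair of concepts are assumptions; the abstract does not specify these details.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mapping table: keyword -> {concept: P(concept | keyword)}.
MappingTable = Dict[str, Dict[str, float]]


def cluster_keywords(keywords: Iterable[str], table: MappingTable,
                     threshold: float = 0.0) -> Dict[str, List[str]]:
    """Assign each keyword to its maximum-probability concept."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for keyword in keywords:
        candidates = table.get(keyword)
        if not candidates:
            continue  # keyword is unknown to the ontology mapping
        concept, probability = max(candidates.items(), key=lambda item: item[1])
        if probability >= threshold:
            clusters[concept].append(keyword)
    return dict(clusters)


def concept_graph(clusters: Dict[str, List[str]]) -> Set[Tuple[str, ...]]:
    """Connect every pair of concepts found in the document (an assumption)."""
    return {tuple(pair) for pair in combinations(sorted(clusters), 2)}


if __name__ == "__main__":
    table = {
        "spider": {"Crawler": 0.8, "Animal": 0.2},
        "bot": {"Crawler": 0.9},
        "owl": {"OntologyLanguage": 0.6, "Animal": 0.4},
    }
    clusters = cluster_keywords(["spider", "bot", "owl", "unknown"], table)
    print(clusters)
    print(concept_graph(clusters))
```

Mapping different surface keywords such as "spider" and "bot" to one concept is what removes the variation in how documents express the same meaning.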

Keywords/Search Tags:

Semantic Web, focused crawler, Ontology

Related items

1	Research On Semantic Crawler Algorithm And System Realization Based On Ontology
2	Design And Implementation Of Focused Crawler Based On Ontology
3	Design And Realization Of A System For Gathering Web Ontologies Based On Focused Crawler Technique
4	Research And Implement Of Focused-crawler Relevance Algorithm In Search Engine
5	Research On Topic Focused Web Crawler And Related Technologies
6	Research On Search Strategy And Key Techniques Of Focused Crawler
7	Focused Crawler Based On Domain Ontology And Similarity Concept Context Graph
8	The Research On Key Technology Of Semantic Search Engine In Semantic Web
9	Distributed Focused Crawler Based On Improved Tabu Search Strategy
10	Research Of Focused Crawling Strategy