
Design And Realization Of A System For Gathering Web Ontologies Based On Focused Crawler Technique

Posted on: 2013-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Fu
Full Text: PDF
GTID: 2248330371983232
Subject: Software engineering

Abstract/Summary:
With the rapid development of ontology research, efficient ontology reuse is commonly acknowledged to positively influence the large-scale dissemination of ontologies and ontology-driven technologies, and many researchers have contributed to this effort. However, the process of ontology reuse comes with additional effort that can easily outweigh the expected benefits. An ontology is a set of concepts together with their properties and relations; within a domain it can be used to reason over and reuse domain knowledge. Some concepts in a domain require a unified description, yet many researchers have defined their own ontologies, which leads to differences that make sharing and reusing knowledge difficult. Successfully reusing domain ontologies is therefore an important part of addressing this problem, and the number of ontology documents on the web and in online databases keeps growing.

The core of the information retrieval system is a crawler program and an ontology constructor. Besides analyzing the content of each page, the crawler must also analyze the link information on the page in order to obtain the most complete and relevant set of web documents for the target domain. Information retrieval is a sub-field of computer science whose goal is to find all documents relevant to a given user query. In the ontology constructor module we apply the ontology tool PROMPT Suite and the ontology evaluation method OntoClean. In this paper we not only introduce an ontology collection system that helps to reuse and apply existing ontologies, but also design a focused-crawler-based system aimed at collecting ontologies from the web. Our goal is to support the successful reuse of existing ontologies.

We chose a focused crawler as the search method and the VSM (vector space model) as the method for estimating the similarity between a source document and the topic. Our system contains an information collection module, a page analysis module, a similarity estimation module, a URL extraction module, a URL similarity estimation module, and an initial seed module. In this paper we design and implement a crawler that collects topic-related ontologies from the web. First, the crawler must assess the URLs to be visited in the future: the URL queue stores each URL together with its parent URL, and the crawler always downloads the URL with the highest priority first. The ontology documents themselves contain many URLs that also need to be handled; we treat this as ontology processing, download all the ontology documents, and collect them together. Second, we studied the VSM method, gained a deeper understanding of converting text to vectors, and chose it as the URL assessment method (sketches of this scoring step and of the priority-based URL queue are given after the abstract). In addition, we studied ontology construction and divided the whole process into five steps: ontology search, ontology ranking, ontology segmentation, ontology mapping and merging, and ontology assessment. Finally, we designed and implemented the ontologyCollection system, which starts from a set of seed URLs chosen by ourselves. ontologyCollection is part of ontology reuse and serves a specific topic.
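The abstract describes the VSM step only at a high level: text is converted to term vectors and the similarity between a page and the topic is used as a relevance score. The following is a minimal sketch of that idea, assuming plain term-frequency vectors and cosine similarity; the function names (to_vector, vsm_similarity) and the example texts are illustrative and not taken from the thesis system.

```python
# Sketch of VSM scoring: texts become bag-of-words term-frequency vectors,
# and cosine similarity between a page vector and the topic vector is used
# as the page's relevance score. Names here are illustrative assumptions.
import math
import re
from collections import Counter


def to_vector(text: str) -> Counter:
    """Convert text to a bag-of-words term-frequency vector."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def vsm_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    topic_text = "ontology owl rdf domain knowledge reuse"
    page_text = "This page lists OWL domain ontology documents for knowledge reuse."
    print(vsm_similarity(to_vector(topic_text), to_vector(page_text)))
```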
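The abstract also states that the URL queue stores each URL with its parent URL and that the crawler downloads the highest-priority URL first. The sketch below shows one way such a frontier could look, assuming the priority is a VSM-style relevance score; the class and method names (Frontier, push, pop) and the example URLs are assumptions for illustration, not identifiers from the thesis.

```python
# Sketch of a priority-based URL frontier for a focused crawler: each entry
# keeps the URL, its parent URL, and a relevance score, and pop() always
# returns the highest-scoring URL. Names are illustrative assumptions.
import heapq
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(order=True)
class _Entry:
    neg_priority: float  # negated so the min-heap pops the best score first
    url: str = field(compare=False)
    parent_url: Optional[str] = field(compare=False, default=None)


class Frontier:
    """Priority queue of URLs waiting to be crawled."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._seen: set[str] = set()

    def push(self, url: str, priority: float, parent_url: Optional[str] = None) -> None:
        if url not in self._seen:          # avoid re-queueing known URLs
            self._seen.add(url)
            heapq.heappush(self._heap, _Entry(-priority, url, parent_url))

    def pop(self) -> Tuple[str, Optional[str]]:
        entry = heapq.heappop(self._heap)  # raises IndexError when empty
        return entry.url, entry.parent_url


if __name__ == "__main__":
    frontier = Frontier()
    frontier.push("http://example.org/seed.owl", priority=1.0)  # seed URL
    frontier.push("http://example.org/unrelated.html", priority=0.1,
                  parent_url="http://example.org/seed.owl")
    frontier.push("http://example.org/domain-ontology.owl", priority=0.8,
                  parent_url="http://example.org/seed.owl")
    while True:
        try:
            url, parent = frontier.pop()
        except IndexError:
            break
        print(f"crawl {url} (from {parent})")
```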
Keywords/Search Tags: Ontology reuse, VSM, focused crawler, URL assessment