Research On UTO Ontology Building And Application

Posted on: 2009-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Jia
Full Text: PDF
GTID: 2178360242480627
Subject: Computer application technology
Abstract/Summary:
There are currently two main directions for the development of the Internet: the Semantic Web and Web 2.0. The goal of the Semantic Web is to attach semantic descriptions to Web content so that computers can understand the knowledge on the Web and process information automatically and intelligently. An ontology provides the uniform vocabulary and concepts that make it possible to share knowledge when it is exchanged. Web 2.0 emerged as social software came into wide use; it gathers the wisdom of the crowd and makes knowledge on the Web widely accessible. Tagging is the key application of Web 2.0 and a new approach to organizing information. Tags realize the idea of folksonomy, which has the merits of self-motivation, sharing, and dynamic self-adaptation. The Upper Tag Ontology (UTO), which is based on tags, was designed by Dr. Ying Ding and her research group at the Digital Enterprise Research Institute (DERI), Department of Computer Science, University of Innsbruck, Austria. Its aim is to improve the efficiency of retrieving Internet resources and to speed up the sharing of those resources.

After summarizing the problems and the state of the art in the development of the Internet, and building on research carried out during a visit to DERI, the author designs and implements an ontology-building system based on tags and Web 2.0. The system is divided by logical function into four modules: tag extraction, data transformation, ontology management, and result analysis.

(1) In the tag extraction module, the author analyzed and compared most of the mature open-source crawlers, chose "smart and simple web crawler" as the base, and adapted it to the specific way information is organized on Web 2.0 websites.
The merit of this adaptation is that ordinary errors such as timeouts, dead links, missing pages, and connection errors no longer interrupt the retrieval process.

(2) The UTO ontology contains 8 concepts (tagger, comment, vote, date, tag, relative tag, object and source). In the data transformation module, the system takes the valid instances of all 8 concepts from the tag extraction module and stores them in RDF documents.

(3) In the ontology management module, many standard document formats could be used to store the data; the system chooses RDF, which is recommended by the W3C, and follows the UTO ontology format. The RDF Schema for the UTO ontology was designed by Vanessa Siegel, a master's student in the Department of Computer Science, University of Innsbruck.

(4) In the result analysis module, the system provides a simple graphical interface and uses the Jena Semantic Web toolkit, developed by HP Labs, to process RDF documents and other ontology data. The system imports the UTO ontology stored in RDF documents into Jena and executes SPARQL queries built from the keywords the user enters. When the user enters a tag, the system returns object and vote items: the web pages related to the tag and the number of times all taggers have voted for them. From this result the user can reach articles, pictures, and videos related to the tag. The vote value indicates how much users like the resource and how strongly they agree with the tag. When the user enters an object, the system returns tagger and tag items: the users who tagged the resource and the tags they applied. Because tags summarize content well, users can find what they want accurately. The system can also uncover potential shared hobbies and interests, because an object is shared by more than one user. When the user enters a tagger, the system returns object and tag items: all the web pages that user has tagged and the tags used.
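The three lookups described above (by tag, by object, by tagger) can be illustrated with a minimal sketch. It is not the thesis implementation, which runs SPARQL queries through Jena; it uses plain Python over in-memory records, and the record layout, function names, and sample data are all hypothetical.

```python
from collections import namedtuple

# One tagging event: who tagged which object with which tag,
# plus the vote count of the object. The field names are assumptions.
Tagging = namedtuple("Tagging", ["tagger", "tag", "obj", "votes"])

# Hypothetical sample data, invented for illustration only.
TAGGINGS = [
    Tagging("alice", "ontology", "http://example.org/page1", 12),
    Tagging("bob", "ontology", "http://example.org/page2", 3),
    Tagging("alice", "semanticweb", "http://example.org/page1", 12),
    Tagging("bob", "web2.0", "http://example.org/page1", 12),
]


def by_tag(taggings, tag):
    """Input a tag: return (object, votes) pairs, most-voted first."""
    pairs = {(t.obj, t.votes) for t in taggings if t.tag == tag}
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


def by_object(taggings, obj):
    """Input an object: return the (tagger, tag) pairs attached to it."""
    return sorted({(t.tagger, t.tag) for t in taggings if t.obj == obj})


def by_tagger(taggings, tagger):
    """Input a tagger: return the (object, tag) pairs created by that user."""
    return sorted({(t.obj, t.tag) for t in taggings if t.tagger == tagger})
```

For instance, `by_object(TAGGINGS, "http://example.org/page1")` lists both alice and bob, which is the kind of overlap the abstract relies on when it speaks of mining shared interests.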
By reading the tags, which reflect the meaningful information about an object, the user can gain an overall understanding of it. By clustering the tags, users can find related content, which improves the accuracy and efficiency of knowledge search.

An analysis of how information is organized on social websites shows that http://del.icio.us, http://www.flickr.com, and http://www.youtube.com store their data in different formats. To achieve good extensibility, the system therefore keeps main functions such as the URL filter and the HTML parser separate. To collect tags from another website, the user only needs to add three classes: NewPageFilter, NewPageParser, and NewPageMain. Starting from a specific URL whose web page contains tags, the system crawls depth-first and can collect all the tag information; in theory, every web page that contains tags or videos will be crawled. The system uses NewPageParser to parse each web page and extract the tag information that appears on it. The package Crawler.Model holds two lists: one stores the URLs still to be visited, and the other stores the URLs already visited so that no page is visited twice. Before new URLs are added, the system removes any links that have already been visited. The class NewPageFilter, which contains a set of predefined link prefixes, decides which links will be followed, so that nonexistent pages and irrelevant websites are avoided. The class RDFStore is in charge of managing RDF documents.
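The crawling procedure described above (two URL lists, a prefix filter, depth-first traversal, and tolerance of fetch errors) can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the thesis code: the function names, the in-memory `SITE` link graph, and the use of `OSError` to model fetch failures are hypothetical, and `allowed` only stands in for the role the abstract gives to NewPageFilter.

```python
def allowed(url, prefixes):
    """Stand-in for the prefix filter: follow only links with a known prefix."""
    return any(url.startswith(p) for p in prefixes)


def crawl(start_url, fetch_links, prefixes):
    """Depth-first crawl; returns the URLs in the order they were attempted."""
    to_visit = [start_url]  # URLs waiting to be visited, used as a stack
    visited = []            # URLs already attempted (a list, as in the text)
    while to_visit:
        url = to_visit.pop()  # last in, first out gives depth-first order
        visited.append(url)
        try:
            links = fetch_links(url)
        except OSError:
            # Timeouts, dead links, and connection errors: skip and carry on.
            continue
        # Drop links that are filtered out, already visited, or already queued.
        new = [
            link for link in links
            if allowed(link, prefixes)
            and link not in visited
            and link not in to_visit
        ]
        # Reversed so that the first link on the page is explored next.
        to_visit.extend(reversed(new))
    return visited


# Hypothetical in-memory "website" used in place of real HTTP requests.
SITE = {
    "http://example.org/a": [
        "http://example.org/b",
        "http://example.org/c",
        "http://other.example/x",
    ],
    "http://example.org/b": ["http://example.org/d", "http://example.org/a"],
    "http://example.org/c": [],
    "http://example.org/d": ["http://example.org/missing"],
}


def fetch_links(url):
    """Return the outgoing links of a page, or fail like a dead link would."""
    if url not in SITE:
        raise OSError("page not found: " + url)
    return SITE[url]


ORDER = crawl("http://example.org/a", fetch_links, ["http://example.org/"])
```

In this sketch the link to `http://other.example/x` is never followed because it fails the prefix check, and the missing page is attempted once and then skipped without stopping the crawl.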
Since Jena cannot read a document larger than 100 MB, a function validates every RDF document when it is produced; if the document exceeds 100 MB, it is split into segments.

This thesis creatively combines tagging technology with ontologies, applying the ontology ideas of the Semantic Web to Web 2.0 websites so that information resources can be obtained quickly and accurately and the search range is enlarged. The author selects some typical websites for the practical work, accumulates a large amount of tag data, and on that basis investigates how to build the ontology and begins to build domain ontologies, supporting semantic research through deeper mining of user interests and related resources. All in all, combining these two fields will benefit the development of the Internet.
Keywords/Search Tags:Application