
A New Annotate Ontology Method Based On Bootstrapping

Posted on: 2011-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Gao
Full Text: PDF
GTID: 2178360308459057
Subject: Computer system architecture
Abstract/Summary:
With the development of the Internet, Web resources have grown enormously. However, information processing on the Internet remains poorly automated and the links between pieces of information are weak, so even with a powerful search engine, useful information cannot be obtained from the Web accurately and quickly. To address these problems, Tim Berners-Lee, the founder of the Web, proposed the concept of the Semantic Web, which adds an extension layer on top of the existing Web and describes Web information formally at that layer. Labeling Web resources with an ontology raises them from machine-readable to machine-understandable, which makes it a foundation for developing the Semantic Web and an efficient way to obtain Web information.
Most existing labeling methods offer little automation and lack adaptability and efficiency. This thesis systematically studies ontology labeling methods and proposes a new ontology labeling method based on Bootstrapping. First, the given ontology is analyzed and rule files are created; then the domain documents are selected by text categorization. Next, a Bootstrapping procedure is introduced that performs information annotation and extraction together with ontology reasoning. After several iterations, good labeling results are achieved with only a few training texts. The experimental results show that the method recognizes entities with high accuracy and labels them well. The main achievements are as follows:
① A new automatic text categorization learning algorithm based on Bootstrapping and the Bayes algorithm. Because the texts to be labeled are diverse and complex, labeling and extracting information from them directly involves a huge workload and a high labeling error rate. Text categorization is therefore carried out before labeling, in order to select the documents related to the domain ontology. So that the classifier can categorize texts accurately from a small number of samples, the algorithm needs only a few training texts as a seed set to train the classifier. In each round it selects, from the categorization results, the texts with the highest confidence and adds them to the seed set as new training samples for the next round, and it repeats the training until the process ends. In this way, a small number of training texts yields results comparable to those obtained by training on a large training set.
② The text set is labeled by means of Bootstrapping and rules. The text set is first labeled according to the rule files. Then, using the context relations between individuals and taking the WHISK algorithm as a reference for inducing extraction rules, new rule files are generated and new vocabulary is labeled. The extracted and labeled information is then filled into the ontology files. An ontology inference engine reasons over the ontology files to eliminate erroneous data and trim wrong rules, so that over multiple iterations the model extracts new individuals automatically, enriching and refining the ontology. The text set is fully labeled once the iterations finish.
③ The ontology labeling method presented in this thesis combines the categorization and the labeling of domain texts into a single model. After each iteration, the expanded ontology database lets the classifier continue categorizing, and the domain texts newly produced by categorization, which are still unlabeled, extend the ontology database further. Repeating this cycle effectively achieves the goal of labeling an ontology with a small training set. The experimental results demonstrate that the method achieves good categorization performance and high recall and accuracy in ontology labeling.
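The seed-set training loop described in item ① can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the thesis: it assumes a multinomial Naive Bayes classifier with Laplace smoothing, whitespace-tokenized texts, and a fixed number of texts promoted per round. The function names (`train_nb`, `predict_nb`, `bootstrap_classifier`) and the parameters `per_round` and `max_rounds` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial Naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return (best_label, confidence), confidence being the posterior of best_label."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise the log scores into posterior probabilities.
    top = max(log_scores.values())
    probs = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / norm


def bootstrap_classifier(seeds, unlabeled, per_round=2, max_rounds=10):
    """Self-training: promote the most confident predictions into the seed set each round."""
    labeled = list(seeds)   # assumption: seeds are (tokens, label) pairs
    pool = list(unlabeled)  # assumption: unlabeled texts are token lists
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_nb(labeled)
        scored = [(predict_nb(model, toks), toks) for toks in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _confidence), toks in scored[:per_round]:
            labeled.append((toks, label))
        pool = [toks for _, toks in scored[per_round:]]
    return train_nb(labeled), labeled
```

The stopping rule here (an empty pool or a round limit) is an assumption; the abstract only says that training repeats "until the end". A confidence threshold could replace the fixed `per_round` count to reduce the risk of promoting wrong labels.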
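The rule-based labeling loop described in item ② can be illustrated with a much simplified sketch. It is not WHISK and not the procedure used in the thesis: it assumes that a rule is just the pair of words immediately to the left and right of a known individual, and it replaces the ontology inference engine with a plain check against individuals already known to belong to a disjoint class. All names (`induce_rules`, `apply_rules`, `bootstrap_label`, `known_other`) are hypothetical.

```python
from collections import defaultdict


def induce_rules(sentences, individuals):
    """Collect (left word, right word) contexts in which known individuals occur."""
    rules = set()
    for sent in sentences:
        toks = sent.split()
        for i in range(1, len(toks) - 1):
            if toks[i] in individuals:
                rules.add((toks[i - 1], toks[i + 1]))
    return rules


def apply_rules(sentences, rules):
    """Map each rule to the set of tokens it extracts from the sentences."""
    matches = defaultdict(set)
    for sent in sentences:
        toks = sent.split()
        for i in range(1, len(toks) - 1):
            rule = (toks[i - 1], toks[i + 1])
            if rule in rules:
                matches[rule].add(toks[i])
    return matches


def bootstrap_label(sentences, seeds, known_other, max_rounds=5):
    """Alternate rule induction and extraction, trimming rules that cause contradictions."""
    individuals = set(seeds)
    bad_rules = set()
    rules = set()
    for _ in range(max_rounds):
        rules = induce_rules(sentences, individuals) - bad_rules
        new = set()
        for rule, toks in apply_rules(sentences, rules).items():
            if toks & known_other:
                # Stand-in for ontology reasoning: the rule extracted a member
                # of a disjoint class, so it is trimmed and its output dropped.
                bad_rules.add(rule)
            else:
                new |= toks
        rules -= bad_rules
        if new <= individuals:  # nothing new was extracted: stop
            break
        individuals |= new
    return individuals, rules
```

For example, with the seed individual `Paris`, a sentence pair such as "She lives in Paris with her family" and "He lives in Berlin with two cats" lets the context ("in", "with") extract `Berlin`, while a context that also extracts a known person is discarded.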
Keywords/Search Tags:BootStrapping, Rule, Ontology, Annotation