Semi-supervised Domain Ontology Building Based On Text

Posted on: 2011-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wang
GTID: 2178360305454760
Subject: Computer software and theory
Abstract/Summary:
The concept of ontology originated in philosophy. In recent decades, with the development of artificial intelligence, ontology has been introduced into information systems, knowledge management and other fields, and has become an essential part of the vocabulary of computer science. Ontology provides a basis for more intelligent human-computer interaction. Ontologies can be divided into three categories: general ontology, domain ontology and application ontology.

More and more areas of computing make use of ontology, search being one example. Commonly used search engines match only a few isolated words, so many of the results they return are semantically inconsistent with the query, which makes ranking the results difficult. Users must find the pages they need by themselves. An ontology-based search engine can return query results with ontology semantics; using the semantics of the query, results can be annotated, sorted and filtered, and the accuracy of the query results can be improved.

How to build ontologies quickly and efficiently has therefore become an urgent problem. Building an ontology manually is too costly, because the work is tedious, time-consuming and labor-intensive.
Ontology building has become the bottleneck for ontology applications, so researchers have proposed automatic and semi-automatic ontology construction methods.

The current state of research on ontology construction can be summarized as follows:
1) The results of ontology learning are not highly accurate, especially for relationships between concepts.
2) In general, only classification relations can be learned.
3) Most existing learning systems are prototypes that cannot be applied directly in practice.

Many data mining and machine learning techniques have been applied to ontology learning, such as natural language processing, statistics, pattern matching, association rule mining, hierarchical clustering, flat partitional clustering, formal concept analysis and Support Vector Machines (SVM).

Representative ontology learning methods include the OntoLearn, Kietz, Alfonseca, Faatz, Aguirre and Hearst methods. Representative ontology learning tools include Text2Onto, Hasti, OntoLearn and OntoBuilder. Among them, Text2Onto is the most widely used.

Text2Onto is a text-based ontology learning framework that differs from its previous versions and from other ontology learning frameworks in three main ways. First, it uses a Probabilistic Ontology Model (POM) to represent the learned knowledge element instances, independently of a specific target language. Second, user interaction is a central aspect of the system; by calculating a confidence value for each learned object, the system gives the user a more accurate view of the POM. Third, Text2Onto follows a data-driven change discovery strategy: when the corpus changes, the system avoids reprocessing the entire corpus and updates the POM only selectively. Users can also trace the changes to the ontology.

Based on an analysis and summary of these existing results, this thesis presents a text-based method for building a domain ontology prototype.
The main idea is as follows:
1) Use text pre-processing, statistics and noun phrase patterns to learn terminology.
2) Improve the SSI algorithm with active learning to perform semantic disambiguation of terms.
3) Learn the classification relations between concepts using binary associations, pattern matching and WordNet.
4) Use a semantic relation learning algorithm based on NNV triple associations to learn semantic relations between concepts.
5) Use attribute learning algorithms based on NNN triple associations and the classification relations to obtain the attribute sets of concepts.

Concept learning proceeds as follows:
1) Perform lexical analysis and semantic annotation with the pre-processing tool GATE.
2) Extract candidate terms, calculate their TF values, and filter the candidates using the TF values and a stop word list.
3) Define noun phrase patterns, extract candidate noun phrases, calculate their TF values, mutual information and context dependency, and filter the candidate noun phrase set.
4) Generate the text vector matrix and the term context matrix.
5) Use the active learning-based semantic disambiguation algorithm to learn concepts.

Classification relation learning proceeds as follows:
1) Use the Apriori algorithm to generate binary associations over the concept set from the corpus.
2) Use string matching to learn classification relations, and filter out synonym and antonym relations with WordNet.
3) Use a pattern-matching algorithm to learn classification relations from classification relation patterns.

Semantic relation learning proceeds as follows:
1) Extract verbs from texts with simple part-of-speech tagging.
2) Use the Apriori algorithm to generate the NNV triple association set from the concept binary association set, from which classification relations, synonym relations and part-of relations have been filtered out, and generate semantic relations.

Concept property learning proceeds as follows:
1) Learn concept properties using pattern matching.
2) Learn concept properties with WordNet.
3) Learn concept properties from the NNN triple associations generated by the Apriori algorithm.

The main contributions of this thesis are:
1) Terminology extraction based on noun phrases.
2) A semantic disambiguation algorithm with active learning, obtained by improving the SSI algorithm.
3) A semantic relation learning algorithm based on NNV triple associations.
4) An attribute learning algorithm based on pattern matching and NNN triple associations.

Some aspects of this thesis remain to be improved:
1) The results of concept learning, concept relation learning and concept property learning need to be improved, and the corresponding algorithms need to be optimized.
2) Consistency checking and axiom learning need to be added.
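
The association-mining steps described in the abstract (binary associations between concepts, then noun-noun-verb triples) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the sentence format (a dict with "concepts" and "verbs"), the function names frequent_concept_pairs and nnv_triples, the excluded parameter and the toy corpus are all hypothetical, and support is simply the fraction of sentences containing an itemset.

```python
from collections import Counter
from itertools import combinations


def frequent_concept_pairs(sentences, min_support):
    """Apriori levels 1 and 2: frequent concepts, then frequent concept pairs."""
    n = len(sentences)
    # Level 1: support of each single concept (counted once per sentence).
    single = Counter(c for s in sentences for c in set(s["concepts"]))
    frequent = {c for c, k in single.items() if k / n >= min_support}
    # Level 2: candidate pairs are built only from frequent concepts,
    # which is the Apriori pruning step.
    pairs = Counter()
    for s in sentences:
        kept = sorted(set(s["concepts"]) & frequent)
        pairs.update(combinations(kept, 2))
    return {p: k / n for p, k in pairs.items() if k / n >= min_support}


def nnv_triples(sentences, pairs, min_support, excluded=frozenset()):
    """Extend frequent concept pairs with co-occurring verbs (NNV triples).

    `excluded` stands in for pairs already explained by classification,
    synonym or part-of relations; how those are detected is not shown here.
    """
    n = len(sentences)
    candidates = [p for p in pairs if p not in excluded]
    counts = Counter()
    for s in sentences:
        concepts, verbs = set(s["concepts"]), set(s["verbs"])
        for a, b in candidates:
            if a in concepts and b in concepts:
                counts.update((a, b, v) for v in verbs)
    return {t: k / n for t, k in counts.items() if k / n >= min_support}


# Toy corpus (hypothetical): each sentence is already reduced to its
# concepts and verbs by an earlier pre-processing step.
corpus = [
    {"concepts": ["car", "engine"], "verbs": ["have"]},
    {"concepts": ["car", "engine"], "verbs": ["have"]},
    {"concepts": ["car", "driver"], "verbs": ["drive"]},
    {"concepts": ["driver", "license"], "verbs": ["hold"]},
]

pairs = frequent_concept_pairs(corpus, min_support=0.5)
triples = nnv_triples(corpus, pairs, min_support=0.5)
print(pairs)
print(triples)
```

With a minimum support of 0.5, only the pair (car, engine) and the triple (car, engine, have) survive in this toy corpus. Passing that pair in excluded removes the triple, mirroring the filtering of already-known relations before semantic relations are generated.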
Keywords/Search Tags: Ontology, Ontology building, Domain ontology, Ontology learning