
Methodology and Empirical Research on Domain Ontology

Posted on: 2014-09-27 Degree: Doctor Type: Dissertation
Country: China Candidate: F Yu Full Text: PDF
GTID: 1318330398454854 Subject: Information Science
Abstract/Summary:
Knowledge is one of the key drivers of rapid economic development worldwide, and it is the source from which human civilization is inherited and advanced. As the world enters the era of the knowledge economy, knowledge has become a reusable and valuable resource that promotes national and social development. The Internet provides a broad platform for disseminating information, but it also makes it harder for most users to find accurate information. Networks have spread rapidly to every corner of the world, yet finding information quickly and precisely remains a common problem. Retrieval efficiency has improved as retrieval technology has developed, but retrieval results still fall short of users' expectations. Attention has therefore turned to effective knowledge organization, and ontology is one of the most effective methods for it.

Since ontology was introduced into information science and artificial intelligence, it has played an important role in knowledge organization. As researchers from various disciplines took an interest, ontology was gradually introduced into medical science, military science, geography and agricultural science. After more than 10 years of trials, research on ontology theory, methodology and application has made substantial progress. However, the diversity of ontology construction methods and the differences between domains make ontologies difficult to reuse and share. Once ontology construction methods are standardized, the building process can proceed smoothly and large-scale ontology construction becomes feasible. Comparing and improving ontology construction methods can raise their efficiency, maximize the advantages of knowledge organization, and provide strong support for knowledge storage, analysis and retrieval.

The dissertation divides ontology construction into concept extraction, relation extraction and formalization. Its main theoretical basis is the abstract methods of ontology construction.
Making comprehensive use of two data sources, thesauri and documents, the dissertation proposes: a concept extraction method based on word-combination rules and the N-gram algorithm; a concept filtering method based on extended mutual information and context information; a core word extraction method based on a weighting algorithm and information entropy; a hierarchical relation extraction method based on vector space similarity; a non-hierarchical relation extraction method based on syntax rules and an extended Apriori algorithm; and formalization based on Jena. Taking geomatics as an example, the dissertation constructs a geomatics ontology with these methods to demonstrate their feasibility. Its original contributions are an improvement of the concept extraction method based on linguistics and statistics, and a comparison of character-based and word-based similarity calculation.

The dissertation contains seven chapters. Apart from the Introduction and Conclusion, it is divided into three parts.

The first part (Chapter 1) discusses ontology and related theory. It defines the meaning of ontology in information science and discusses the features of ontology in knowledge description and sharing. Nine kinds of ontology and five basic elements are listed. The rules and tags of XML, RDF and OWL, and the relationships among them, are discussed. Abstract ontology construction methods, such as IDEF5, TOVE, Skeleton and METHONTOLOGY, and specific construction techniques, such as the Apriori algorithm, the N-gram algorithm, mutual information, information entropy and similarity calculation, are described and evaluated.
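Two of the techniques named above, N-gram candidate generation and mutual information, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it assumes the text has already been segmented into tokens, scores only 2-word candidates with standard pointwise mutual information, and the function names, thresholds and sample terms are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_concepts(sentences, min_count=2, min_pmi=1.0):
    """Rank adjacent token pairs as concept candidates.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with probabilities
    estimated from raw counts over the pre-segmented sentences.
    Pairs seen fewer than min_count times, or scoring below min_pmi,
    are filtered out. Both thresholds are illustrative assumptions.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(ngrams(tokens, 2))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    scored = []
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / total_bi
        p_x = unigrams[x] / total_uni
        p_y = unigrams[y] / total_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= min_pmi:
            scored.append(((x, y), pmi))

    # Highest-scoring candidates first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical, already-segmented sample sentences.
sample = [
    ["remote", "sensing", "image"],
    ["remote", "sensing", "data"],
    ["image", "data"],
]
for pair, score in candidate_concepts(sample):
    print(" ".join(pair), round(score, 3))
```

A pair that co-occurs more often than its individual word frequencies would predict gets a high score, which is why mutual information is useful for separating real multi-word terms from chance adjacency.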
Chapter 1 also lists the advantages and disadvantages of ontology construction tools, such as Protege and Jena.

The second part (Chapters 2 to 4) discusses and experiments with concept extraction, relation extraction and formalization respectively.

Chapter 2 converts terms from document format into a database by matching and storing them with string functions and relation tables, and by mapping them according to the coding rules of the thesaurus. The most commonly used word-combination rules are extracted through word segmentation and POS tagging of the thesaurus. The chapter extracts concepts using word-combination rules and the N-gram algorithm, describes the two algorithms, analyzes their results, and finds that combining them gives a better result. After concept extraction, the chapter applies context information and mutual information filters, extending 2-word mutual information to 3- or 4-word mutual information. In addition, concept extraction is completed with the help of extended information entropy, neighbor-word averaging and a weighted algorithm.

Chapter 3 uses relation tables to convert the structure of the hierarchical relations in the thesaurus. On the basis of these hierarchical relations, and with the help of a neighbor-word filter algorithm and a comparison of character-based and word-based results, the similarity threshold is divided into sub-level average similarity, super-sub-level average similarity and super-level average similarity. Concepts can then be assigned to levels according to these thresholds. Non-hierarchical relations also need to be stored in relation tables. Subjects, predicates and objects are extracted according to specific Chinese syntax rules, and the Apriori algorithm is used to select the triples formed from subject (S), predicate (P) and object (O). At the end of the chapter, all triples of the ontology are prepared.

Chapter 4 discusses in depth the relationships among ontology, OWL and semantics, and proposes that semantic data are data that relieve users' burden and enable automatic analysis.
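The triple selection step described for Chapter 3 can be sketched as follows. The abstract does not specify how the dissertation extends Apriori, so this is only the plain level-wise idea under stated assumptions: each extracted (S, P, O) triple is treated as one transaction, and a triple is counted only if all of its single items and pairs already meet the support threshold. The function name, the `min_support` value and the sample triples are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_triples(triples, min_support=2):
    """Keep (S, P, O) triples seen at least min_support times,
    pruning candidates level by level in the Apriori manner."""
    # Tag each element with its role so that the same word used as a
    # subject and as an object is counted as two different items.
    transactions = [(("S", s), ("P", p), ("O", o)) for s, p, o in triples]

    # Level 1: frequent single items.
    item_counts = Counter(item for t in transactions for item in t)
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Level 2: count only pairs whose two items are both frequent.
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(t, 2)
        if pair[0] in frequent_items and pair[1] in frequent_items
    )
    frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}

    # Level 3: count only triples whose three pairs are all frequent.
    triple_counts = Counter(
        t
        for t in transactions
        if all(pair in frequent_pairs for pair in combinations(t, 2))
    )
    return {
        (s[1], p[1], o[1]): count
        for (s, p, o), count in triple_counts.items()
        if count >= min_support
    }


# Hypothetical triples, as if extracted by syntax rules.
sample = [
    ("GPS", "measures", "position"),
    ("GPS", "measures", "position"),
    ("GIS", "stores", "map"),
    ("GPS", "measures", "time"),
]
print(frequent_triples(sample))
```

The pruning does not change which triples survive compared with counting triples directly; its purpose is to avoid counting candidates that cannot be frequent, which matters on large corpora.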
Chapter 4 also analyzes how to choose an ontology language, along with manual and automatic formalization methods. Finally, it takes geomatics as an example to formalize the ontology.

The third part (Chapter 5) builds an ontology construction system and sets out the detailed requirements for word segmentation, concept extraction, relation extraction and formalization in the system. The chapter covers both the overall design and the detailed design. The overall design comprises a concept extraction module, a concept filtering module, a hierarchical relation extraction module, a non-hierarchical relation extraction module and a formalization module. The detailed design describes the system interface and the functions of each module.

The dissertation is one of the research achievements of the Major Program of the National Social Science Foundation of China: "Semantic-based Deep Integration and Visualization of Library Resources" (11&ZD152).
Keywords/Search Tags: Ontology, Domain ontology, Ontology construction method, Semantics, N-gram, Mutual information, Apriori algorithm