
Research On Automatic Ontology Construction Method Of Petroleum Domain Based On The Text Analysis

Posted on: 2016-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Duan
GTID: 2308330461481108
Subject: Software engineering
Abstract/Summary:
As the digital petroleum domain evolves toward an intelligent one, more and more kinds of knowledge bases are being applied in petroleum information systems. Ontology is an important knowledge source for building knowledge bases and intelligent information systems, but so far there is no authoritative, standard petroleum domain ontology that meets the needs of intelligent information systems. The main reason is that most existing ontologies are constructed manually, which is time-consuming and makes them difficult to maintain.

To solve these problems, this thesis introduces text analysis theory into the construction of a petroleum domain ontology. Through in-depth study of text analysis, automatic ontology construction, and related technologies, it presents a method for automatically constructing a petroleum domain ontology based on text analysis. Drawing on existing research on ontology construction in China and abroad, combined with results from text analysis, it studies corpus construction, the extraction of petroleum domain concepts, and methods for building relationships between ontology concepts in a Chinese-language setting. The main work is as follows.

1. This thesis constructs a corpus for petroleum domain ontology extraction. It builds the petroleum domain ontology structure with reference to the Epicentre data model, treats the concepts in this framework as seed concepts, and uses them as keywords to crawl the original corpus. It also proposes a seed-concept-based corpus construction system for the petroleum domain ontology. The system preprocesses the original corpus with the natural language processing framework GATE and generates a dimension-reduced corpus with part-of-speech and named-entity annotations.

2. This thesis proposes a mixed-strategy method for extracting petroleum domain concepts. Linguistics: it summarizes the composition rules of petroleum domain concepts and uses them to extract concepts. Statistics: it first introduces statistical models to obtain compound words in the text, then uses the TF-IDF model to obtain petroleum domain terms, and finally builds synonym recognition expressions to merge synonyms among the terms, completing the extraction of petroleum domain concepts.

3. This thesis studies methods for building relationships between ontology concepts. Based on the characteristics of Chinese expression, it builds a rule base for taxonomic relations between concepts and expands the rule base automatically from the corpus. It builds a "concept-text" vector space model and hierarchically clusters the concepts, from which it derives relationships between concepts. Building on association rules, it proposes using pointwise mutual information to build non-taxonomic relations between concepts. It then describes the generated relations in OWL to complete the structure of the petroleum domain ontology.

Finally, the thesis presents and evaluates the experimental results of this research. The evaluation indicates that each part of the method is effective and feasible.
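The abstract names TF-IDF for term extraction and pointwise mutual information for non-taxonomic relations but gives no formulas or parameters. The following is a minimal sketch of both scores, not the thesis's implementation. It assumes the text has already been segmented into tokens, that co-occurrence is counted at the document level, and that the function names `tf_idf` and `pmi` and the parameter `min_count` are purely illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Score each term in each document (docs: list of token lists).

    Uses tf = count / document length and idf = ln(N / document frequency).
    This weighting is an assumption; the thesis does not state its variant.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        length = len(doc)
        scores.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


def pmi(docs, min_count=1):
    """Pointwise mutual information for term pairs that share a document.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated from document-level occurrence (an assumed window choice).
    Pair keys are alphabetically sorted tuples.
    """
    n_docs = len(docs)
    single = Counter()
    pair = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        single.update(terms)
        pair.update(combinations(terms, 2))

    return {
        (a, b): math.log2((c / n_docs) / ((single[a] / n_docs) * (single[b] / n_docs)))
        for (a, b), c in pair.items()
        if c >= min_count
    }
```

In a pipeline like the one described, terms with high TF-IDF scores would be kept as candidate domain terms, and concept pairs with high PMI would be candidates for non-taxonomic relations. Any thresholds would have to be chosen empirically, since the abstract does not report them.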
Keywords/Search Tags: Ontology Construction, Text Analysis, Corpus Base, Rule Base