
Study On The Methods Of Discipline Terminology Ontology Learning Based On Chinese Unstructured Text From Digital Library Domain

Posted on: 2016-02-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Zhu
GTID: 1368330461958380
Subject: Information Science
Abstract/Summary:
Compared with the World Wide Web (WWW), the Semantic Web (SWeb) is an intelligent network. The SWeb describes the semantics of the information resources it contains, covering not only vocabulary and concepts but also the logical relationships between them. This lets computers better understand the meaning of information resources and makes communication between people and computers more efficient and valuable. Ontology, as a way of describing and organizing knowledge, is the core technology for realizing the SWeb, and it has four main characteristics: conceptualization, formalization, clarity, and sharing. The ontology layer is the fourth layer of the seven-layer SWeb architecture. It describes and organizes information resources according to their semantics and is the basis for exchanging and sharing them. The semantic description and organization of information resources therefore depend on constructing an ontology for the corresponding domain.

Early domain ontologies were built mainly by hand by ontology engineers and domain experts, an approach with two disadvantages: (1) it costs a great deal of time and manpower; (2) it is affected by the domain experts' subjective judgments. To solve these problems, researchers proposed ontology learning, which uses data mining, machine learning, mathematical statistics, and related methods and techniques so that computers can discover ontology elements in existing data resources automatically or semi-automatically. These elements are concepts, instances, taxonomic relations, non-taxonomic relations, and axioms. Domain ontology learning from unstructured text is a hot topic and research frontier in computer science and information science, and Chinese unstructured text, because of its own characteristics, places different requirements on ontology learning methods and techniques.

A literature review shows that current research on Chinese ontology learning has the following features: (1) it focuses on theoretical assumptions and methodological argument; (2) frameworks and processes for ontology learning are discussed extensively, but no concrete, usable Chinese ontology learning system exists; (3) the immaturity of Chinese natural language processing technology strongly affects Chinese ontology learning; (4) research on non-taxonomic relations between ontology concepts is scarce. Given this situation, this dissertation studies Chinese ontology learning methods and techniques based on unstructured text from the digital library domain. It first discusses the basic concepts and theory of ontology, then builds a Chinese domain ontology learning model and obtains the domain ontology with data mining and mathematical statistics methods and techniques. The ontology elements learned are domain concepts and the taxonomic and non-taxonomic relations between them. Finally, the constructed domain ontology is described, stored, and visualized.

The main work of this dissertation includes:
(1) A model of a Chinese domain ontology learning system is constructed through technology integration. Drawing on literature research, system analysis, and application models, the functional components and learning process of ontology learning are discussed in depth. With the overall goal of providing knowledge services, a knowledge-service-oriented Chinese domain ontology learning system model is built by integrating a variety of data mining techniques and mathematical statistics methods, and concrete implementation plans for the model's key components are given and demonstrated.
(2) The terms of the Chinese domain ontology, and the predicate verbs that serve as labels of non-taxonomic relations, are identified automatically. Concretely, domain terms and predicate verbs are extracted from the unstructured literature of the subject area using Chinese word segmentation, mathematical statistics, weight calculation, and related methods.
(3) A practical model for automatically extracting taxonomic relations between terms in the "digital library" subject area is established and used to extract the taxonomic relations of the domain ontology terms. First, a term vector space model is constructed from the unstructured domain documents. Then, the taxonomic hierarchy among domain terms is mined with preliminary BIRCH clustering followed by hierarchical clustering, and class labels are determined with a comprehensive term similarity index.
(4) A practical model for automatically extracting non-taxonomic relations between terms in the "digital library" subject area is established and used to extract the non-taxonomic relations of the domain ontology terms. First, a sentence-term vector space model is constructed from the unstructured domain documents, and term pairs with non-taxonomic relations are obtained by association rule mining. Then, based on a sentence-<term, verb> vector space model, association rules are mined again to identify the non-taxonomic relations and assign them labels.
(5) The constructed "digital library" subject-area ontology is described in the Web Ontology Language (OWL): ontology concepts are described as OWL classes, and the relationships between concepts as OWL properties. The ontology is stored in a relational database, which is suitable for storing large volumes of ontology data.
(6) The ontology is visualized with OntoGraf, the visualization component of the ontology editing tool Protege 5.0 beta. The visual display gives users a more intuitive view of the relationships between ontology concepts and can also help them discover new domain knowledge.

The significance of this research lies in providing methods and techniques for obtaining domain ontology elements from Chinese unstructured text, and for describing, storing, and visualizing the domain ontology.
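Item (2) extracts domain terms with Chinese word segmentation plus statistical weighting, but the abstract does not give the weighting formula. As a minimal sketch, assuming the documents have already been segmented into tokens and using plain TF-IDF as a stand-in weight, candidate terms could be ranked as follows. The function `rank_terms`, the stopword filter, and the toy corpus (English tokens standing in for segmented Chinese words) are all hypothetical.

```python
import math
from collections import Counter

def rank_terms(docs, stopwords=frozenset(), top_k=10):
    """Rank candidate domain terms by their highest TF-IDF weight in any document.

    docs: documents already split into tokens by a Chinese word segmenter
    (segmentation itself is out of scope here).
    Returns a list of (term, weight) pairs, best first.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    best = {}
    for doc in docs:
        for tok, count in Counter(doc).items():
            if tok in stopwords:
                continue
            # Length-normalized term frequency times inverse document frequency.
            weight = (count / len(doc)) * math.log(n_docs / df[tok])
            best[tok] = max(best.get(tok, 0.0), weight)
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical toy corpus; English tokens stand in for segmented Chinese words.
DOCS = [
    ["digital_library", "metadata", "the", "metadata", "standard"],
    ["digital_library", "ontology", "the", "ontology", "reasoning"],
    ["digital_library", "retrieval", "the", "metadata"],
]

if __name__ == "__main__":
    for term, weight in rank_terms(DOCS, stopwords={"the"}):
        print(f"{term}\t{weight:.3f}")
```

On this corpus `ontology` and `retrieval` rank highest, while `digital_library`, which occurs in every document, gets weight 0. Plain IDF penalizes a domain's most pervasive terms, so a practical term extractor would likely need an additional domain-relevance statistic on top of this weighting.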
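Item (3) clusters term vectors to recover a taxonomy. The sketch below covers only the hierarchical (agglomerative) step, using average linkage on cosine similarity as an assumed configuration; the preliminary BIRCH pass, which pre-groups terms in a large corpus, is omitted. `pick_label` is a hypothetical stand-in for the comprehensive term similarity index, whose definition the abstract does not give, and the term vectors are toy data.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_terms(vectors, threshold):
    """Average-linkage agglomerative clustering of terms.

    vectors: term -> vector of occurrence counts per document.
    Repeatedly merges the two clusters with the highest average pairwise
    cosine similarity; stops when that similarity drops below threshold.
    """
    clusters = [[t] for t in sorted(vectors)]

    def avg_sim(c1, c2):
        sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), avg_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

def pick_label(cluster, vectors):
    """Hypothetical labeling rule: the member most similar, in total, to the others."""
    return max(cluster, key=lambda t: sum(
        cosine(vectors[t], vectors[o]) for o in cluster if o != t))

# Hypothetical term vectors: occurrence counts of each term in four documents.
VECTORS = {
    "metadata":    [3, 2, 0, 0],
    "dublin_core": [2, 2, 0, 0],
    "marc":        [3, 1, 0, 0],
    "ontology":    [0, 0, 3, 2],
    "owl":         [0, 0, 2, 2],
}

if __name__ == "__main__":
    for group in cluster_terms(VECTORS, threshold=0.5):
        print(pick_label(group, VECTORS), sorted(group))
```

With a threshold of 0.5 this yields the groups {dublin_core, marc, metadata} and {ontology, owl}, and `pick_label` names the first group `metadata`, its most central member. Re-running the merge loop without a threshold and recording each merge would give the full dendrogram, i.e. the hierarchy itself.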
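Item (4) first finds related term pairs with association rules over a sentence-term model. A minimal sketch, assuming each sentence is a transaction containing the domain terms it mentions (a binary sentence-term vector) and limiting counting to 2-itemsets, which is all that is needed to find pairs. The function name, the support and confidence thresholds, and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_term_pairs(sentences, terms, min_support=0.3, min_confidence=0.6):
    """Mine association rules A -> B between domain terms.

    Each sentence becomes a transaction holding the domain terms it contains.
    Only 2-itemsets are counted, since the goal is related term pairs.
    Returns (A, B, support, confidence) tuples.
    """
    transactions = [set(s) & terms for s in sentences]
    n = len(transactions)
    item_counts = Counter(t for tr in transactions for t in tr)
    pair_counts = Counter(p for tr in transactions for p in combinations(sorted(tr), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions; confidence = P(consequent | antecedent).
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

# Hypothetical toy sentences (segmented tokens) and domain term list.
SENTENCES = [
    ["user", "retrieves", "metadata"],
    ["user", "retrieves", "metadata", "catalog"],
    ["librarian", "creates", "metadata"],
    ["user", "browses", "catalog"],
    ["librarian", "creates", "metadata", "record"],
]
TERMS = {"user", "metadata", "librarian", "catalog", "record"}

if __name__ == "__main__":
    for a, b, sup, conf in mine_term_pairs(SENTENCES, TERMS):
        print(f"{a} -> {b}  support={sup:.2f}  confidence={conf:.2f}")
```

With these thresholds the rules found are catalog → user, librarian → metadata, user → catalog, and user → metadata. The rule metadata → user is rejected because `metadata` occurs in four sentences but co-occurs with `user` in only two (confidence 0.5).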
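The second step of item (4) assigns a predicate-verb label to each related term pair using a sentence-<term, verb> model. One way to sketch this, assuming rules of the form {term_a, term_b} → verb evaluated over sentences and a hypothetical confidence threshold, is shown below; `label_relation` and the toy data are illustrative, not the dissertation's implementation.

```python
from collections import Counter

def label_relation(sentences, pair, verbs, min_confidence=0.5):
    """Choose a predicate-verb label for a related term pair.

    Treats each sentence as a transaction over terms and verbs and evaluates
    rules {a, b} -> verb, where confidence = (sentences containing a, b and
    the verb) / (sentences containing a and b). Returns (verb, confidence) or None.
    """
    a, b = pair
    with_pair = [set(s) for s in sentences if a in s and b in s]
    verb_counts = Counter(v for s in with_pair for v in s & verbs)
    if not verb_counts:
        return None
    # Highest count wins; ties go to the alphabetically first verb.
    verb, count = min(verb_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    confidence = count / len(with_pair)
    return (verb, confidence) if confidence >= min_confidence else None

# Hypothetical toy sentences (segmented tokens) and predicate-verb list.
SENTENCES = [
    ["user", "retrieves", "metadata"],
    ["user", "retrieves", "metadata", "catalog"],
    ["librarian", "creates", "metadata"],
    ["user", "browses", "catalog"],
    ["librarian", "creates", "metadata", "record"],
]
VERBS = {"retrieves", "creates", "browses"}

if __name__ == "__main__":
    for pair in [("user", "metadata"), ("librarian", "metadata"), ("user", "catalog")]:
        print(pair, label_relation(SENTENCES, pair, VERBS))
```

Here ("user", "metadata") is labeled `retrieves` and ("librarian", "metadata") `creates`, each with confidence 1.0. The pair ("user", "catalog") co-occurs with two different verbs once each, so it only reaches confidence 0.5; on real data such weakly supported labels would probably deserve a higher threshold.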
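Item (5) describes concepts as OWL classes and relations as OWL properties. The abstract does not say which OWL serialization was used; as an assumption, the sketch below writes OWL 2 Functional-Style Syntax because it is compact, mapping each taxonomic pair to a `SubClassOf` axiom and each labeled non-taxonomic relation to an object property with a domain and range. The function `to_owl_functional` and the example IRI are hypothetical.

```python
def to_owl_functional(base_iri, classes, subclass_of, relations):
    """Serialize a small learned ontology in OWL 2 Functional-Style Syntax.

    classes: concept names; subclass_of: (child, parent) pairs;
    relations: (property, domain_class, range_class) triples.
    Names must already be valid IRI local names (no spaces).
    """
    lines = [f"Prefix(:=<{base_iri}#>)", f"Ontology(<{base_iri}>"]
    for name in sorted(classes):
        lines.append(f"  Declaration(Class(:{name}))")
    for child, parent in sorted(subclass_of):
        lines.append(f"  SubClassOf(:{child} :{parent})")
    for prop, domain, range_ in sorted(relations):
        lines.append(f"  Declaration(ObjectProperty(:{prop}))")
        lines.append(f"  ObjectPropertyDomain(:{prop} :{domain})")
        lines.append(f"  ObjectPropertyRange(:{prop} :{range_})")
    lines.append(")")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_owl_functional(
        "http://example.org/dl",  # hypothetical ontology IRI
        {"Metadata", "DublinCore", "Librarian"},
        {("DublinCore", "Metadata")},
        {("creates", "Librarian", "Metadata")},
    ))
```

One design caveat: OWL interprets several `ObjectPropertyDomain` axioms on the same property as an intersection. If one verb such as `creates` links several different class pairs, each use would likely need its own property rather than repeated domain and range axioms.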
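For the relational storage in item (5), the abstract gives no schema. A minimal sketch using Python's built-in `sqlite3` module, with a hypothetical three-table layout (concepts, taxonomic edges, and labeled non-taxonomic relations) and a recursive query that walks the taxonomy upward; all table, column, and function names are illustrative assumptions.

```python
import sqlite3

# Hypothetical schema: one row per concept, one per is-a edge, one per labeled relation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS concept (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS taxonomic (
    child  INTEGER NOT NULL REFERENCES concept(id),
    parent INTEGER NOT NULL REFERENCES concept(id),
    PRIMARY KEY (child, parent)
);
CREATE TABLE IF NOT EXISTS relation (
    subject INTEGER NOT NULL REFERENCES concept(id),
    label   TEXT    NOT NULL,
    object  INTEGER NOT NULL REFERENCES concept(id),
    PRIMARY KEY (subject, label, object)
);
"""

def concept_id(conn, name):
    """Insert the concept if it is new and return its id."""
    conn.execute("INSERT OR IGNORE INTO concept(name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM concept WHERE name = ?", (name,)).fetchone()[0]

def store_ontology(conn, subclass_of, relations):
    """Store (child, parent) pairs and (subject, verb_label, object) triples."""
    conn.executescript(SCHEMA)
    for child, parent in subclass_of:
        conn.execute("INSERT OR IGNORE INTO taxonomic VALUES (?, ?)",
                     (concept_id(conn, child), concept_id(conn, parent)))
    for subject, label, obj in relations:
        conn.execute("INSERT OR IGNORE INTO relation VALUES (?, ?, ?)",
                     (concept_id(conn, subject), label, concept_id(conn, obj)))
    conn.commit()

def ancestors(conn, name):
    """All superclasses of a concept, found with a recursive common table expression."""
    rows = conn.execute("""
        WITH RECURSIVE anc(id) AS (
            SELECT t.parent FROM taxonomic AS t
            JOIN concept AS c ON c.id = t.child
            WHERE c.name = ?
            UNION
            SELECT t.parent FROM taxonomic AS t
            JOIN anc ON t.child = anc.id
        )
        SELECT name FROM concept WHERE id IN (SELECT id FROM anc) ORDER BY name
    """, (name,)).fetchall()
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_ontology(conn,
                   [("DublinCore", "Metadata"), ("Metadata", "Resource")],
                   [("Librarian", "creates", "Metadata")])
    print(ancestors(conn, "DublinCore"))
```

Using `UNION` rather than `UNION ALL` in the recursive query discards duplicate ancestors, which also keeps the walk from looping forever if a learned taxonomy accidentally contains a cycle.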
Keywords/Search Tags: Domain ontology, Ontology learning, Ontology construction, Taxonomic relation, Non-taxonomic relation, Data mining, Clustering, Association rules, Visualization