Research On Semi-automatic Ontology Construction Technology For Minority Language Domains

Posted on:2019-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:D L T A B D R H M Ku

Full Text:PDF

GTID:2428330566967034

Subject:Software engineering

Abstract/Summary:

Ontology is a conceptual model that describes domain-related issues at the semantic and knowledge level, and it is an important means of sharing and reusing domain knowledge. As a high-level organizational structure for data, ontology plays an important role in knowledge engineering, digital libraries, information retrieval, and the Semantic Web. At present, ontology research still has no well-defined method for constructing an ontology. To standardize ontology construction, this thesis studies semi-automatic domain ontology construction methods and proposes a semi-automatic ontology construction technology for minority language domains, using Uyghur as an example. Research on ontology construction for minority languages such as Uyghur is scarcer and less advanced than for widely used languages such as English and Chinese. Low-resource languages lack the external resources that help in constructing a domain ontology, such as reasonably complete function word lists, vocabularies of synonyms and near-synonyms, or WordNet-like resources that provide semantic structure. To address this lack of auxiliary resources, this study proposes a method for constructing a Uyghur domain ontology based on cross-language reuse and on concepts drawn from domain texts, which eases the construction of Uyghur domain ontologies.

The proposed semi-automatic method for constructing a domain ontology is divided into two parts. The first builds a standard domain ontology: ontologies are collected for cross-language reuse, triples are extracted automatically, and English terms are matched to Uyghur terms. The Uyghur triple library is then refined by modification and adjustment, and finally the Uyghur standard domain ontology is constructed with the Apache Jena tool.

The second part expands the domain ontology from a Uyghur domain text corpus. A corpus is first collected and preprocessed by removing stop words and stemming. The core domain vocabulary is then extracted from the domain text using TF-IDF. A word vector model is trained on a mixed corpus covering 14 domains with Google's open-source word2vec tool, and words semantically similar to the core vocabulary are extracted from the model to build an extended domain vocabulary. Domain experts verify and judge these candidates, keeping only the terms that are genuinely semantically similar. The retained terms are inserted into the standard domain ontology with the Protégé tool, expanding the concepts, attributes, and entities of the standard ontology. Finally, the constructed ontology is checked for syntax errors with the Protégé tool. Analysis of the experimental results shows that the method is effective and confirms the feasibility of semi-automatic ontology construction for Uyghur domains.
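The expansion step described above (TF-IDF core vocabulary, then similar-word lookup in a vector model) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `tfidf_core_terms` and `expand_terms`, the similarity threshold, the English placeholder tokens, and the hand-written two-dimensional vectors are all assumptions made for illustration. In the thesis the vectors come from a word2vec model trained on the 14-domain corpus, and the candidates are screened by domain experts rather than accepted automatically.

```python
import math
from collections import Counter


def tfidf_core_terms(domain_docs, background_docs, top_k=3):
    """Rank domain terms by TF-IDF computed against a mixed corpus.

    Each document is a list of tokens that are assumed to be already
    stop-word filtered and stemmed.
    """
    all_docs = domain_docs + background_docs
    n_docs = len(all_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))
    # Term frequency counted over the domain documents only.
    tf = Counter(term for doc in domain_docs for term in doc)
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(core_terms, vectors, threshold=0.8):
    """Return non-core terms whose vector is close to some core term."""
    candidates = {}
    for core in core_terms:
        if core not in vectors:
            continue
        for term, vec in vectors.items():
            if term in core_terms:
                continue
            sim = cosine(vectors[core], vec)
            if sim >= threshold:
                candidates[term] = max(sim, candidates.get(term, 0.0))
    return sorted(candidates, key=lambda t: (-candidates[t], t))


# Toy data (hypothetical): English placeholders stand in for Uyghur stems.
domain_docs = [
    ["ontology", "concept", "ontology", "class"],
    ["ontology", "property", "concept"],
]
background_docs = [
    ["football", "match", "goal"],
    ["concept", "market", "price"],
]
# Hand-written stand-ins for word2vec embeddings (assumption).
vectors = {
    "ontology": [1.0, 0.0],
    "class": [0.9, 0.1],
    "taxonomy": [0.95, 0.05],
    "football": [0.0, 1.0],
}

core = tfidf_core_terms(domain_docs, background_docs, top_k=2)
candidates = expand_terms(core, vectors, threshold=0.8)
print(core, candidates)
```

The candidate list is only a proposal. In the workflow the abstract describes, expert review decides which terms are actually added to the ontology.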

Keywords/Search Tags:

Domain ontology, Ontology reuse, Cross-language, Word2vec

Related items

1	The Research On The Approaches Of Ontology Construction Based On Thsse Cross-language Ontology Reuse
2	Research On Sub-Ontology Model For Large-Scale Ontology Reuse
3	Modular Construction And Reuse Of Domain Ontology
4	Study On The Theory And Practice Of Ontology And Ontology-based Agricultural Document Retrieval System--Floricultural Ontology Modeling
5	Research On Domain Ontology Representation, Reasoning And Integration For The Semantic Web And The Applications
6	Research On Web Cross Language Information Retrieval Based On Ontology
7	An Approach For Measuring And Comparing Structural Semantics Of Ontologies Based On Graph Derivation
8	Design Of Reuse-oriented Education Resource Extended Service Model And Construction Of Ontology Repository
9	Data Fusion Platform Based On Cross-Domain Ontology Linkage
10	A Domain Ontology-based Program To Understand The Methods Of Research