Research On Semi-automatic Ontology Construction Technology For Minority Language Domains

Posted on: 2019-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: D L T A B D R H M Ku
GTID: 2428330566967034
Subject: Software engineering
Abstract/Summary:
An ontology is a conceptual model that describes a domain at the semantic and knowledge level, and it is an important means of sharing and reusing domain knowledge. As a high-level organizational structure for data, ontologies play an important role in knowledge engineering, digital libraries, information retrieval, and the Semantic Web. At present, however, there is no established standard method for constructing an ontology. To help standardize ontology construction, this thesis studies semi-automatic domain ontology construction methods and proposes a semi-automatic ontology construction technology for minority language domains, using Uyghur as an example. Research on ontology construction for minority languages such as Uyghur is scarcer and less advanced than for widely used languages such as English and Chinese. Such languages lack the external resources that help in constructing a domain ontology, such as a reasonably complete function-word list, lists of synonyms and near-synonyms, or WordNet-like resources that provide semantic structure. To address this lack of auxiliary resources, this study proposes a method for constructing a Uyghur domain ontology based on cross-language reuse and concepts drawn from domain text, which makes constructing a Uyghur domain ontology considerably easier.

The proposed semi-automatic construction method is divided into two parts. The first builds a standard domain ontology: existing ontologies are collected for cross-language reuse, triples are extracted automatically, and English terms are matched to Uyghur. The resulting Uyghur triple library is then refined through modification and adjustment, and finally the Uyghur standard domain ontology is constructed with the Apache Jena tool.

The second part expands the domain ontology from a Uyghur domain text corpus. A corpus is first collected and preprocessed, including stop-word removal and stemming. The core domain vocabulary is then extracted from the domain text using TF-IDF. A word vector model is trained on a mixed corpus covering 14 domains using Google's open-source word2vec tool, and words semantically similar to the core vocabulary are extracted from the model to build an extended domain vocabulary. Domain experts then verify and judge these candidates to select the terms that are accurately similar in meaning. The selected terms are inserted into the standard domain ontology with the Protégé tool, expanding its concepts, attributes, and entities. Finally, the syntax of the constructed ontology is verified using Protégé. Analysis of the experimental results shows that the method is effective and confirms the feasibility of semi-automatic ontology construction for the Uyghur domain.
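The vocabulary-expansion step can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation. The function names (`tfidf_scores`, `core_vocabulary`, `expand_terms`), the English placeholder tokens, and the two-dimensional toy vectors are all assumptions made for illustration. The abstract does not state which TF-IDF variant was used, so the plain form tf × log(N/df) is assumed here, and a small dictionary of vectors stands in for a trained word2vec model.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document (assumed variant: tf * log(N/df))."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def core_vocabulary(docs, doc_index, top_k):
    """Pick the top_k highest-scoring terms of one document as its core vocabulary."""
    scores = tfidf_scores(docs)[doc_index]
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(core_terms, vectors, threshold):
    """Collect non-core words whose vector is close to any core term's vector."""
    candidates = set()
    for core in core_terms:
        if core not in vectors:
            continue
        for word, vec in vectors.items():
            if word not in core_terms and cosine(vectors[core], vec) >= threshold:
                candidates.add(word)
    return sorted(candidates)


# Hypothetical toy data: already stemmed, with stop words partly left in ("the").
docs = [
    ["ontology", "concept", "ontology", "relation", "the"],
    ["football", "match", "the", "goal"],
    ["market", "price", "the", "trade"],
]
# Hypothetical stand-in for vectors read from a trained word2vec model.
vectors = {
    "ontology": [1.0, 0.0],
    "taxonomy": [0.9, 0.1],
    "football": [0.0, 1.0],
    "concept": [0.8, 0.3],
}

core = core_vocabulary(docs, 0, 2)
candidates = expand_terms(core, vectors, 0.9)
print(core, candidates)
```

In the workflow the abstract describes, the candidate list would not be added to the ontology directly: domain experts screen it first, and only the accepted terms are inserted with Protégé.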
Keywords/Search Tags: Domain ontology, Ontology reuse, Cross-language, Word2vec