
Construction And Implementation Of Domain Ontology Based On Plain Text

Posted on: 2017-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: R Guo
GTID: 2348330512455428
Subject: Computer technology
Abstract/Summary:
With the advent of the big data era, the huge number of Web pages now available contains a great deal of important information. Domain ontologies, which can be extracted from the Web, are an indispensable part of the Semantic Web and can also be used for intelligent management of large bodies of knowledge. The extraction of a domain ontology can be divided into the following sub-tasks: 1) extraction of domain-specific terms; 2) extraction of domain concepts; 3) extraction of taxonomic and non-taxonomic relations.

In the past, the construction of domain ontologies depended on domain experts, so the process was time-consuming. With the growth of the Internet in particular, this traditional approach can no longer manage knowledge effectively. To reduce these costs, methods from natural language processing (NLP) and machine learning (ML) are often used to make the process more automatic.

This paper proposes and implements a new method for building Chinese domain ontologies. First, an automatic crawler collects Web news pages. Then domain terms, domain concepts, and taxonomic relationships are extracted. The main work is as follows:

1) A set of rules based on Chinese lexical and syntactic features is created to extract nouns and noun phrases as candidate domain terms. The TF-IDF (term frequency–inverse document frequency) and DR&DC (domain relevance and domain consensus) algorithms are then each used separately to extract the terms.

2) Domain concepts are extracted from the domain terms using the log-likelihood ratio and information entropy. The Word2Vec algorithm is used to find terms that are very similar to the domain concepts, which expands the set of domain concepts. The accuracy of the final results is improved by using the definitions in online encyclopedias (Baidu Encyclopedia and Wikipedia), which describe the connotation and extension of each concept.

3) Taxonomic relationships are extracted using rule-based and statistics-based methods. First, some of the taxonomic relationships are extracted using lexico-syntactic patterns and a suffix matching algorithm. Second, further taxonomic relations are extracted using similarity measures based on the vector space model, Word2Vec, and the degree of refinement.
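
Two of the steps named in the abstract, TF-IDF scoring of candidate terms and suffix matching for taxonomic relations, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `tf_idf` and `suffix_hypernyms` are hypothetical, the input is assumed to be already segmented into tokens, the TF-IDF variant (relative frequency times the natural log of N/df) is one common choice among several, and the suffix rule shown is the simplest possible reading of "suffix matching".

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    `docs` is a list of token lists (segmentation is assumed to be done).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def suffix_hypernyms(terms):
    """Return (hyponym, hypernym) pairs where one term ends with another term.

    Illustrative assumption: a longer term whose suffix is itself a known
    term is treated as a subclass of that term.
    """
    term_set = set(terms)
    pairs = []
    for term in term_set:
        for other in term_set:
            if other != term and term.endswith(other):
                pairs.append((term, other))
    return sorted(pairs)


if __name__ == "__main__":
    corpus = [
        ["ontology", "term", "term"],
        ["term", "web"],
        ["web", "page"],
    ]
    for doc_scores in tf_idf(corpus):
        print(doc_scores)

    # 手机 = mobile phone, 智能手机 = smartphone,
    # 电脑 = computer, 笔记本电脑 = laptop computer
    print(suffix_hypernyms(["手机", "智能手机", "电脑", "笔记本电脑"]))
```

A term that occurs in every document receives a score of zero under this weighting, which is why TF-IDF favours terms concentrated in a few documents. The suffix rule works at the character level, which suits Chinese compounds whose head noun comes last, but it will also produce false pairs whenever a shared suffix is coincidental.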
Keywords/Search Tags:Domain ontology, Ontology learning, Term extraction, Concept extraction, Taxonomic relationship extraction