Domain Ontology Extraction On Unstructured Text

Posted on: 2020-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: F C Liu
GTID: 2428330602458000
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of big data and artificial intelligence, data has become a primary concern across industries. However, most traditional domain ontology construction techniques work on structured or semi-structured data, which raises several problems. First, they ignore important information that may be contained in unstructured data. Second, the key step in extracting an ontology from Chinese text is term extraction, yet traditional word-vector construction algorithms such as TF-IDF and Word2vec must traverse the corpus repeatedly, which is time-consuming and introduces noise; they also do not account for the recurrence and co-occurrence of terms across the full text, resulting in low precision and low recall. Third, converting Chinese terms from unstructured to structured form requires conceptual verification and a structured definition. Finally, an ontology, as a formal representation of a shared conceptual model, should have a strong active-learning capability.

To address these problems, this thesis first adopts the CKIP conceptual structure tree, which eliminates manual annotation. Second, Wikipedia Extractor is used to extract multi-domain text data from Wikipedia, and Wikipedia definition data is used to validate and correct deviations in subsequent results. Third, after setting out principles for building a Chinese corpus, the thesis constructs the concept structure tree of terms with the CKIP system, which also performs the lexical and syntactic analysis during text preprocessing. Three parameters based on linguistic morphology and conceptual structure, WPOS, WTv and WTG, are defined, and on top of them an unsupervised self-organizing term extraction algorithm (SOM) is proposed. Fourth, the thesis verifies the conceptual characteristics of terms from the perspectives of connotation, extension and synonym recognition, which further simplifies the conceptual structure tree while performing semantic disambiguation and removing redundancy. Fifth, it combines rule matching, suffix matching and a fine-grained similarity-based algorithm to extract hyponymy relations from context. The thesis also gives a formal definition of a Chinese domain ontology, D = {C, A, R, O, X}, and proposes an ontology extraction algorithm based on event triples (A-R-O). Finally, to give the domain ontology a self-learning capability, a parallel fuzzy reasoning mechanism based on conceptual resonance (CRS) is proposed.

The experimental results show that, on the same training sets, the proposed method not only continuously converts unstructured text into structured form but also achieves higher accuracy than traditional hyponymy extraction algorithms.
Keywords/Search Tags: Domain Ontology, Concept Clustering, Hyponymy, Fuzzy Reasoning