Domain Ontology Extraction On Unstructured Text

Posted on:2020-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:F C Liu

Full Text:PDF

GTID:2428330602458000

Subject:Computer Science and Technology

Abstract/Summary:

With the advent of the era of big data and artificial intelligence, data has become a primary concern across industries. However, traditional domain ontology construction faces four problems. First, most existing techniques rely on structured or semi-structured data and therefore ignore the important information that unstructured data may contain. Second, the key to extracting an ontology from Chinese text is term extraction, yet traditional word-vector construction algorithms such as TF-IDF and Word2vec must traverse the corpus repeatedly, which is time-consuming and introduces noise, and they do not consider the full-text recurrence and co-occurrence of terms, resulting in low precision and low recall. Third, moving Chinese terms from unstructured to structured form requires conceptual verification and a structured definition. Finally, an ontology, as a formal representation of a shared conceptual model, should have a strong active-learning capability.

To solve these problems, this thesis first adopts the CKIP conceptual structure tree, which removes the need for manual annotation. Second, Wikipedia Extractor is used to extract multi-domain text data from Wikipedia, and Wikipedia definition data is used to validate results and correct subsequent deviations. Third, after setting out principles for building the Chinese corpus, the concept structure tree of terms is constructed with the CKIP system, which also performs the lexical and syntactic analysis in text preprocessing. Three parameters, WPOS, WTv and WTG, based on language morphology and conceptual structure, are developed to support an unsupervised self-organizing term extraction algorithm (SOM). Fourth, the conceptual characteristics of terms are verified in terms of connotation, extension and synonym recognition, which further simplifies the conceptual structure tree while completing semantic disambiguation and removing redundancy. Fifth, a combination of rule matching, suffix matching and a fine-grained similarity-based algorithm is used to extract hyponymy from context. The thesis gives a formal definition of a Chinese domain ontology, D={C,A,R,O,X}, and proposes an ontology extraction algorithm based on event triples (A-R-O). Finally, to improve the self-learning ability of the domain ontology, a parallel fuzzy reasoning mechanism based on conceptual resonance (CRS) is proposed.

The experimental results show that, on the same training sets, the proposed method can not only continuously structure unstructured text but also achieves higher accuracy than the traditional hyponymy extraction algorithm.
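
As an illustration of the suffix-matching idea mentioned in the abstract, the following Python sketch treats a term as a hyponym of another term when the second is a proper suffix of the first, a heuristic that often holds for Chinese compound terms. This is a minimal sketch under stated assumptions, not the algorithm from the thesis: the function name suffix_hyponyms and the sample term list are hypothetical, and the rule-matching and similarity-based steps are not modeled.

```python
def suffix_hyponyms(terms):
    """Return (hyponym, hypernym) pairs found by suffix matching.

    Assumption: a term is a hyponym of another term in the list when the
    other term is a proper suffix of it. Only the longest matching suffix
    is kept for each term.
    """
    vocab = set(terms)
    pairs = []
    for term in terms:
        # Try proper suffixes from longest to shortest.
        for start in range(1, len(term)):
            suffix = term[start:]
            if suffix in vocab:
                pairs.append((term, suffix))
                break  # keep only the longest matching suffix
    return pairs


# Hypothetical sample terms (not taken from the thesis):
# ontology, domain ontology, fuzzy domain ontology, reasoning, fuzzy reasoning
sample_terms = ["本体", "领域本体", "模糊领域本体", "推理", "模糊推理"]

for hyponym, hypernym in suffix_hyponyms(sample_terms):
    print(f"{hyponym} is-a {hypernym}")
```

Keeping only the longest matching suffix links each term to its most specific candidate hypernym, so the resulting pairs form a chain rather than attaching every term directly to the most general concept.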

Keywords/Search Tags:

Domain Ontology, Concept Clustering, Hyponymy, Fuzzy Reasoning

Related items

1	Extraction And Organizational Domain Ontology Hyponymy Research
2	Research Of Domain Ontology Automatic Construction Method Based On Relational Database
3	Research On Domain Ontology Model And Semantic Reasoning Of Animation Material
4	Research On Key Techniques Of Target Recognition Based On Concept Reasoning
5	Research On Domain Ontology Representation, Reasoning And Integration For The Semantic Web And The Applications
6	Research On Semantic Web Fuzzy Ontology Construction Based On FFCA
7	Research On Ontology Learning Method In Patent Field
8	The Research Of Fuzzy Domain Ontology Construction Based On Mining Online Reviews And Its Application
9	Research On Knowledge Chains-based Ontology Construction Methodology
10	Research And Implementation On Semi-Automatic Domain Ontology Acquisition Method