Study Of Multiple Dimensional Frequent Sub-Tree Pattern Based Chinese Focus Words Recognition

Posted on:2016-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y Huang

Full Text:PDF

GTID:2308330473960207

Subject:Computer application technology

Abstract/Summary:

Interrogatives and focus words carry key information in Chinese questions, so recognizing them is an important step in processing such questions. Because interrogatives are few and relatively fixed, they can be recognized with high accuracy. Focus words are harder to recognize: Chinese is paratactic, has no tense inflection, and allows flexible word order, and Chinese focus words vary widely in number, position, and composition.

Questions can be parsed into dependency relationship trees according to Chinese grammar, and these trees reflect the general structures of the questions. Mining the common structures from a tree database can therefore help recognize Chinese focus words. This thesis studies the mining of Chinese question dependency trees and its application to Chinese focus word recognition. The main contributions of this thesis are as follows:

(1) Existing research focuses on traditional trees whose nodes carry only one dimension of information, whereas the nodes of a dependency relationship tree consist of words that carry information in multiple dimensions. The concept of the Multiple Dimensional Tree (MDT) is proposed, and its properties are explored and discussed. An algorithm for multiple dimensional frequent sub-tree mining is proposed, together with its pruning and candidate sub-tree generation strategies, and repeated experiments are used to verify its efficiency.

(2) Multiple dimensional sub-tree patterns are used to recognize Chinese focus words by mining the hidden statistical relations in dependency relationship trees. The method is objective in that it relies strictly on statistics mined from a large-scale corpus, and so its annotation is more stable, adaptable, and robust. The empirical results show that, on average, the proposed method improves focus annotation accuracy compared with a CRF model.

(3) Frequent dependency sub-tree patterns generate a great many rule patterns. The resulting redundancy problem is discussed, and a rule reduction strategy is proposed to identify redundant rules accurately and scale down the rule library.
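
As an illustration of what a node with multiple dimensions of information might look like, here is a minimal Python sketch. It is not the thesis's algorithm: the three dimensions (word, part of speech, dependency relation), the `Node` class, the function names, and the toy trees are all assumptions made for this example, and only single parent-child edges are mined rather than full sub-trees. Each node can be matched on any subset of its dimensions, with `None` acting as a wildcard, and a pattern's support is the number of trees that contain it.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    # Hypothetical dimensions; the abstract does not list the actual ones.
    word: str
    pos: str
    rel: str
    children: list = field(default_factory=list)


def generalizations(node):
    """Yield every (word, pos, rel) view of a node, with None as a wildcard."""
    dims = (node.word, node.pos, node.rel)
    for mask in product((True, False), repeat=len(dims)):
        if any(mask):  # skip the view where every dimension is a wildcard
            yield tuple(d if keep else None for d, keep in zip(dims, mask))


def edge_patterns(root):
    """Collect the distinct generalized parent->child patterns in one tree."""
    found = set()
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            for p in generalizations(parent):
                for c in generalizations(child):
                    found.add((p, c))
            stack.append(child)
    return found


def frequent_edge_patterns(trees, min_support):
    """Keep the patterns that occur in at least min_support trees."""
    counts = Counter()
    for root in trees:
        counts.update(edge_patterns(root))  # each tree counts a pattern once
    return {pat: n for pat, n in counts.items() if n >= min_support}


# Toy dependency trees with made-up labels, for illustration only.
t1 = Node("is", "v", "HED", [Node("capital", "n", "SBV"), Node("what", "r", "VOB")])
t2 = Node("is", "v", "HED", [Node("author", "n", "SBV"), Node("who", "r", "VOB")])

frequent = frequent_edge_patterns([t1, t2], min_support=2)
```

With these two toy trees, the fully specified edge from "is" to "capital" occurs in only one tree and is dropped, while the generalized pattern "a verb governing a noun subject" occurs in both and is kept. Enumerating every generalization is exponential in the number of dimensions, which is one reason a practical miner needs pruning and candidate generation strategies such as those the abstract mentions.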

Keywords/Search Tags:

Multiple Dimensional Tree, Dependency Relationship Tree, Condensing Pattern, Rule Conflict, Focus Words

Related items

1	Research Frequent Pattern Mining Algorithm Based On Compact Pattern Tree And Multiple Minimum Support
2	Automatic Recognition Of Relation Words In Chinese Complex Sentence Based On Decision Tree
3	Study On Association Rules Mining Algorithm Based On FP-tree
4	Research On The Rule Excavation Method Based On Decision Tree In Automatic Identification Of Relation Words In Chinese Compound Sentences
5	Research On Mining Algorithms Of Maximal Frequent Item Sets
6	Research On Vietnamese Sentence Analysis And Tree Library Transformation Method
7	Research On The Sequential Pattern Mining Algorithms Using Prefix-tree Structure
8	Research And Application Of Classification Based On Classification Frequent Pattern Tree
9	Research On Mining Algorithm Of Association Rules Based On Frequent Pattern Tree
10	Fp-tree-based Association Rule Mining Algorithm Design And Implementation