
Study Of Multiple Dimensional Frequent Sub-Tree Pattern Based Chinese Focus Words Recognition

Posted on: 2016-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Huang
GTID: 2308330473960207
Subject: Computer application technology
Abstract/Summary:
Interrogatives and focus words carry much of the useful information in Chinese questions, so recognizing them is an important step in question processing. Because interrogatives are few and relatively fixed, they can be recognized with high accuracy. Focus words are harder: characteristics of the Chinese language such as parataxis, the lack of tense inflection, and flexible word order, together with the variety and complexity of focus words in number, position, and constituent type, make Chinese focus word recognition difficult.

Questions can be parsed into dependency relationship trees according to Chinese grammar, and these trees reflect the general structures of the questions. Mining the common structures from a tree database can therefore help recognize Chinese focus words. This thesis studies the mining of Chinese question dependency trees and its application to Chinese focus word recognition.

The main contributions of this thesis are as follows:

(1) Existing research focuses on traditional trees whose nodes carry only one-dimensional information, whereas the nodes of dependency relationship trees consist of words that carry multi-dimensional information. The concept of the Multiple Dimensional Tree (MDT) is proposed and its properties are explored and discussed. An algorithm for multiple dimensional frequent sub-tree mining is proposed, together with its pruning and candidate sub-tree generation strategies, and repeated experiments verify its efficiency.

(2) Multiple dimensional sub-tree patterns are used to recognize Chinese focus words by mining the hidden statistical relations in dependency relationship trees. The method is objective and statistical, with its patterns mined strictly from a large-scale corpus, so its annotations are more stable, adaptable, and robust. The empirical results show that the proposed method improves focus annotation accuracy on average compared with a CRF model.

(3) Frequent dependency sub-tree patterns generate a very large number of rule patterns. The resulting redundancy problem is discussed, and a rule reduction strategy is devised to identify redundant rules accurately and scale down the rule library.
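As a rough illustration of the idea in (1), the sketch below represents a tree whose nodes carry several dimensions and counts the support of a sub-tree pattern over a small tree database. It is not the thesis's algorithm: the abstract gives no definitions, so the node layout (word, part-of-speech tag, dependency relation), the use of `None` as a wildcard dimension, the unordered induced-subtree matching, and all names (`MDNode`, `embeds_at`, `support`) are assumptions made for illustration only. Candidate generation and pruning are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MDNode:
    """Hypothetical multi-dimensional node: (word, POS tag, dependency relation)."""
    dims: Tuple[Optional[str], ...]
    children: List["MDNode"] = field(default_factory=list)


def node_matches(pattern: MDNode, node: MDNode) -> bool:
    """A pattern dimension of None acts as a wildcard; all others must be equal."""
    return len(pattern.dims) == len(node.dims) and all(
        p is None or p == n for p, n in zip(pattern.dims, node.dims)
    )


def _assign(p_children, n_children, used) -> bool:
    """Backtracking search mapping each pattern child to a distinct tree child."""
    if not p_children:
        return True
    first, rest = p_children[0], p_children[1:]
    for i, child in enumerate(n_children):
        if i not in used and embeds_at(first, child):
            if _assign(rest, n_children, used | {i}):
                return True
    return False


def embeds_at(pattern: MDNode, node: MDNode) -> bool:
    """True if the pattern matches with its root mapped onto `node`."""
    if not node_matches(pattern, node):
        return False
    return _assign(pattern.children, node.children, frozenset())


def contains(pattern: MDNode, tree: MDNode) -> bool:
    """True if the pattern occurs anywhere in the tree."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if embeds_at(pattern, node):
            return True
        stack.extend(node.children)
    return False


def support(pattern: MDNode, database: List[MDNode]) -> float:
    """Fraction of trees in the database that contain the pattern."""
    if not database:
        return 0.0
    return sum(contains(pattern, t) for t in database) / len(database)
```

Under these assumptions, a pattern would count as frequent when its `support` reaches a chosen minimum threshold. Leaving the word dimension as a wildcard while fixing the tag and relation dimensions lets one pattern generalize over many concrete questions.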
Keywords/Search Tags: Multiple Dimensional Tree, Dependency Relationship Tree, Condensing Pattern, Rule Conflict, Focus Words