Interrogatives and focus words reveal useful information in Chinese questions, so recognizing them is important when processing such questions. Because interrogatives are few and relatively fixed, they can be recognized with high accuracy. Focus words are harder to recognize: Chinese is characterized by parataxis, the absence of tense inflection, and flexible word order, and focus words themselves vary widely in number, position, and composition.

Questions can be parsed into dependency trees according to Chinese grammar, so the general structures of questions are reflected in these trees. Mining the common structures from a database of such trees can therefore help recognize Chinese focus words. This thesis studies the mining of Chinese question dependency trees and its application to Chinese focus word recognition.

The main contributions of this thesis are as follows:

(1) Existing research focuses on traditional trees whose nodes carry only one dimension of information, whereas the nodes of dependency trees are words that carry information in multiple dimensions. The concept of the Multiple Dimensional Tree (MDT) is proposed and its properties are explored. An algorithm for mining frequent multiple dimensional sub-trees is proposed, together with its pruning and candidate sub-tree generation strategies, and repeated experiments verify its efficiency.

(2) Multiple dimensional sub-tree patterns are used to recognize Chinese focus words by mining the hidden statistical relations in dependency trees. This is an objective, statistics-based method whose patterns are mined strictly from a large-scale corpus, which makes its annotation more stable, adaptable, and robust. Empirical results show that, on average, the proposed method achieves higher focus annotation accuracy than a CRF model.

(3) Because frequent dependency sub-tree patterns generate a very large number of rule patterns, the problem of redundancy is discussed, and a rule reduction strategy is devised to identify redundant rules accurately and scale down the rule library.
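
To make the notion of a multiple dimensional sub-tree pattern concrete, the following minimal Python sketch counts the support of one pattern in a small tree database. It is an illustration under stated assumptions, not the mining algorithm of the thesis: the dimensions "word", "pos", and "rel", the tag values, the three example questions, and all function names are hypothetical, and the pruning and candidate generation strategies are not reproduced. A pattern node may constrain any subset of dimensions, and its children must match an ordered subsequence of a tree node's children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Each node carries several dimensions of information (assumed names).
    labels: dict
    children: list = field(default_factory=list)


def labels_match(pattern, node):
    # A pattern node may constrain only some dimensions of a tree node.
    return all(node.labels.get(k) == v for k, v in pattern.labels.items())


def matches_at(pattern, node):
    # The pattern root must match this node, and the pattern children must
    # match an ordered subsequence of the node's children (greedy scan).
    if not labels_match(pattern, node):
        return False
    remaining = iter(node.children)
    return all(any(matches_at(pc, c) for c in remaining)
               for pc in pattern.children)


def occurs_in(pattern, tree):
    # The pattern occurs in a tree if it matches at any node of that tree.
    return matches_at(pattern, tree) or any(
        occurs_in(pattern, c) for c in tree.children)


def support(pattern, database):
    # Fraction of trees in the database that contain the pattern.
    if not database:
        return 0.0
    return sum(occurs_in(pattern, t) for t in database) / len(database)


def question(verb, subj, obj):
    # Helper for building a toy three-node dependency tree.
    return Node({"word": verb, "pos": "v", "rel": "HED"}, [Node(subj), Node(obj)])


database = [
    question("是", {"word": "首都", "pos": "n", "rel": "SBV"},
             {"word": "哪里", "pos": "r", "rel": "VOB"}),
    question("在", {"word": "长城", "pos": "n", "rel": "SBV"},
             {"word": "哪里", "pos": "r", "rel": "VOB"}),
    question("发明", {"word": "谁", "pos": "r", "rel": "SBV"},
             {"word": "电话", "pos": "n", "rel": "VOB"}),
]

# Pattern: a verb with a noun subject followed by a pronoun object.
pattern = Node({"pos": "v"}, [Node({"pos": "n", "rel": "SBV"}),
                              Node({"pos": "r", "rel": "VOB"})])

print(support(pattern, database))
```

In this toy database the pattern occurs in the first two trees but not the third, so a frequent sub-tree miner with a minimum support of one half would keep it. Leaving the "word" dimension unconstrained is what allows one pattern to generalize over different questions.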