Application Research Of Frequent Dependency Subtree Patterns In Question Classification

Posted on: 2015-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Yu
GTID: 2308330473959334
Subject: Computer application technology
Abstract/Summary:
Question classification is a key step in automatic question answering systems and plays a vital role in helping the computer accurately understand the true meaning of a natural language question. However, because Chinese is paratactic, has no tense marking, allows flexible word order, and has characteristic average dependency distances between words, Chinese question classification is difficult. Questions can be parsed into dependency trees according to Chinese dependency grammar. Substructures that appear frequently in these dependency trees can reflect the common structural characteristics of such natural language questions. In other words, mining characteristic substructures from the dependency trees of Chinese questions can help question classification. This thesis studies the mining of Chinese question dependency trees and applies the results to question classification. The main contributions of this thesis are as follows:

(1) A focus recognition method based on frequent dependency tree patterns of Chinese questions is proposed. In this method, the probabilistic relationships among the various dimensional features of the focus, which are hidden in the dependency tree corpus, are mined and used as a basis for improving focus recognition accuracy. Although a Conditional Random Field (CRF) model can automatically tag the focus in a question based on statistical characteristics of the focus extracted from a question corpus, some deep statistical relationships among focus features cannot be mined reliably by CRF, which nontrivially impairs focus recognition. The main steps of the method are: mining frequent dependency subtree patterns to generate the corresponding statistical rules, using CRF for the initial focus annotation, and using the frequent dependency subtree statistical rules to correct the initial annotation.
The method is a typical objective, statistics-based method whose rules are mined strictly from a large-scale corpus, so its annotation is more stable, adaptable, and robust. The empirical results show that the proposed method improves focus annotation accuracy by about 3% on average on top of the CRF model.

(2) Because too many statistical rule patterns are generated from frequent dependency subtree patterns, this thesis discusses definitions of redundant dependency subtree rule patterns and proposes redundancy reduction methods for use both when generating patterns and when applying them. By removing simple redundancy patterns, stringent redundancy patterns, low-confidence patterns, and other redundant dependency tree patterns, the number of patterns is significantly reduced while the accuracy of focus tagging remains stable.

(3) A category frequent dependency subtree pattern (CFDSP) classification rule generation algorithm is proposed for mining the frequent structural feature patterns of questions with different category labels. The mined patterns, together with a library mapping question words to categories, a library mapping question words and focuses to categories, and Bayesian classification, are integrated into a new question classification method. Empirical results show that this question classification method outperforms existing ones, with a significant increase in accuracy.
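The pattern reduction described in (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's definitions of simple or stringent redundancy, so everything here is an assumption made for illustration: the `RulePattern` type, the `prune_patterns` function, the `min_conf` threshold, and the rule that a pattern is redundant when a strictly smaller pattern with the same label is at least as confident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RulePattern:
    # Hypothetical representation: a subtree as a set of
    # (head, relation, dependent) edges, plus the label it predicts.
    edges: frozenset
    label: str
    support: int  # number of corpus trees containing the subtree
    hits: int     # number of those trees where the label was correct

    @property
    def confidence(self) -> float:
        return self.hits / self.support if self.support else 0.0


def prune_patterns(patterns, min_conf=0.6):
    """Drop low-confidence patterns, then drop patterns dominated by a
    strictly smaller pattern with the same label (assumed criterion)."""
    kept = [p for p in patterns if p.confidence >= min_conf]
    result = []
    for p in kept:
        dominated = any(
            q.label == p.label
            and q.edges < p.edges            # q is a proper sub-pattern of p
            and q.confidence >= p.confidence
            for q in kept
        )
        if not dominated:
            result.append(p)
    return result


# Toy patterns with made-up edges and counts, for illustration only.
base = frozenset({("is", "SBV", "what")})
wider = base | {("is", "VOB", "capital")}
general = RulePattern(base, "focus", support=10, hits=9)
specific = RulePattern(wider, "focus", support=5, hits=4)
weak = RulePattern(frozenset({("is", "ADV", "where")}), "focus", 10, 3)
other = RulePattern(wider, "non-focus", support=4, hits=4)

pruned = prune_patterns([general, specific, weak, other])
```

In this toy run, `weak` is removed by the confidence threshold and `specific` is removed because `general` covers it with higher confidence, while `other` survives because it predicts a different label. A real system would need the thesis's actual redundancy definitions in place of the assumed dominance test.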
Keywords/Search Tags: frequent subtree pattern, focus, question classification, rule-based optimization, CRF