| Question answering (QA) systems have drawn attention as one approach to information access: they aim to directly satisfy a user's information need expressed in natural language. Although much progress has been made, QA systems mainly deal with simple factoid questions. Questions involving multiple constraints and more facts are difficult for current QA systems to understand correctly, which leads to poor answer extraction. Complex question understanding is therefore one of the challenges in the development of QA systems. This thesis deals with complex factoid questions: it attempts to decompose the original question into several sub-questions and then answers the original question by integrating the evidence for the sub-questions. To the best of our knowledge, there is little systematic work on this topic, and the definitions and algorithms are still at an early stage. This thesis focuses on the following aspects of question decomposition: 1. We attempt to understand in depth the characteristics of sub-questions and the relationships between them by examining linguistic phenomena in real data. We construct a question decomposition corpus and clarify the annotation taxonomy and method. 2. To address how to decompose a question, we propose a rule-based system over dependency syntax to extract sub-question candidates. We find that the composition of sub-questions is related to the syntactic structure of the question. A set of syntactic rules is summarized to extract sub-questions that cover the main facts and to build the relationships between sub-questions. 3. To address the problem of too many generated sub-question candidates, we propose a fluency-based method and a syntactic-pattern-based method to verify the sub-question candidates. 
A web-based n-gram estimation method is applied to alleviate the data sparseness problem in measuring question fluency, and a set of syntactic patterns is extracted from manually annotated sub-questions. Sub-question candidates are ranked based on these resources so that high-quality sub-question candidates can be selected. In summary, we explore the question decomposition problem from conceptual and linguistic angles. The preliminary observations and results could be applied to answer extraction for complex questions in the future. |
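
The fluency-based ranking of sub-question candidates can be illustrated with a minimal sketch. It is not the thesis's implementation: the thesis estimates n-gram statistics from the web, whereas this example assumes a tiny in-memory corpus as a stand-in, uses an add-one smoothed bigram model, and normalizes the score by length. The function names `train_counts`, `fluency`, and `rank_candidates` are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Collect unigram and bigram counts from whitespace-tokenized sentences.

    Assumption: a small local corpus stands in for web-scale n-gram counts.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def fluency(question, unigrams, bigrams):
    """Length-normalized log probability under an add-one smoothed bigram model."""
    tokens = ["<s>"] + question.lower().split() + ["</s>"]
    vocab = len(unigrams) or 1  # avoid a zero denominator on an empty model
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Counter returns 0 for unseen keys, so unseen bigrams get smoothed mass.
        logp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return logp / (len(tokens) - 1)


def rank_candidates(candidates, unigrams, bigrams):
    """Sort sub-question candidates from most to least fluent."""
    return sorted(candidates, key=lambda q: fluency(q, unigrams, bigrams), reverse=True)


if __name__ == "__main__":
    corpus = [
        "who wrote the book",
        "who directed the film",
        "which film won the award",
    ]
    unigrams, bigrams = train_counts(corpus)
    candidates = ["book the wrote who", "who wrote the book"]
    for q in rank_candidates(candidates, unigrams, bigrams):
        print(round(fluency(q, unigrams, bigrams), 3), q)
```

Under these assumptions, a well-formed candidate scores higher than a scrambled one because more of its bigrams are attested in the counts. A syntactic-pattern check, as described in the abstract, would be a separate filter applied alongside this score.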