Research On Information Distance Theory And Its Application In Question Answering System

Posted on: 2009-07-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 1118360272491738
Subject: Computer Science and Technology
Abstract/Summary:
One of the most important tasks in a question answering system is to calculate the similarity or relationship between text segments, such as words or sentences. Similarity/relationship calculation is also important in many other domains, including Information Extraction, Information Retrieval, Knowledge Representation and Knowledge Reasoning. Theoretically, it can be treated as a unified problem of measuring the distance between two entities under some distance measure. This thesis focuses on establishing and completing a unified distance theory, Information Distance theory, to solve this problem. Distance measures between text segments under different aspects are discussed using Information Distance and Conditional Information Distance. Based on the theory, a natural language question answering prototype system, QUANTA, is designed and implemented.
· Extending the traditional work on max information distance theory, a Kolmogorov complexity based min information distance is proposed. This new measure solves several problems that the traditional information distance metric encounters in applications, including the partial matching problem, the triangle inequality problem and the density problem. The weak universality of the max normalized information distance is proved, and a conclusion is given for the cases in which universality does not hold. For the min normalized information distance, a proof of strict universality is given. In addition, the conditional information distance theory is discussed and developed in terms of its physical and mathematical characteristics.
· Based on the information distance theory, the similarity between words and between sentences is discussed. Word semantic similarity can be measured from various aspects by pattern-based conditional information distance. A conditional Kolmogorov complexity estimation method based on the maximum overlap rule and min distance theory is proposed to calculate sentence semantic similarity, and it is successfully applied to the passage retrieval task in a question answering system.
· Answer validation is one of the most important stages in a question answering system. In this thesis, the problem is solved by calculating the conditional information distance between the question focus and the answer over certain condition patterns. Extensive experiments are conducted to validate the method.
· An open-domain factoid question answering prototype system, QUANTA, is designed and implemented, combining Natural Language Processing, Text Categorization, Information Retrieval and Information Extraction technologies. The system answers natural language questions through the Question Preprocessing, Query Formulation, Doc/Passage Retrieval, Candidate Generation and Answer Validation stages.
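The contrast between the max and min normalized distances can be illustrated with a small sketch. The abstract does not spell out the formulas; the forms used here, d_max(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)} and d_min(x, y) = min{K(x|y), K(y|x)} / min{K(x), K(y)}, are the ones commonly stated in the information distance literature and are an assumption about what the thesis uses. Kolmogorov complexity is not computable, so the sketch swaps in zlib compressed length as a rough stand-in. This is not the estimation method of the thesis, which is based on the maximum overlap rule. The function names `c`, `cond`, `d_max` and `d_min` and the sample strings are hypothetical.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for K(x)."""
    return len(zlib.compress(data, 9))


def cond(x: bytes, y: bytes) -> int:
    """Approximate K(x|y) as C(yx) - C(y), clamped at zero (an assumption)."""
    return max(c(y + x) - c(y), 0)


def d_max(x: bytes, y: bytes) -> float:
    """Max normalized distance: max{K(x|y), K(y|x)} / max{K(x), K(y)}."""
    return max(cond(x, y), cond(y, x)) / max(c(x), c(y))


def d_min(x: bytes, y: bytes) -> float:
    """Min normalized distance: min{K(x|y), K(y|x)} / min{K(x), K(y)}."""
    return min(cond(x, y), cond(y, x)) / min(c(x), c(y))


# Partial matching: the passage contains the question plus unrelated material.
question = b"Who wrote the novel War and Peace?"
passage = question + (
    b" The harbour was rebuilt after the flood, and the ferry timetable "
    b"changed twice during the winter because of ice on the northern route."
)

print("d_max:", round(d_max(question, passage), 3))
print("d_min:", round(d_min(question, passage), 3))
```

Because the short string is almost fully described by the long one, the min distance stays small, while the max distance is dominated by the unrelated material in the longer string. This is the partial matching behaviour the abstract attributes to the min measure. A general-purpose compressor is a weak estimator on strings this short, so the absolute values should not be read as meaningful.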
Keywords/Search Tags: Kolmogorov complexity, distance metric, information distance, question answering, passage retrieval