
Research on an FAQ Question Answering System Based on Chinese Phrase Chunks

Posted on: 2014-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: C M Kang
Full Text: PDF
GTID: 2268330401973290
Subject: Computer application technology
Abstract/Summary:
Question answering is an important direction in natural language processing. It aims to let users ask questions in natural language and receive answers directly. Compared with traditional keyword search engines, automatic question answering systems have some significant advantages. In a restricted-domain question answering system based on FAQs (frequently asked questions), the questions users ask most often are organized together with their answers, which makes locating the answer more accurate, simple and efficient. Because of its broad application prospects in daily life, question answering has become one of the most active research areas. This thesis applies natural language processing techniques to three key technologies of restricted-domain question answering: question classification, question chunking and question similarity calculation. On this basis it implements a prototype FAQ question answering system. The main contributions are as follows:

(1) Question classifiers based on probability and statistics are trained only on the frequency of the feature words in a question and do not take into account the semantic relationships between the words. This thesis puts forward a question classification algorithm that combines semantic similarity with the sequence analysis of the Hidden Markov Model. First, it extracts the feature word set of every question category as the observation sequence of the corresponding Hidden Markov Model classifier. Second, it treats the formation and evolution of the feature word sets of the different categories as a sequence of state transitions. Finally, it constructs a Hidden Markov classification model for each question category by calculating the observation probability distribution of the feature words in the different states. A question classification experiment was conducted in the tourism domain. The results show that the proposed method improves considerably on existing methods and makes effective use of the relationships between the words in a question.

(2) Existing chunking methods rely mainly on the surface form of words and on statistical features, without considering the syntactic structure of different kinds of questions. To address this problem, this thesis proposes a new Chinese question chunking method based on the Chinese phrase syntax tree. First, given the known question categories, the method combines the way a question is asked with its lexical features to analyse question sentence patterns and summarize the forms that different questions take. Second, it uses a phrase parser to generate the phrase syntax tree of each question. Finally, it applies a customized set of chunking rules, designed around the domain characteristics of the questions, to identify and label the chunks. The experimental results show that the proposed method performs well on Chinese question chunking.

(3) Existing methods for calculating Chinese sentence similarity do not make full use of lexical semantics and sentence structure. This thesis puts forward a question similarity calculation method based on an improved edit distance algorithm. The method first uses the chunk, instead of the character, as the basic edit unit and assigns different weights to different words according to the characteristics of the domain questions. It then measures the replacement cost between two chunks by the HowNet-based similarity of the words they contain. Finally, it assigns different insertion and deletion costs to different types of chunks. The experimental results show that the proposed method performs well.

(4) On the basis of the above results, and taking tourism in Yunnan as an example, this thesis designs and implements a prototype Yunnan Tourism FAQ question answering system by classifying, chunking and labeling the domain questions.
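
As an illustration of the chunk-level edit distance outlined in (3), the following Python sketch may help. It is not the thesis's implementation: the chunk labels, the cost table `INDEL_COST`, and the function names are hypothetical, the per-word weights are omitted, and a simple word-overlap measure stands in for the HowNet-based word similarity, which the abstract does not specify in detail.

```python
from typing import Callable, Dict, List, Tuple

# A chunk is (chunk label, words in the chunk). Labels here are illustrative.
Chunk = Tuple[str, Tuple[str, ...]]

# Hypothetical insert/delete cost per chunk type; the abstract gives no values.
INDEL_COST: Dict[str, float] = {"NP": 1.0, "VP": 0.8, "QP": 0.5}
DEFAULT_INDEL = 1.0


def word_overlap(a: Chunk, b: Chunk) -> float:
    """Stand-in for HowNet similarity: Jaccard overlap of the chunks' words."""
    wa, wb = set(a[1]), set(b[1])
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def indel(chunk: Chunk) -> float:
    """Cost of inserting or deleting a chunk, looked up by its label."""
    return INDEL_COST.get(chunk[0], DEFAULT_INDEL)


def chunk_edit_distance(
    q1: List[Chunk],
    q2: List[Chunk],
    sim: Callable[[Chunk, Chunk], float] = word_overlap,
) -> float:
    """Weighted edit distance whose edit unit is a chunk, not a character."""
    n, m = len(q1), len(q2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel(q1[i - 1])  # delete every chunk of q1
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel(q2[j - 1])  # insert every chunk of q2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Replacement is cheap when the two chunks are similar.
            substitute = 1.0 - sim(q1[i - 1], q2[j - 1])
            d[i][j] = min(
                d[i - 1][j] + indel(q1[i - 1]),   # delete
                d[i][j - 1] + indel(q2[j - 1]),   # insert
                d[i - 1][j - 1] + substitute,     # replace or match
            )
    return d[n][m]


def question_similarity(q1: List[Chunk], q2: List[Chunk]) -> float:
    """Map the distance to [0, 1] using the delete-all-then-insert-all cost."""
    worst = sum(indel(c) for c in q1) + sum(indel(c) for c in q2)
    if worst == 0:
        return 1.0
    return 1.0 - chunk_edit_distance(q1, q2) / worst
```

For example, two questions chunked as `[("QP", ("how",)), ("VP", ("get", "to")), ("NP", ("Lijiang",))]` and the same sequence with `("NP", ("Dali",))` differ by a single chunk replacement, so only the noun chunk contributes to the distance. Swapping `word_overlap` for a real lexical similarity measure would change the replacement costs but not the dynamic-programming structure.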
Keywords/Search Tags: Chinese question answering system, restricted domain, question classification, chunking, edit distance, question similarity