
Applications Of Short Text Similarity Assessment In User-interactive Question Answering

Posted on: 2011-06-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W P Song
GTID: 1118360305966707
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of the Internet and the emergence of Web 2.0, question answering (QA) has become a new information retrieval (IR) technology. Unlike search engines, which return a set of relevant documents, QA systems return one or several exact answers to each user question, which users generally prefer. However, traditional automatic QA systems suffer from poor answer quality because it is very difficult for machines to understand human questions well. To solve this problem, user-interactive QA systems have been developed and have become a very popular Web-based service. Unlike traditional automatic QA systems, which obtain answers entirely automatically, user-interactive QA systems serve as interactive platforms on which users help each other with human-provided answers, overcoming the poor quality of automatically generated answers.

Short text similarity assessment is very important in user-interactive QA systems because questions and answers are usually short texts. Processing questions and answers depends on understanding their semantics well and measuring their similarity. The main applications of short text similarity assessment in user-interactive QA systems include frequently asked question (FAQ) answering, question categorization, and answer clustering. In this dissertation, we focus on these three applications. The research contents and contributions are as follows:

First, a novel question similarity calculation method based on a semantic space is proposed for FAQ answering. A semantic space is first constructed from accumulated questions. Questions are then mapped into the semantic space and represented as vectors, and question similarity is finally calculated from these vectors. Through this semantic mapping, the representation of each question is semantically enriched.
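The abstract does not say how the semantic space is constructed, so the sketch below swaps in latent semantic analysis (LSA) as one plausible stand-in: a term-by-question count matrix built from the accumulated questions is factored with a truncated SVD, each question is folded into the resulting k-dimensional space, and question similarity is the cosine between folded vectors. The function names, whitespace tokenization, toy FAQ data, and the choice of k are illustrative assumptions, not the dissertation's method.

```python
import numpy as np

def build_semantic_space(questions, k):
    """Build a k-dimensional LSA space from accumulated questions (stand-in technique)."""
    vocab = sorted({w for q in questions for w in q.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-question count matrix: rows are vocabulary terms, columns are questions.
    A = np.zeros((len(vocab), len(questions)))
    for j, q in enumerate(questions):
        for w in q.lower().split():
            A[index[w], j] += 1.0
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Never keep more dimensions than there are non-zero singular values.
    k = min(k, int(np.sum(s > 1e-10)))
    return index, U[:, :k], s[:k]

def to_semantic_vector(question, index, U_k, s_k):
    """Fold a question into the space: q_hat = diag(1 / s_k) @ U_k.T @ q."""
    q = np.zeros(U_k.shape[0])
    for w in question.lower().split():
        if w in index:  # words never seen in the accumulated questions are ignored
            q[index[w]] += 1.0
    return (U_k.T @ q) / s_k

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

# Hypothetical FAQ archive and user question.
faq = [
    "how do i reset my password",
    "how can i change my password",
    "what is the shipping cost",
    "how much does shipping cost",
]
index, U_k, s_k = build_semantic_space(faq, k=2)
faq_vecs = [to_semantic_vector(q, index, U_k, s_k) for q in faq]
user_vec = to_semantic_vector("forgot my password how to reset", index, U_k, s_k)
scores = [cosine(user_vec, v) for v in faq_vecs]
best = max(range(len(faq)), key=lambda j: scores[j])
print(faq[best], [round(x, 3) for x in scores])
```

With k = 2 only the dominant co-occurrence patterns survive, so the two password questions collapse onto essentially the same vector and the user question scores both about equally, well above the shipping questions. This illustrates the kind of semantic enrichment the abstract describes, although the dissertation's own space may be built differently.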
In the FAQ answering method, we also use semantic feature clustering to eliminate redundant information.

Second, an automatic question categorization method for user-interactive QA systems is proposed. In this method, important words extracted from accumulated questions are selected as features to construct a feature space, and each category is represented as a vector in that space. Each user question is also mapped into the feature space, and the similarity between the question vector and each category vector is calculated. The similarity scores are sorted in descending order, and the top k ranked categories are recommended to the user. Semantic patterns are also used to identify and weight the topic words in each question, since these words play a more important role in question categorization than other words.

Finally, an answer clustering method is proposed, in which all answers to the same question are grouped into clusters according to their content or meaning, and a representative answer is selected for each cluster. In this way, users can quickly grasp the information in each cluster by reading only its representative answer. The proposed method has two key parts: answer similarity calculation and the clustering algorithm. For answer similarity, a combined method that merges statistical similarity and semantic similarity is adopted. For clustering, an incremental algorithm is designed to reduce time complexity.
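A minimal sketch of the category recommendation step, assuming bag-of-words count vectors and cosine similarity (the abstract names neither the term weighting nor the similarity measure). The feature words are hand-picked here because the abstract does not give the selection criterion, and the semantic patterns that identify topic words are not shown: `topic_words` is simply passed in and scaled by a hypothetical `topic_boost` factor. All names and data are illustrative.

```python
import math
from collections import Counter

def build_category_vectors(labeled_questions, features):
    """Represent each category as a term-count vector over the feature words."""
    feats = set(features)
    vectors = {}
    for question, category in labeled_questions:
        counts = vectors.setdefault(category, Counter())
        counts.update(w for w in question.lower().split() if w in feats)
    return vectors

def question_vector(question, features, topic_words=(), topic_boost=2.0):
    """Map a question into the feature space; topic words get a hypothetical extra weight."""
    feats = set(features)
    vec = Counter()
    for w in question.lower().split():
        if w in feats:
            vec[w] += topic_boost if w in topic_words else 1.0
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(x * v[w] for w, x in u.items())  # Counter returns 0 for missing keys
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_categories(question, category_vectors, features, k=3, topic_words=()):
    """Score all categories, sort by similarity in descending order, return the top k."""
    q = question_vector(question, features, topic_words)
    scored = sorted(((cosine(q, vec), cat) for cat, vec in category_vectors.items()),
                    key=lambda pair: pair[0], reverse=True)
    return [cat for _, cat in scored[:k]]

# Hypothetical accumulated questions with their categories.
history = [
    ("how to install python on windows", "Computers"),
    ("python script crashes on windows", "Computers"),
    ("best pasta recipe for dinner", "Food"),
    ("how long to boil pasta", "Food"),
    ("cheap flights to paris", "Travel"),
    ("paris hotels near the louvre", "Travel"),
]
features = ["python", "windows", "install", "pasta", "recipe", "boil", "flights", "paris", "hotels"]
categories = build_category_vectors(history, features)
question = "where to find python tutorials in paris"
print(recommend_categories(question, categories, features, k=2, topic_words={"python"}))
print(recommend_categories(question, categories, features, k=2))
```

With `python` flagged as a topic word, Computers outranks Travel; without the boost, Travel ranks first. The toy example shows how weighting topic words can change which categories are recommended.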
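The answer clustering pipeline can be sketched as follows, with several stand-ins that the abstract does not confirm: statistical similarity is term-count cosine; semantic similarity is Jaccard overlap after mapping words through a tiny hypothetical `SYNONYMS` table (a real system would use a thesaurus or other lexical resource); the two are mixed with an assumed weight `alpha`; clustering is a single-pass leader algorithm that compares each new answer only with existing cluster leaders; and the representative is the member with the highest total similarity to the rest of its cluster.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a thesaurus: maps words to a shared concept.
SYNONYMS = {"reboot": "restart", "pc": "computer"}

def statistical_sim(a, b):
    """Cosine similarity between raw term-count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(x * cb[w] for w, x in ca.items())
    na = math.sqrt(sum(x * x for x in ca.values()))
    nb = math.sqrt(sum(x * x for x in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_sim(a, b):
    """Jaccard overlap of concept sets after synonym normalization (stand-in measure)."""
    sa = {SYNONYMS.get(w, w) for w in a.lower().split()}
    sb = {SYNONYMS.get(w, w) for w in b.lower().split()}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def combined_sim(a, b, alpha=0.5):
    """Weighted mix of statistical and semantic similarity; alpha is an assumed weight."""
    return alpha * statistical_sim(a, b) + (1 - alpha) * semantic_sim(a, b)

def cluster_answers(answers, threshold=0.3):
    """Single pass: join the most similar cluster leader if it clears threshold, else start a cluster."""
    clusters = []  # each cluster is a list of answers; clusters[i][0] is its leader
    for ans in answers:
        best_i, best_s = -1, threshold
        for i, members in enumerate(clusters):
            s = combined_sim(ans, members[0])
            if s >= best_s:
                best_i, best_s = i, s
        if best_i >= 0:
            clusters[best_i].append(ans)
        else:
            clusters.append([ans])
    return clusters

def representative(members):
    """Pick the member with the highest total similarity to the other members."""
    def total(i):
        return sum(combined_sim(members[i], members[j]) for j in range(len(members)) if j != i)
    return members[max(range(len(members)), key=total)]

# Hypothetical answers to a single question.
answers = [
    "restart your computer",
    "reboot your pc",
    "reinstall the graphics driver",
    "update the graphics driver",
    "restart the computer first",
]
for cluster in cluster_answers(answers):
    print(representative(cluster), "<-", cluster)
```

The assignment pass costs O(n·c) similarity computations for n answers and c clusters, instead of the O(n²) pairwise matrix that, for example, agglomerative clustering needs; this is one way an incremental design can reduce time complexity. The result depends on the input order and on `threshold`. Note how "reboot your pc" joins "restart your computer" mainly because of the semantic term, since the two share only the word "your".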
Keywords/Search Tags: User-interactive Question Answering System, Short Text Similarity, Automatic Question Answering, Question Categorization, Answer Clustering