Font Size: a A A

Question Similarity Computation Based On Deep Learning And Topic Model

Posted on:2017-02-16Degree:MasterType:Thesis
Country:ChinaCandidate:Q ZhouFull Text:PDF
GTID:2308330503458947Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In recent years, with the rapid development of the Internet, data in the Internet is growing explosively. Traditional search engine could hardly fulfill users’ various demands. Then, automatic question answering system comes into being, and becomes a new more effective way for information retrieval. Question similarity computation, as a key technology, has received widespread attention.Under the background of "big data", we study the problem of how to compute question similarity efficiently. According to features of question, we use vector to represent sentence and then compute distance between vectors to measure their similarity. The main work and innovation points of this paper include:(1) We analyze the drawbacks of existing methods, study on the neural network language model and topic model, and analyze their advantages on semantic representation;(2) To better represent the semantic of sentence, we propose two topic sentence vector models based on deep learning and topic model, and present the training algorithm. The second model improves the first one by removing the “bag-of-words” assumption. These two models combine the word co-occurrence of local context provided by sentence vector model with the global word co-occurrence provided by topic model.(3) To verify the effectiveness of our models, we conduct an experiment on sentence classification with IMDB movie review data, and compare our models with existing advanced models. The experiment result shows that sentence vector models with topic information beats the traditional sentence vector models, thus can represent the semantic of sentence better;(4) Design and implement the method of using sentence vector to compute question similarity. On large number of real-world Yahoo! Answers data, we use the proposed two topic sentence vector models, to conduct comparing experiment by some labeled data. The experiment result shows that our method can compute question similarity effectively on large-scale question-answering data.
Keywords/Search Tags:Question Similarity, Deep Learning, Topic Model, Sentence Embedding
PDF Full Text Request
Related items