
A Short Text Matching Method Using Multi-level Features

Posted on: 2015-05-27  Degree: Master  Type: Thesis
Country: China  Candidate: L B Kang  Full Text: PDF
GTID: 2308330479489746  Subject: Computer Science and Technology
Abstract/Summary:
Relevance is one of the most important factors in many natural language processing tasks and applications. With the popularity of social media, especially microblogs such as Twitter and Weibo, short texts have become prevalent on the Web. Given the vast amount of short text, matching short texts has become an important task for mining semantically similar information. Because short texts are brief and expressed in many different ways, traditional text processing methods are not well suited to them.

In this thesis, we focus on a short text matching task called short text conversation. Given a short text such as a post, the task is to find a suitable response from a large candidate set. For this task, we design a retrieval-based ranking model that uses three kinds of matching features generated at different matching levels. We call these shallow features, deep features, and rule-based features.

For shallow features, we use three linear matching models: the vector-space model, the BM25 model, and the latent semantic indexing (LSI) model. These models measure post-response similarity by word-by-word matching and capture the semantic matching between a post and a response. To generate deep features, we design two matching models based on word embeddings. The deep features cover rich semantic relevance information between post and response that the shallow features cannot capture. In addition, we use some handcrafted features that describe the relevance between post and response in certain special cases. Finally, we learn a ranking model based on Ranking SVM that combines all the matching features to rank the candidate responses.

To verify the effect of the matching features, we conduct experiments on a short text conversation dataset built from real-world instances from Sina Weibo. Experiments show that adding deep features significantly improves performance over a model that uses shallow features alone. Combining all the matching features, we achieve state-of-the-art performance.
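As a rough illustration of the shallow matching level described above, the following Python sketch scores each candidate response against a post with two features, a term-frequency cosine similarity (vector-space model) and BM25, and ranks candidates by a weighted sum. This is not the thesis's implementation. The function names, the whitespace tokenizer, the BM25 parameters `k1` and `b`, and the fixed `weights` are all assumptions made for illustration; the thesis learns the feature combination with Ranking SVM and also uses LSI, embedding-based, and rule-based features that are omitted here. Weibo text would also need a proper Chinese word segmenter.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization, for illustration only.
    return text.lower().split()


def cosine_tf(post_tokens, resp_tokens):
    """Vector-space feature: cosine similarity of term-frequency vectors."""
    a, b = Counter(post_tokens), Counter(resp_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def bm25(post_tokens, resp_tokens, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25 feature: the post acts as the query, the response as the document."""
    tf = Counter(resp_tokens)
    score = 0.0
    for term in set(post_tokens):
        if tf[term] == 0:
            continue  # the term does not occur in this response
        df = doc_freq[term]
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        denom = tf[term] + k1 * (1 - b + b * len(resp_tokens) / avg_len)
        score += idf * tf[term] * (k1 + 1) / denom
    return score


def rank_responses(post, candidates, weights=(1.0, 1.0)):
    """Rank candidates by a weighted sum of the two shallow features.

    The weights are hypothetical placeholders for a learned ranking model.
    """
    if not candidates:
        return []
    post_tokens = tokenize(post)
    cand_tokens = [tokenize(c) for c in candidates]
    # Document frequency of each term over the candidate set.
    doc_freq = Counter(t for toks in cand_tokens for t in set(toks))
    avg_len = sum(len(toks) for toks in cand_tokens) / len(cand_tokens)
    scored = []
    for text, toks in zip(candidates, cand_tokens):
        features = (
            cosine_tf(post_tokens, toks),
            bm25(post_tokens, toks, doc_freq, len(candidates), avg_len),
        )
        scored.append((sum(w * f for w, f in zip(weights, features)), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

Calling `rank_responses(post, candidates)` returns the candidate strings ordered from most to least relevant under these two features. Both features depend on exact word overlap, so a response that is relevant but shares no words with the post scores zero, which is the gap the embedding-based deep features are meant to address.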
Keywords/Search Tags: short text matching, semantic similarity, short text conversation, information retrieval, learning to rank