Transferring Topical Knowledge From Auxiliary Long Texts For Short Text Clustering

Posted on: 2013-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: O Jin
GTID: 2218330362459276
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming more and more important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many of the existing techniques are not effective due to the sparseness of text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be utilized as auxiliary data when mining the target short text data.

In this article, we present a novel approach to cluster short text messages via transfer learning from auxiliary long text data. We show that while some previous works on enhancing short text clustering with related long texts exist, most of them ignore the semantic and topical inconsistencies between the target and auxiliary data, and may therefore hurt the clustering performance on the short texts. To accommodate the possible inconsistencies between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistencies between the data sets.

We demonstrate through large-scale clustering experiments on both advertisement and Twitter data that we can obtain superior performance over several state-of-the-art techniques for clustering short text documents.
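
The sparseness problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the DLDA model, whose details are not given on this page. It only shows, under assumed toy data, why two topically related short messages can look completely unrelated in a bag-of-words representation, and how a related long text overlaps with both. The example strings and the helper names `bag_of_words` and `cosine` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Return lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short messages about the same topic that share no words.
short_a = bag_of_words("new iphone release")
short_b = bag_of_words("apple smartphone launch")

# Hypothetical auxiliary long text on the same topic.
long_text = bag_of_words(
    "apple announced the launch of a new iphone "
    "and the smartphone release is next week"
)

# No shared terms, so the short messages appear unrelated.
direct = cosine(short_a, short_b)

# Both short messages overlap with the long text.
via_a = cosine(short_a, long_text)
via_b = cosine(short_b, long_text)

print(direct, via_a > 0, via_b > 0)
```

In this toy setting the long text acts as a bridge between the two short messages. The thesis goes further than this by learning separate but coupled topic sets for the short and long texts, so that inconsistencies between the two collections do not degrade the short text clusters.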
Keywords/Search Tags: Short Text, Statistical Generative Models, Unsupervised Learning