
Research on Cross-Domain Classification for Short Texts

Posted on: 2019-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Li
GTID: 2428330548991214
Subject: Computer application technology
Abstract/Summary:
Short texts are now widespread in micro-blogs, e-commerce, and other domains, and their volume grows daily. Moreover, text data from different domains follow different distributions, which makes traditional classification methods no longer applicable. Cross-domain classification aims to solve the classification task in an unlabeled target domain by transferring knowledge from related domains. However, many cross-domain classification methods that succeed on ordinary texts perform poorly on short texts. Short texts are brief, sparse, and nonstandard, and they carry little effective information. In particular, sparsity degrades the performance of text classifiers, and multi-word synonyms weaken both the polarity of features and their co-occurrence. All of these make knowledge transfer more difficult. This dissertation focuses on cross-domain classification for short texts; its main contributions are as follows.

(1) To address the sparsity of short texts, a novel cross-domain short-text classification algorithm based on feature extension is proposed, built on spectral graph theory and feature co-occurrence. It first applies two layers of spectral clustering to extend the shared and domain-specific features of the two domains with similar features, with the aim of reducing feature sparsity and the difference between the two domains' data distributions. A classifier is then trained on the extended dataset to improve cross-domain classification of short texts.

(2) To address both the sparsity and the multi-word synonyms of short texts, a new cross-domain short-text classification algorithm based on topic correlation analysis is proposed. It uses the biterm topic model (BTM), designed for short texts, to extract topics from the shared and domain-specific features. Because BTM draws on the rich information of the whole corpus, it strengthens topic learning and can overcome the problems above. On this basis, the algorithm first analyzes the correlation between domain-specific topics, then maps the dataset into a new common feature space that reduces the distance between the domains, enabling cross-domain classification.
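The BTM step in (2) can be illustrated with a small sketch. The pure-stdlib Python below, which assumes texts are already tokenized into integer word ids, implements a collapsed Gibbs sampler for the biterm topic model. It pools the biterms (unordered word pairs) of every short text, assigns each biterm a single topic, and then infers a text's topic mixture from its biterms. The function names, the hyperparameter defaults (alpha=1.0, beta=0.01), and the fallback for single-word texts are illustrative assumptions. The split into shared and domain-specific features, the topic correlation analysis, and the mapping to a common feature space described above are not shown.

```python
import random
from itertools import combinations


def extract_biterms(doc):
    """All unordered word-id pairs co-occurring in one short text (its biterm list)."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]


def btm_gibbs(docs, vocab_size, num_topics, alpha=1.0, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for a biterm topic model over the whole corpus.

    Returns (theta, phi): corpus-level topic proportions and per-topic word
    distributions. Hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Pool biterms from every text: topics are learned at corpus level,
    # which is how BTM sidesteps the sparsity of individual short texts.
    biterms = [b for doc in docs for b in extract_biterms(doc)]
    n_z = [0] * num_topics                                # biterms per topic
    n_wz = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    assign = []
    for wi, wj in biterms:                                # random initialization
        z = rng.randrange(num_topics)
        assign.append(z)
        n_z[z] += 1
        n_wz[z][wi] += 1
        n_wz[z][wj] += 1

    for _ in range(iters):
        for idx, (wi, wj) in enumerate(biterms):
            z = assign[idx]
            n_z[z] -= 1                                   # remove biterm from counts
            n_wz[z][wi] -= 1
            n_wz[z][wj] -= 1
            weights = []
            for k in range(num_topics):
                # Each biterm contributes two words, so topic k holds 2 * n_z[k] words.
                denom = 2 * n_z[k] + vocab_size * beta
                weights.append((n_z[k] + alpha) * (n_wz[k][wi] + beta)
                               * (n_wz[k][wj] + beta) / (denom * denom))
            z = rng.choices(range(num_topics), weights=weights)[0]
            assign[idx] = z                               # add biterm back under new topic
            n_z[z] += 1
            n_wz[z][wi] += 1
            n_wz[z][wj] += 1

    total = len(biterms)
    theta = [(n_z[k] + alpha) / (total + num_topics * alpha) for k in range(num_topics)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(num_topics)]
    return theta, phi


def doc_topics(doc, theta, phi):
    """Infer P(z | d) by averaging P(z | b) over the biterms of one short text."""
    biterms = extract_biterms(doc)
    if not biterms:
        return list(theta)  # assumption: single-word texts fall back to theta
    result = [0.0] * len(theta)
    for wi, wj in biterms:
        post = [theta[k] * phi[k][wi] * phi[k][wj] for k in range(len(theta))]
        norm = sum(post)
        for k in range(len(theta)):
            result[k] += post[k] / norm / len(biterms)
    return result


if __name__ == "__main__":
    # Toy corpus: texts built from words 0-2 and texts built from words 3-5
    # have disjoint vocabularies.
    docs = [[0, 1, 2], [0, 1], [3, 4, 5], [4, 5], [2, 0]]
    theta, phi = btm_gibbs(docs, vocab_size=6, num_topics=2, iters=50)
    for doc in docs:
        print(doc, [round(p, 3) for p in doc_topics(doc, theta, phi)])
```

The vectors returned by `doc_topics` are one possible dense representation on which a classifier could be trained. Because topics are estimated from biterms pooled across the corpus, even a two-word text receives an informative mixture. This abstract does not specify how the thesis builds its common feature space from these topics.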
Keywords/Search Tags: Short text classification, transfer learning, spectral clustering, topic models