Research On The Application Of Transfer Learning On Text Classification

Posted on: 2012-12-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J N Meng
GTID: 1118330335954677
Subject: Computer application technology
Abstract/Summary:
Transfer learning is a new machine learning framework that differs from traditional approaches such as supervised, unsupervised, and semi-supervised learning. It learns a compact and effective representation from labeled samples in a source domain together with unlabeled samples, or a few labeled samples, from a target domain, and then applies the learned feature representation to the target domain. Unlike traditional machine learning, transfer learning does not assume that the training and test data are identically distributed. It can therefore share and transfer information effectively between similar domains or tasks. Transfer learning has become an active topic in information retrieval, text mining, and natural language processing, and has attracted considerable attention from both academia and industry.

Taking text classification as the background and transfer learning as the research subject, this dissertation addresses the main challenges of text classification. It focuses on how different kinds of transferred information and different transfer methods can be applied, discusses how features and samples can be reconstructed in the transfer learning setting, and proposes several text classification methods suited to transfer learning. The main research results are:

1. A transfer learning method based on feature mapping is proposed. Features and samples are two important aspects of text classification, and it is important to consider both together, so the feature-based and sample-based methods are combined. First, a common feature subspace of the source domain and the target domain is constructed. Based on mutual information, the most strongly correlated feature between a common feature and a subject factor in the target domain is obtained. Then a new feature mapping function is learned on the new feature subspace. Finally, the source domain and target domain data are re-weighted. Knowledge transfer is completed through the sample-based method, which reduces the distance between data drawn from different distributions. The experiments use three text classification corpora constructed to suit transfer learning. The results exceed those of several traditional classification methods, which verifies the effectiveness of the proposed method.

2. An adaptive transfer learning method is proposed. The degree of similarity between the features specific to the target domain and the common features is computed using singular value decomposition. The training and test data are then reconstructed according to the computed similarity values. The test data are assigned predicted labels by the newly constructed model, and suitable target domain data are adaptively selected and added to the original training set, which addresses the bias of the original training data. The method is applied to the ECML/PKDD 2006 Discovery Challenge corpus, and the favorable experimental results demonstrate its effectiveness.

3. A graph-based transfer learning method is proposed. Graph-based methods have excellent properties grounded in spectral graph theory. Of the many graph-based methods, the PageRank algorithm is widely used and has been extended to many areas. The proposed method takes PageRank as its basic framework. A fusion graph model is constructed from the source domain data and the target domain data. Pseudo-labels for the target domain data are obtained from the source domain data and updated using the target domain data, while the labels predicted in the previous iteration are retained during the iterative computation. When the algorithm converges, the predicted labels of the target domain data are the final results. The convergence of the algorithm is proved theoretically, and simulation experiments are also given. Corpora for web text classification, sentiment classification, and spam filtering that are relevant to transfer learning are used. Compared with supervised learning and semi-supervised learning, the experimental results show significant improvements and demonstrate the effectiveness and generality of the method.
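
As a rough illustration of the graph-based idea in the third result, the sketch below runs a PageRank-style iteration over a graph that fuses source and target documents. It is not the dissertation's algorithm: the cosine-similarity graph, the damping factor `alpha`, the update rule, and the function name `propagate_labels` are all assumptions chosen to show how labels from the source domain might spread to target documents and be refined until convergence.

```python
import numpy as np

def propagate_labels(X_src, y_src, X_tgt, n_classes,
                     alpha=0.85, tol=1e-8, max_iter=1000):
    """PageRank-style label propagation on a fused source/target graph.

    Hypothetical sketch: inputs are assumed to be non-negative
    term-weight vectors (e.g. TF-IDF), so cosine similarities are >= 0.
    """
    X = np.vstack([X_src, X_tgt]).astype(float)

    # Edge weights of the fused graph: cosine similarity, no self-loops.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    W = Xn @ Xn.T
    np.fill_diagonal(W, 0.0)

    # Row-normalize to obtain a transition matrix, as in PageRank.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    P = W / row_sums

    # Initial label matrix: one-hot rows for source, zeros for target.
    n_src = len(y_src)
    Y = np.zeros((X.shape[0], n_classes))
    Y[np.arange(n_src), y_src] = 1.0

    # Each step mixes scores from graph neighbors with the known source
    # labels; with alpha < 1 the update is a contraction and converges.
    F = Y.copy()
    for _ in range(max_iter):
        F_next = alpha * (P @ F) + (1 - alpha) * Y
        converged = np.abs(F_next - F).max() < tol
        F = F_next
        if converged:
            break

    # Predicted class for each target document.
    return F[n_src:].argmax(axis=1)
```

Here the target rows of `F` play the role of pseudo-labels: they are first driven by the source labels and then repeatedly updated through the target documents' own neighborhood, with the previous iteration's scores carried into each step.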
Keywords/Search Tags: Transfer Learning, Text Classification, Feature Selection, Adaptive, Graph Ranking