Semi-supervised Text Classification Algorithms Based On Transfer Learning

Posted on: 2016-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Han
Full Text: PDF
GTID: 2298330467495837
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the network has become the carrier of massive amounts of information, and text is its main form. Because text data contain rich knowledge, processing them effectively and efficiently has become increasingly necessary. Text classification is one of the important forms of data analysis: it extracts and describes important data models so that a computer can acquire knowledge from past data and apply this “learned experience” to current problems. Text classification has been widely used in many areas, such as information organization and management.

To achieve good performance, traditional text classification methods place very high demands on the training data. For example, the training set should contain sufficient labeled data and little noise, and the training data and the test data should follow the same probability distribution. In many cases these conditions are not satisfied. When learning a new target domain, a shortage of labeled data directly degrades the learning result. In this situation, outdated data or data used in earlier training have great potential value, but because their probability distribution differs from that of the target domain, they cannot be used directly. Transfer learning is adopted to solve this problem.

Transfer learning is a new learning framework whose idea is “to infer other things from one fact” and “to comprehend by analogy”. The framework places fewer requirements on the training and test data, and the “transfer” can take place between the same or different domains. For example, to make use of outdated data, transfer learning can select the valuable part of the data and bring it into learning on the target domain.
In recent years, the idea of transfer learning has gradually come into view, and researchers in text mining, natural language processing, and information retrieval are paying more and more attention to it.

This thesis focuses on the two-class classification problem in a target domain where labeled data are scarce. PU (positive and unlabeled) learning, one method for solving two-class problems, is a kind of semi-supervised text classification algorithm. The two-step learning process of traditional PU learning consists of (1) extracting reliable negative instances and (2) using the positive instances and the reliable negative instances to train a classifier.

Building on traditional PU learning, this thesis addresses problems in both steps and proposes the TransferPU learning algorithm. TransferPU transfers knowledge in two respects. First, we consider feature knowledge. We propose two concepts, “strong feature” and “weak feature”, and transfer the usable features of the outdated dataset into the target-domain feature set, making that feature set more complete and refining the description of instances. Next, we consider instance knowledge. The new feature set is used to select reliable negative instances. We also propose the concept of “candidate positive/negative instances”: the strong feature set and the weak feature set are used to filter the unlabeled instance set in the algorithm T1DNF. We use the algorithm to choose some positive and negative instances to extend the set of available instances. Furthermore, we use an improved classification algorithm, called TransferISVM, to train on the target data.

This thesis conducts comprehensive experiments on the proposed algorithm. We transfer knowledge from domains that are similar to the target domain, and we compare our algorithm with two existing non-transfer learning algorithms under different evaluation metrics.
The experimental results show that TransferPU can extract positive features accurately and obtain sufficient reliable negative instances. Classification performance can be improved when the target domain has few positive instances.
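
The traditional two-step PU learning process described in the abstract can be illustrated with a minimal sketch. It is not the thesis's TransferPU, T1DNF, or TransferISVM, whose details are not given here. It assumes the plain 1-DNF heuristic for step 1 (a feature counts as positive if it occurs in a larger fraction of positive documents than of unlabeled ones) and swaps in a simple term-frequency profile classifier for step 2 in place of an SVM. All function names and the toy data are hypothetical, and both document sets are assumed to be non-empty.

```python
from collections import Counter

def positive_features(P, U):
    """Step 1a (assumed 1-DNF heuristic): keep features whose document
    frequency is higher in the positive set P than in the unlabeled set U."""
    df_p = Counter(w for doc in P for w in set(doc))
    df_u = Counter(w for doc in U for w in set(doc))
    return {w for w in df_p if df_p[w] / len(P) > df_u[w] / len(U)}

def reliable_negatives(U, pos_feats):
    """Step 1b: unlabeled documents containing no positive feature."""
    return [doc for doc in U if not pos_feats & set(doc)]

def train(P, RN):
    """Step 2 (stand-in classifier, not an SVM): one normalized
    term-frequency profile per class."""
    def profile(docs):
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}
    return profile(P), profile(RN)

def classify(doc, model):
    """Return True if the document scores higher on the positive profile."""
    pos, neg = model
    score_pos = sum(pos.get(w, 0.0) for w in doc)
    score_neg = sum(neg.get(w, 0.0) for w in doc)
    return score_pos > score_neg

# Toy data (hypothetical): each document is a list of tokens.
P = [["transfer", "learning", "svm"], ["transfer", "feature", "svm"]]
U = [
    ["transfer", "learning", "kernel"],
    ["football", "match", "goal"],
    ["goal", "league", "match"],
    ["svm", "feature", "kernel"],
]

feats = positive_features(P, U)
rn = reliable_negatives(U, feats)
model = train(P, rn)
```

With this toy data, the two sports documents are selected as reliable negatives, and `classify(["transfer", "kernel", "svm"], model)` labels the new document as positive. A transfer-based variant along the lines of the abstract would enlarge the feature set and the instance sets with knowledge from a related domain before step 2.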
Keywords/Search Tags: Transfer learning, Instance transfer, Feature selection, PU learning