Text Classification Research Based On Instance Transfer Learning

Posted on: 2015-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: X M Liu
GTID: 2268330428998010
Subject: Computer software and theory

Abstract/Summary:
As Internet technology matures, resources of every kind appear on the network, and the information they contain is growing explosively. Finding the information people need within this mass of data has become a pressing problem. Data mining, as one answer to this problem, has attracted wide interest and has been an active research topic from the start. A considerable part of this information is stored as text, so text classification, an important application of text mining, has inevitably drawn close attention.

Research on text classification falls into two periods: methods based on knowledge engineering, and methods based on machine learning. Knowledge engineering methods require domain experts to write the rules for each classification task by hand. Although they produced some results, their inefficiency and limitations led to their being abandoned quickly. Machine learning methods let the computer perform classification automatically in place of manual work, which frees human resources, is highly efficient, and ports well to new tasks, so they quickly won favor.

Text classification based on machine learning has so far been very successful and has produced many outstanding results. However, the technique has its own weakness. Traditional machine learning rests on statistical theory and requires that the training set used to fit the classifier and the test set used to evaluate it follow the same distribution. For a text classification task in a new field, we may be unable to obtain enough training samples, either because collecting them is expensive or because no samples can be collected at all. It is natural to think of using knowledge learned from other fields or tasks to improve classification in this field, but traditional machine learning methods cannot do so. Transfer learning, a new research focus, was proposed to solve this problem. It applies knowledge previously learned in other fields to a new field and can give satisfactory results as long as the two fields are sufficiently similar.

In this thesis, we first describe the established theory of text classification and introduce each of its stages: text preprocessing, including vector representation of text and feature weighting; feature extraction; and the text classification algorithms currently in common use. We also summarize the methods and criteria for evaluating classifiers. We then describe the basic theory and research progress of transfer learning. Finally, we design a text classification algorithm based on instance transfer. The algorithm extends the classical AdaBoost classification algorithm: it takes labeled samples from other related fields and reweights them to extend the training set of the target field, producing a classifier with higher accuracy. Its core idea is to assign higher weights to source fields that show positive transfer to the target field and lower weights to those that show negative transfer, while also adjusting the weights of individual samples. Experiments show that, given a collection of source fields, some related to the target field and others unrelated, together with only a few target-domain training samples, the algorithm yields a more reliable classifier...
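
The abstract does not give the algorithm's update rules, so the following is only a minimal sketch of the general idea it describes: a boosting loop that reweights labeled source samples alongside a small target set. It borrows the target-side update and the fixed source shrink factor from TrAdaBoost, and adds a per-source factor that is purely an illustrative assumption. The names `train_stump`, `transfer_boost`, `beta_src`, and `domain_factor` are hypothetical, the weak learner is a simple decision stump, and all rounds vote in the final classifier for brevity.

```python
import math

def train_stump(X, y, w):
    """Return the weighted decision stump (feature, threshold, sign) with lowest error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (s if x[j] >= t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda x: s if x[j] >= t else -s

def transfer_boost(sources, target, rounds=10):
    """sources: list of labeled (X, y) source domains; target: small labeled (X, y).
    Labels are +1 / -1. Returns (predict, per-source sample weights)."""
    Xt, yt = target
    n_src = sum(len(X) for X, _ in sources)
    # Fixed shrink factor for misclassified source samples (TrAdaBoost style).
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / rounds))
    w_src = [[1.0] * len(X) for X, _ in sources]
    w_tgt = [1.0] * len(Xt)
    ensemble = []
    for _ in range(rounds):
        # Pool every sample with its current weight and fit one weak learner.
        X_all = [x for X, _ in sources for x in X] + list(Xt)
        y_all = [yi for _, y in sources for yi in y] + list(yt)
        w_all = [wi for ws in w_src for wi in ws] + w_tgt
        total = sum(w_all)
        h = train_stump(X_all, y_all, [wi / total for wi in w_all])
        # Weighted error is measured on the target domain only.
        miss_t = [h(x) != yi for x, yi in zip(Xt, yt)]
        eps = sum(wi for wi, m in zip(w_tgt, miss_t) if m) / sum(w_tgt)
        eps = min(max(eps, 1e-6), 0.49)
        ensemble.append((math.log((1 - eps) / eps), h))
        # Target samples: raise the weight of mistakes, as in AdaBoost.
        w_tgt = [wi * ((1 - eps) / eps if m else 1.0)
                 for wi, m in zip(w_tgt, miss_t)]
        for k, (Xs, ys) in enumerate(sources):
            miss_s = [h(x) != yi for x, yi in zip(Xs, ys)]
            err_k = sum(wi for wi, m in zip(w_src[k], miss_s) if m) / sum(w_src[k])
            # Assumed domain-level factor: a source the learner mostly
            # misclassifies loses weight as a whole (floored to stay positive).
            domain_factor = max(1.0 - err_k, 0.05)
            # Sample-level factor: shrink each misclassified source sample.
            w_src[k] = [wi * domain_factor * (beta_src if m else 1.0)
                        for wi, m in zip(w_src[k], miss_s)]

    def predict(x):
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1

    return predict, w_src

# Toy 1-D data (made up for illustration): the target rule is "+1 if x >= 5".
related = ([[0], [1], [3], [6], [7], [9]], [-1, -1, -1, 1, 1, 1])
unrelated = ([[1], [4], [6], [9]], [1, 1, -1, -1])  # labels flipped
target = ([[2], [8]], [-1, 1])
predict, weights = transfer_boost([related, unrelated], target)
print([sum(ws) for ws in weights], predict([2]), predict([8]))
```

On this toy data the related source keeps its full weight while the flipped-label source is progressively down-weighted, which is the qualitative behavior the abstract attributes to positive and negative transfer. A real text classifier would replace the stump and the 1-D features with a proper weak learner over weighted term vectors.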
Keywords/Search Tags: text classification, transfer learning, instance transfer