Research On Text Classification Algorithms Based On Semi-supervised Learning

Posted on: 2015-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: F H Du
Full Text: PDF
GTID: 2298330452953355
Subject: Computer Science and Technology
Abstract/Summary:
Text classification is one of the hotspots in the field of text mining. There is a large amount of unlabeled text data in the real world, and obtaining labeled text data takes considerable manpower and financial resources, so it is significant to study semi-supervised text classification, which makes use of unlabeled text data. With the arrival of the big data era, semi-supervised text classification is more efficient than other methods at organizing and managing massive, disordered Internet text data, and it is attracting growing attention from scholars both at home and abroad. Drawing on the ant colony algorithm and transfer learning, this paper studies the problem in semi-supervised text classification that inconsistent data distributions may lead to performance degradation. The main work includes:

(1) This paper presents a semi-supervised text classification algorithm based on aggregation pheromone, the substance that real ants and other insects use for species aggregation. The proposed method makes no assumption about the data distribution and can therefore be applied to data of any distribution. First, using the aggregation pheromone, the colonies to which an unlabeled ant may belong are selected with a top-k strategy. Then the confidence of each unlabeled ant is determined by a judgment rule. Finally, unlabeled ants with higher confidence are added to the most attractive training colony by a random selection strategy. Experiments on benchmark data show that this algorithm outperforms naive Bayes and the EM algorithm in precision, recall and Macro F1.

(2) This paper proposes a semi-supervised text classification algorithm based on feature mapping. First, three sets of features are selected from the labeled data, the unlabeled data and the test data, respectively, using different feature selection methods, and their values are initialized. Second, three feature mapping functions are learned and used to recalculate the weight of each feature. Finally, the EM algorithm classifies the text data. Experiments on standard data sets show that the proposed algorithm is effective.
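
The three steps of method (1) can be read as a self-training loop, sketched below in Python. This is only an illustration under stated assumptions. The abstract does not give the pheromone formula, the judgment rule or the details of the random selection, so the Gaussian kernel, the ratio threshold and the "random half" admission used here are hypothetical stand-ins, as are all function and parameter names (`pheromone`, `self_train`, `sigma`, `ratio`). Documents are assumed to be already represented as numeric feature vectors.

```python
import numpy as np


def pheromone(x, colony, sigma=1.0):
    """Mean Gaussian-kernel 'pheromone' a colony deposits at point x.

    Assumed form; the thesis may define the pheromone differently.
    """
    d2 = np.sum((colony - x) ** 2, axis=1)
    return float(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))


def self_train(colonies, unlabeled, k=2, ratio=2.0, sigma=1.0,
               max_rounds=10, seed=0):
    """Grow labeled colonies with confident unlabeled points ("ants").

    colonies:  dict mapping label -> (n_i, d) array of labeled vectors
    unlabeled: (m, d) array of unlabeled vectors
    Returns the grown colonies and the list of points left unlabeled.
    """
    rng = np.random.default_rng(seed)
    colonies = {c: np.asarray(v, dtype=float) for c, v in colonies.items()}
    pool = [np.asarray(x, dtype=float) for x in unlabeled]

    for _ in range(max_rounds):
        confident = []  # (index into pool, proposed label)
        for i, x in enumerate(pool):
            scores = {c: pheromone(x, pts, sigma)
                      for c, pts in colonies.items()}
            # Step 1: keep the k most attractive candidate colonies.
            top = sorted(scores, key=scores.get, reverse=True)[:k]
            best = top[0]
            runner_up = scores[top[1]] if len(top) > 1 else 0.0
            # Step 2 (assumed judgment rule): the best colony must be
            # clearly more attractive than the runner-up.
            if scores[best] > ratio * runner_up:
                confident.append((i, best))
        if not confident:
            break
        # Step 3 (assumed random selection): admit a random half of the
        # confident ants, at least one, into their best colony.
        n_pick = max(1, len(confident) // 2)
        picked = rng.choice(len(confident), size=n_pick, replace=False)
        chosen = [confident[j] for j in picked]
        for i, label in chosen:
            colonies[label] = np.vstack([colonies[label], pool[i]])
        taken = {i for i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]

    return colonies, pool
```

Admitting only part of the confident ants per round lets later rounds re-score the remaining ants against the updated colonies, which is one plausible reason to combine a confidence rule with random selection.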
Keywords/Search Tags: ant colony algorithm, aggregation pheromone, feature mapping, semi-supervised learning, text classification