Research Of Text Classification Algorithm Based On Semi-supervised SVM Active Learning

Posted on: 2014-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2268330401476358
Subject: Computer software and theory
Abstract/Summary:
The continuous development of information technology has transformed the world into a sea of information. People are increasingly overwhelmed by the sheer quantity of information, and discerning and classifying the useful information within it has become an important problem. Text is the main carrier of information, so research on text classification algorithms is of clear significance. Recently, active learning has been introduced to further improve the performance of text classification algorithms, and Support Vector Machine (SVM) active learning has been widely used for text classification. Classical SVM active learning, however, has two main drawbacks: first, the labeled samples are limited; second, there is a lot of redundancy among the samples selected for labeling.

This thesis systematically studies the SVM active learning approach to text classification. Based on a study of the defects of SVM active learning, it proposes a new semi-supervised Support Vector Machine active learning algorithm (SS-SVM-AL). The thesis is organized as follows:

(1) First, the research background of the project is summarized. Second, the theory and technology of text classification are reviewed. Third, the research methods and key technologies of support vector machines and active learning theory are introduced. Finally, the theoretical foundations and classical methods of traditional SVM active learning and semi-supervised learning are studied.

(2) We handle the small training set problem with a semi-supervised learning technique. It makes full use of the spatial structure information present in the unlabeled samples and overcomes the defects of the training set by using a mixture of labeled and unlabeled samples instead of labeled samples alone. Learning a semi-supervised kernel function is crucial to strengthening the generalization ability of the model.

(3) To reduce the redundancy among samples, we design an active learning approach based on a maximum-minimum framework. Its main idea is to introduce a value representing the selection probability of each unlabeled sample, which can be solved for while learning the semi-supervised support vector machine. This value is used to identify examples that are both informative and diverse. As a consequence, the classifier is greatly improved by avoiding the selection of similar samples.

(4) On this basis, the SS-SVM-AL algorithm is proposed. The proposed algorithm has been applied to real-world data sets, and the results demonstrate that SS-SVM-AL outperforms the other algorithms it was compared with.
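
The idea in point (3), choosing samples that are both informative and diverse, can be illustrated with a small sketch. This is not the SS-SVM-AL formulation from the thesis, which the abstract does not spell out. It is a generic greedy heuristic under stated assumptions: the decision values `margins` come from some already trained classifier, informativeness is approximated by closeness to the decision boundary, and diversity is enforced by a max-min distance rule. The names `select_batch`, `pool_size`, and `batch_size` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(points, margins, batch_size, pool_size):
    """Return indices of unlabeled samples that are uncertain and mutually diverse.

    points     -- feature vectors of the unlabeled samples
    margins    -- decision values f(x) from a trained classifier (assumed given)
    batch_size -- number of samples to send for labeling
    pool_size  -- number of most-uncertain samples kept as candidates
    """
    # Informativeness: keep the samples closest to the decision boundary.
    order = sorted(range(len(points)), key=lambda i: abs(margins[i]))
    candidates = order[:pool_size]
    if not candidates or batch_size <= 0:
        return []

    # Seed the batch with the single most uncertain candidate.
    selected = [candidates[0]]

    # Diversity: greedily add the candidate whose minimum distance
    # to the already selected samples is largest (max-min rule).
    while len(selected) < min(batch_size, len(candidates)):
        remaining = [i for i in candidates if i not in selected]
        best = max(
            remaining,
            key=lambda i: min(euclidean(points[i], points[j]) for j in selected),
        )
        selected.append(best)
    return selected


# Toy data: samples 0, 1 and 3 are near-duplicates close to the boundary,
# sample 2 is uncertain but far from them, sample 4 is confidently classified.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (9.0, 9.0)]
margins = [0.05, 0.06, 0.2, 0.07, 3.0]
batch = select_batch(points, margins, batch_size=2, pool_size=4)
print(batch)
```

On this toy data the sketch picks sample 0 and then sample 2, skipping the near-duplicates 1 and 3, which mirrors the stated goal of avoiding the selection of similar samples.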
Keywords/Search Tags: Text Classification, Active Learning, Semi-supervised Learning, Support Vector Machine