Semi-supervised Classification Method Based On Support Vector Machines

Posted on: 2014-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Liao
GTID: 2268330425466406
Subject: Navigation, guidance and control
Abstract/Summary:
With the development of technology, people need to analyse billions of data points in areas such as industrial information, DNA analysis, and geography. However, not all of these data are clearly labeled, and how to extract useful classification information from a large volume of unlabeled data has attracted the attention of researchers. Traditional supervised and unsupervised learning do not make full use of both labeled and unlabeled data, wasting a large amount of valuable data. Semi-supervised learning makes effective use of both the labeled and the unlabeled data to guide the training process, improving accuracy and enhancing generalization ability. The research in this thesis is based on Transductive Support Vector Machines (TSVM).

First, we introduce TSVM and the basic principles of semi-supervised learning. Next, we analyse and compare existing improved TSVM methods. Many of these methods modify the optimization function of TSVM and then search for its minimum, but they ignore the relationships between training samples and achieve only limited improvement by optimizing the formulation of the semi-supervised learning algorithm. This thesis focuses on the data themselves, leading to a genuinely data-driven method for classification. On this basis we present two improved semi-supervised support vector machine classification algorithms:

1) Similarity Label Propagation Semi-Supervised Support Vector Machines (SLPS3VM);
2) Geodesic Label Propagation Semi-Supervised Support Vector Machines (GLPS3VM).

Theoretical analysis and experimental results show that the main advantages of the improved semi-supervised learning methods are:

1) Both improved algorithms remain valid when the class ratio of the unlabeled data is imbalanced. The class ratio of the labeled data may not be identical to that of the unlabeled data; the proposed improvement uses label propagation to progressively mark positive and negative samples, so that the appropriate proportion is found automatically.
2) The proposed SLPS3VM method classifies clustered samples more efficiently.
3) When the data have a manifold structure, GLPS3VM achieves better classification accuracy than ordinary cluster-based methods.
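
The idea of progressively marking positive and negative samples by label propagation can be illustrated with a small sketch. This is not the thesis's SLPS3VM algorithm, whose details are not given in the abstract; it is a minimal illustration under stated assumptions: a Gaussian similarity with a hypothetical bandwidth `sigma`, one sample labeled per round, and hypothetical helper names `similarity` and `propagate_labels`. It shows that the class ratio among the propagated labels is not fixed in advance by the ratio of the initial labeled set.

```python
import math

def similarity(a, b, sigma=1.0):
    """Gaussian similarity between two points (an assumed choice, not from the thesis)."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))

def propagate_labels(labeled, unlabeled, sigma=1.0):
    """Label unlabeled points one at a time, most confident first.

    labeled:   list of (point, label) pairs, label in {+1, -1}
    unlabeled: list of points (tuples of floats)
    Returns the newly labeled (point, label) pairs in the order they were marked.
    """
    if not labeled:
        raise ValueError("at least one labeled sample is required")
    known = list(labeled)        # grows as labels are propagated
    remaining = list(unlabeled)  # shrinks by one point per round
    newly_labeled = []
    while remaining:
        best = None  # (similarity, index in remaining, label to copy)
        for i, x in enumerate(remaining):
            for p, y in known:
                s = similarity(x, p, sigma)
                if best is None or s > best[0]:
                    best = (s, i, y)
        _, i, y = best
        x = remaining.pop(i)
        known.append((x, y))     # the new label can propagate further next round
        newly_labeled.append((x, y))
    return newly_labeled

# Labeled set is balanced (1:1); the unlabeled set is not (4:1).
labeled = [((0.0, 0.0), +1), ((10.0, 10.0), -1)]
unlabeled = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (9.0, 10.0)]

result = propagate_labels(labeled, unlabeled)
positives = sum(1 for _, y in result if y == +1)
negatives = len(result) - positives
print(positives, negatives)
```

In a full pipeline, the propagated labels would then be passed to an SVM trainer together with the original labeled samples. For data with manifold structure, one plausible analogue of the geodesic variant is to replace the Euclidean distance inside `similarity` with a shortest-path distance over a nearest-neighbor graph. Both remarks are assumptions about how the pieces might fit together, not a description of the thesis's procedure.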
Keywords/Search Tags: semi-supervised learning, SVM, TSVM, cluster, manifold, geodesic