
Research on SVM Method Based on Semi-supervised Cluster Kernel

Posted on: 2013-03-04 | Degree: Master | Type: Thesis
Country: China | Candidate: T Li
GTID: 2268330398998480 | Subject: Computer software and theory
Abstract/Summary:
Machine learning studies how computers can acquire new knowledge and skills and reorganize existing knowledge. By learning from a set of observations (samples), it builds a model or learner that can analyze and predict unseen data. With the advent of the Internet and the rapid development of science and technology, collecting large numbers of unlabeled samples has become easy, while labeled samples remain scarce. How to use a small number of labeled samples together with a large number of unlabeled samples to improve learning performance has therefore become a research hotspot in machine learning.
The standard support vector machine (SVM) is a supervised learning method: it determines the decision hyperplane from the labeled samples alone according to the maximum-margin principle and does not consider the intrinsic geometric structure of all the samples. This limits its ability on certain pattern recognition tasks. To overcome this shortcoming, semi-supervised SVMs based on cluster kernels have been proposed.
A semi-supervised SVM based on a cluster kernel makes full use of the unlabeled samples when constructing the kernel matrix, which improves the classification accuracy of the SVM. The main cluster kernels of this type are the random walk kernel and the spectral clustering kernel. Both must diagonalize a similarity matrix built from the labeled and unlabeled samples, so when there are many unlabeled samples, both the space needed to store the similarity matrix and the time needed to diagonalize it become very high.
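The cost argument above can be made concrete with the cluster-kernel construction in its common form. The notation below (affinity matrix W with bandwidth σ, degree matrix D, eigendecomposition UΛUᵀ, transfer function φ, number of random-walk steps t, number of retained eigenvectors k) is assumed for illustration rather than taken from the thesis, as is the dense 8-byte (double-precision) storage in the last line.

```latex
% Affinity over all n = l + u labeled and unlabeled samples, and its normalization
W_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right), \qquad
D_{ii} = \sum_{j=1}^{n} W_{ij}, \qquad
L = D^{-1/2} W D^{-1/2} = U \Lambda U^{\top}

% Cluster kernel: reshape the spectrum with a transfer function \varphi
% (eigenvalues sorted \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n)
\tilde{K} = U \, \varphi(\Lambda) \, U^{\top}, \qquad
\varphi_{\text{random walk}}(\lambda_i) = \lambda_i^{\,t}, \qquad
\varphi_{\text{spectral}}(\lambda_i) =
\begin{cases} 1, & i \le k \\ 0, & i > k \end{cases}

% Cost: W is dense n x n, and a full eigendecomposition is cubic in n
\text{memory}(W) = 8 n^2 \ \text{bytes}, \quad
n = 10^5 \;\Rightarrow\; 8 \times 10^{10} \ \text{bytes} = 80 \ \text{GB}, \qquad
\text{time} = O(n^3)
```

Because both transfer functions act on the full spectrum of L, neither variant avoids forming and diagonalizing the n × n matrix, which is the bottleneck described above.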
How to use as many unlabeled samples as possible to improve the generalization performance of semi-supervised SVMs is therefore worth studying. The main work of this thesis is summarized as follows. It describes semi-supervised SVM classification methods based on cluster kernels, analyzes the cluster-kernel framework, and discusses the advantages and disadvantages of existing cluster kernels. It then proposes two semi-supervised SVMs based on cluster kernels.
(1) A semi-supervised SVM classification algorithm based on the bagged cluster kernel. This method rests on the cluster assumption: samples that belong to the same cluster are likely to share the same class label, so the decision boundary should pass through relatively sparse regions, and samples in the same cluster should fall on the same side of it. Following this assumption, the k-means clustering algorithm is run multiple times on the few labeled samples together with all the unlabeled samples, yielding a semi-supervised kernel. This kernel revises the similarity between samples: the similarity becomes higher when two samples fall in the same cluster and lower when they fall in different clusters. The bagged cluster kernel is then used as the kernel function of the SVM, giving a semi-supervised SVM based on the bagged cluster kernel that uses the unlabeled samples to improve classification performance.
(2) A semi-supervised SVM classification algorithm based on the hierarchical clustering connectivity kernel. To use labeled and unlabeled samples more effectively and to improve classification performance on arbitrarily distributed data, a semi-supervised SVM classification algorithm based on a hierarchical clustering kernel is proposed. The algorithm combines hierarchical clustering with the theory of the connectivity kernel to construct the hierarchical clustering connectivity kernel.
This kernel makes data that lie along the same linear or irregularly shaped distribution denser, that is, more similar to one another. Combining the kernel with an SVM yields a semi-supervised SVM based on the hierarchical clustering connectivity kernel. Experimental results show that this method has clear advantages over both the standard SVM and the semi-supervised SVM based on the bagged cluster kernel.
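Item (1) above describes the bagged cluster kernel only in words. The following is a minimal sketch under stated assumptions: it follows a common formulation in which k-means is run several times with different random initializations, the fraction of runs in which two samples share a cluster is recorded, and that fraction multiplies a base RBF kernel elementwise. The function names (`rbf_kernel`, `kmeans_labels`, `bagged_cluster_kernel`) and the parameters `k`, `runs`, `gamma`, and `seed` are hypothetical, not taken from the thesis.

```python
import math
import random


def rbf_kernel(points, gamma=1.0):
    """Base kernel K_orig(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(points)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means with random initial centers; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


def bagged_cluster_kernel(points, k=2, runs=10, gamma=1.0, seed=0):
    """Hypothetical sketch: K(x_i, x_j) = K_orig(x_i, x_j) * (fraction of k-means
    runs in which x_i and x_j land in the same cluster)."""
    rng = random.Random(seed)
    n = len(points)
    co = [[0] * n for _ in range(n)]  # co-clustering counts over all runs
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1
    base = rbf_kernel(points, gamma)
    return [[base[i][j] * co[i][j] / runs for j in range(n)] for i in range(n)]
```

For example, with six points forming two well-separated groups, `bagged_cluster_kernel(points, k=2, runs=10, gamma=0.5)` returns a matrix whose within-group entries stay close to 1 and whose cross-group entries are 0. Both the averaged co-clustering matrix and the RBF matrix are positive semidefinite, so their elementwise product is too (Schur product theorem), and the result can be passed to an SVM that accepts a precomputed kernel, such as scikit-learn's `SVC(kernel="precomputed")`.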
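Item (2) combines hierarchical clustering with the connectivity kernel. One way to make this concrete, sketched here under assumptions: the connectivity (minimax path) distance between two samples, meaning the smallest possible value of the longest edge on any path joining them, equals the height at which single-linkage agglomerative clustering first places them in the same cluster, and that height can be turned into a similarity with a Gaussian function. The choice of single linkage, the Gaussian form exp(-h²/(2σ²)), and the names `single_linkage_heights`, `connectivity_kernel`, and `sigma` are assumptions for illustration; the thesis's exact construction may differ.

```python
import math
from itertools import combinations


def single_linkage_heights(points):
    """Height at which each pair first shares a cluster under single-linkage
    agglomerative clustering, computed with Kruskal's algorithm on the complete
    graph. This equals the minimax path distance between the two points."""
    n = len(points)
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    height = [[0.0] * n for _ in range(n)]
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        # Every pair split across the two merging clusters first meets at height d.
        for a in members[ri]:
            for b in members[rj]:
                height[a][b] = height[b][a] = d
        parent[rj] = ri
        members[ri].extend(members.pop(rj))
    return height


def connectivity_kernel(points, sigma=1.0):
    """Hypothetical sketch: K(x_i, x_j) = exp(-h_ij^2 / (2 sigma^2)),
    where h_ij is the single-linkage merge height of x_i and x_j."""
    h = single_linkage_heights(points)
    return [[math.exp(-hij ** 2 / (2 * sigma ** 2)) for hij in row] for row in h]
```

On two parallel chains of points spaced 1 apart along each chain and 3 apart between chains, the two ends of one chain (Euclidean distance 4) get similarity exp(-0.5) ≈ 0.61, while vertically adjacent points on different chains (Euclidean distance 3) get exp(-4.5) ≈ 0.011. A plain RBF kernel with the same σ ranks these two pairs the other way round (exp(-8) versus exp(-4.5)), which illustrates how samples along the same elongated or irregular shape become more similar.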
Keywords/Search Tags: Semi-supervised learning, Support vector machine classification, Bagged cluster kernel, Connectivity kernel, Hierarchical cluster connectivity kernel