
Research On Some Problems And Applications In Support Vector Data Description

Posted on: 2011-07-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F M Gu
Full Text: PDF
GTID: 1118360332457113
Subject: Computer application technology
Abstract/Summary:
Statistical Learning Theory studies learning problems with finite samples and provides a complete and consistent theoretical framework for them. Built on Statistical Learning Theory, the Support Vector Machine (SVM) is a classical learning method that applies the Structural Risk Minimization principle, combines readily with many other machine learning techniques, and performs well in many settings. Support Vector Data Description (SVDD) is a relatively new method based on Statistical Learning Theory and SVM. Whereas SVM looks for a separating hyperplane, SVDD seeks a hypersphere that encloses the target data. SVDD is a classical one-class classifier, or data description method, and is widely applied in fault detection, industrial and medical diagnosis, network security, target class identification, intrusion detection, face recognition and other fields. It has become a hot topic in machine learning in recent years.

However, because SVDD is still a fairly new theory in machine learning, it remains immature in many respects and needs further research. Its learning algorithm is both a key point and a difficult part of that research. This thesis aims to improve SVDD's learning capability in unsupervised and semi-supervised settings. It explores several problems in SVDD, covering improved learning accuracy, new learning algorithms, data preprocessing and extended applications. The details are as follows:

(1) To address the inaccurate classification of conventional SVDD in unsupervised settings, we propose AIKCSVDD, a support vector data description method based on artificial immune kernel clustering. It uses the memory antibodies generated by an artificial immune kernel clustering algorithm as target data and then uses SVDD to perform multi-class classification. On the one hand, the immune kernel clustering method needs no prior knowledge and better recognizes data without clear boundaries; on the other hand, using memory antibodies as target data better reflects the global distribution of the original data and does not require the number of clusters to be known in advance.

(2) To improve the classification precision of traditional SVDD when little class information is available, we propose Semi-Supervised Weighted Support Vector Data Description for data classification. It uses a graph-based semi-supervised learning technique to learn the latent class information of a large amount of unlabeled data from a small amount of labeled data, and then applies a weighted SVDD to learn a classifier for the whole dataset. Experiments on UCI datasets show that the method is effective when very little class information is known.

(3) K-Nearest Neighbor (kNN) classification is one of the most popular machine learning techniques, but it often works poorly because too little label information is available, the distance metric is poorly chosen, or many irrelevant features are present. To handle these issues, we introduce a semi-supervised distance metric learning method for kNN classification. The method uses a semi-supervised Label Propagation algorithm to gain more label information from very little initial class information, then uses an improved weighted RCA to learn a Mahalanobis distance function, and finally replaces the original Euclidean distance of the kNN classifier with the learned Mahalanobis distance metric. Experiments on UCI datasets show the effectiveness of the method.

(4) In real applications such as fault diagnosis, data often have very high dimension and non-uniform distributions. We propose a new method that combines LLE with a kernel distance metric and SVDD to address these problems. To mine the meaningful low-dimensional information hidden in high-dimensional data and extract better classification features, we use Locally Linear Embedding (LLE) for dimensionality reduction during data preprocessing. Because LLE needs dense sampling and gives unsatisfactory results with Euclidean distance in high-dimensional sparse spaces, we use a distance metric in kernel space to improve LLE and obtain a better low-dimensional manifold of the original data. We then apply SVDD to the new dataset. Experimental results show that the proposed method performs better on high-dimensional, non-uniformly distributed data.

On the whole, this thesis investigates several problems and applications of the Support Vector Data Description method. The research has theoretical and practical significance for improving SVDD's learning capability. In future work, in addition to improving the current work, we hope to study SVDD more deeply and apply it to real applications.
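The hypersphere idea behind SVDD that the abstract refers to can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the hard-margin form of SVDD (no slack variables), a Gaussian kernel, and a simple Frank-Wolfe iteration swapped in as the solver for the dual problem. The names `rbf_kernel`, `fit_svdd`, `distance_sq`, `sigma` and `n_iter` are hypothetical and chosen only for this illustration.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_svdd(points, kernel=rbf_kernel, n_iter=200):
    """Approximate the hard-margin SVDD dual with Frank-Wolfe iterations.

    Dual problem: maximise sum_i a_i K_ii - sum_ij a_i a_j K_ij
    subject to a_i >= 0 and sum_i a_i = 1.
    Returns (alpha, radius_sq); the sphere centre in feature space is
    sum_i alpha[i] * phi(points[i]).
    """
    n = len(points)
    gram = [[kernel(p, q) for q in points] for p in points]
    alpha = [1.0 / n] * n  # any feasible starting point on the simplex

    for t in range(1, n_iter + 1):
        k_alpha = [sum(gram[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # Gradient of the dual objective; its largest entry belongs to the
        # training point that is currently farthest from the centre.
        grad = [gram[i][i] - 2.0 * k_alpha[i] for i in range(n)]
        far = max(range(n), key=grad.__getitem__)
        step = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[far] += step  # move the centre towards the farthest point

    k_alpha = [sum(gram[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    center_norm = sum(alpha[i] * k_alpha[i] for i in range(n))
    # Radius chosen so that the sphere encloses every training point.
    radius_sq = max(gram[i][i] - 2.0 * k_alpha[i] + center_norm for i in range(n))
    return alpha, radius_sq


def distance_sq(z, points, alpha, kernel=rbf_kernel):
    """Squared feature-space distance from point z to the sphere centre."""
    n = len(points)
    cross = sum(alpha[i] * kernel(points[i], z) for i in range(n))
    center_norm = sum(
        alpha[i] * alpha[j] * kernel(points[i], points[j])
        for i in range(n)
        for j in range(n)
    )
    return kernel(z, z) - 2.0 * cross + center_norm


if __name__ == "__main__":
    # Toy target class: a 3 x 3 grid of points in the unit square.
    target = [(0.5 * i, 0.5 * j) for i in range(3) for j in range(3)]
    alpha, radius_sq = fit_svdd(target)
    for query in [(0.5, 0.5), (5.0, 5.0)]:
        inside = distance_sq(query, target, alpha) <= radius_sq
        print(query, "accepted as target" if inside else "rejected as outlier")
```

A point is accepted as belonging to the target class when its squared distance to the centre does not exceed the squared radius. A practical SVDD would normally add slack variables with a trade-off parameter so that a few target points may fall outside the sphere, which this sketch omits.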
Keywords/Search Tags: Support Vector Data Description, Artificial Immune Network, Semi-supervised Learning, Kernel Clustering, Locally Linear Embedding, Fault Diagnosis