
Research On Classification Method Based On SVDD

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
Full Text: PDF
GTID: 2428330626455334
Subject: Computer application technology
Abstract/Summary:
Classification is an important task in machine learning. In real-life classification tasks, however, data from different classes may overlap. This produces non-separable regions, and samples in those regions are difficult to classify correctly. Machine learning mainly involves training a model on known data and then using the model to predict unknown data. Probabilistic machine learning provides a probabilistic framework for this uncertainty, representing and controlling the uncertainty of models and predictions. Research on classification under uncertainty is therefore a meaningful topic. In addition, some kinds of samples are easy to collect in real life, while others are hard to collect because of the particular nature of their fields. As a result, some classes in the target data set have many samples while others have few; that is, the sample distribution is imbalanced. Traditional machine learning classification algorithms tend to favor the majority class on such problems, which degrades classification of the minority class. In machine fault diagnosis, medical diagnosis and similar problems, the minority samples are few but very important, and misclassifying them may have serious consequences. It is therefore important to improve classification performance on the minority class in imbalanced data. To address these problems, this thesis studies classification methods based on support vector data description (SVDD). The main contents cover the following two aspects:

(1) Classification tasks involve uncertainty, and both current probabilistic machine learning methods and the traditional support vector data description method have difficulties handling it. This thesis therefore proposes a probability-based support vector data description method. First, the traditional support vector data description method is trained on each of the two classes of data separately to obtain their data descriptions, and the distance from each test sample to the center of each description is calculated. Then a function that converts distance into probability is constructed, which yields the probability-based support vector data description method. The Bagging algorithm is also used to build an ensemble, which further improves the performance of the data description. Experiments show that the proposed algorithm achieves better accuracy and F1 values, and that the performance of the data description is improved.

(2) For the common problem of imbalance between two classes of data, this thesis proposes, at the algorithm level, an optimization-based support vector data description method. First, several common optimization algorithms are introduced. Then a support vector data description method for addressing imbalanced data classification is introduced. The sample count information and the sample distribution information are combined to redefine the C value, and several optimization algorithms are compared. Finally, experiments are carried out on five UCI datasets. The experimental results show that the proposed algorithm has certain advantages when combined with an optimization algorithm, among which the genetic algorithm (GA) gives the better overall effect.

In summary, this thesis uses the support vector data description method to study two problems in machine learning classification tasks, and verifies the proposed methods on experimental data sets. The research provides new ideas and methods for machine learning classification tasks and has theoretical and application value in the field of machine learning.
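The pipeline in aspect (1), one description per class, distance to each center, a distance-to-probability mapping, and a Bagging ensemble, can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the thesis: a plain hypersphere (mean center, distance-quantile radius) stands in for a trained SVDD, the softmax over radius-normalized distances is a hypothetical choice of conversion function, and the names `fit_sphere`, `class_probabilities` and `bagged_probabilities` are invented for illustration.

```python
import math
import random


def fit_sphere(points, quantile=0.9):
    """Simplified stand-in for SVDD (assumption): the center is the mean
    and the radius is a quantile of the distances to that center."""
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return center, max(radius, 1e-12)  # guard against a zero radius


def class_probabilities(x, spheres):
    """Hypothetical distance-to-probability mapping: a numerically stable
    softmax over negative radius-normalized distances."""
    z = [-math.dist(x, center) / radius for center, radius in spheres]
    top = max(z)
    scores = [math.exp(v - top) for v in z]
    total = sum(scores)
    return [s / total for s in scores]


def bagged_probabilities(x, class_a, class_b, n_models=25, seed=0):
    """Bagging: refit both descriptions on bootstrap resamples of each
    class and average the resulting probabilities."""
    rng = random.Random(seed)
    sums = [0.0, 0.0]
    for _ in range(n_models):
        spheres = [fit_sphere(rng.choices(data, k=len(data)))
                   for data in (class_a, class_b)]
        probs = class_probabilities(x, spheres)
        sums = [s + p for s, p in zip(sums, probs)]
    return [s / n_models for s in sums]


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic 2-D classes centered near (0, 0) and (5, 5).
    class_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    class_b = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(100)]
    print(bagged_probabilities((0.2, -0.1), class_a, class_b))
```

A sample in the overlap between the two classes receives probabilities close to each other instead of a hard label, which is the behavior the probability-based description is meant to provide. Replacing `fit_sphere` with a kernel SVDD solver would leave the other two functions unchanged, provided it still returns a center and a radius or an equivalent distance.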
Keywords/Search Tags: Support vector data description, Probabilistic machine learning, Imbalanced data, Ensemble, Classification