Research and Application of an SVM-Based Under-sampling Algorithm for Unbalanced Data Classification

Posted on: 2014-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: D X Zhang
GTID: 2268330425966548
Subject: Signal and Information Processing
Abstract/Summary:
The Support Vector Machine (SVM) is a fundamental machine learning method grounded in statistical learning theory. It rests on a rigorous theoretical derivation and is an effective tool for problems involving small samples, local extrema, and nonlinearity. Neural network algorithms often suffer from problems of speed, stability, and broad applicability, whereas SVM achieves satisfactory results in these respects. In practice, for example in fault detection, fault samples are scarce and information is hard to obtain, so fault detection is usually carried out on unbalanced data. The standard SVM, however, is designed for classifying balanced data, so its performance on unbalanced data falls short of expectations. As a result, a growing number of researchers have turned to this problem in recent years.

To handle the imbalance, SVM can be improved at the data level and at the algorithm level. At the data level, over-sampling is generally applied to the class with few samples, while under-sampling is applied to the class with many samples. This paper proposes two approaches for balancing unbalanced data: an unbalanced-data SVM algorithm based on spectral clustering under-sampling, and an unbalanced-data SVM algorithm based on reduced-set under-sampling. In both approaches, the unbalanced data are preprocessed to reach a balanced state before being used in the SVM.

In spectral clustering under-sampling, spectral clustering is first applied to the majority class data in kernel space.
Next, the significant data points in each cluster are selected to serve as the information points of the majority class. Finally, the two classes of data are combined to obtain an improved classification boundary. Compared with other algorithms, classification performance improves to some extent, and compared with the traditional algorithm, the processing speed of the SVM also improves.

Furthermore, this paper proposes an SVM algorithm based on reduced-set under-sampling, which handles the boundary data of the majority class. The informative vectors extracted from the majority class are derived from the support vectors located at the boundary. Combining the two classes of data effectively improves the classification boundary and also increases processing speed.

At the end of this paper, spectral clustering under-sampling is applied to fault detection for bearings, covering rolling element faults, race faults, and inner ring faults. The parameters of the algorithm are also analyzed to support its practical application.
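
The spectral clustering under-sampling steps described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation, and it rests on several assumptions that the abstract does not state: an RBF kernel stands in for the kernel space, the normalized-affinity form of spectral clustering is used, and the "significant" point of each cluster is taken to be the member closest to the cluster center in the spectral embedding. The function names and the parameters `gamma`, `n_keep`, and `seed` are hypothetical.

```python
import numpy as np


def rbf_affinity(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix, used here as the kernel-space similarity (assumed)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)


def spectral_embedding(W, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)


def assign(U, centers):
    """Index of the nearest center for every embedded point."""
    dist = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(U, k, n_iter=50, seed=0):
    """Plain Lloyd iterations on the embedded points."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(U, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return assign(U, centers), centers


def undersample_majority(X_major, n_keep, gamma=1.0, seed=0):
    """Return indices of at most n_keep majority samples, one per spectral cluster."""
    U = spectral_embedding(rbf_affinity(X_major, gamma), n_keep)
    labels, centers = kmeans(U, n_keep, seed=seed)
    keep = []
    for j in range(n_keep):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue  # an empty cluster contributes no representative
        dist = np.linalg.norm(U[members] - centers[j], axis=1)
        # Assumed selection rule: keep the member closest to the cluster center.
        keep.append(int(members[dist.argmin()]))
    return np.array(sorted(keep))
```

In a typical use, `n_keep` would be set to the number of minority samples, and the selected majority rows would be stacked with the minority class before training any SVM implementation (for example scikit-learn's `SVC`). Choosing one representative per cluster is only one possible reading of the selection step; other rules, such as keeping several points per cluster, would fit the abstract equally well.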
Keywords/Search Tags: Unbalanced data, SVM, Spectral clustering, Reduced set, Fault detection