
Classification of Semi-Supervised Imbalanced Data

Posted on: 2017-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: J F Long
GTID: 2428330590488949
Subject: Applied Statistics
Abstract/Summary:
With the continuous development of computer technology and related fields, the capacity for data storage and analysis has grown enormously, and data mining is now widely used across industries. At the same time, data mining faces new challenges: data sets collected in different scenarios follow different distributions, and no single algorithm works well on every kind of data set. Methods therefore have to be tailored to the type of data set at hand. A semi-supervised imbalanced data set combines two difficulties: only part of the data is labeled, and the classes are unevenly represented. Traditional algorithms struggle to classify such data effectively and must be adapted to these characteristics.

In this thesis, we study the semi-supervised problem and the class-imbalance problem in depth and, building on that study and on the characteristics of such data sets, propose a solution for classifying semi-supervised imbalanced data. The main work is as follows:

Firstly, we introduce the development and basic theory of semi-supervised learning techniques and of learning techniques for imbalanced data sets.
Secondly, we apply an LDA classifier to the unlabeled part of the semi-supervised data set and use its predictions to eliminate the imbalance between the classes of the labeled data. We then classify the resulting balanced data set with the BP-Adaboost algorithm.
Finally, we compare the classification performance of the proposed method on UCI data sets. The experiments show that the proposed method achieves a significant improvement over traditional methods.
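The abstract gives only the outline of the method, so the following Python/NumPy sketch illustrates one plausible reading of it, under loudly stated assumptions. It assumes binary labels coded 0/1. It takes "LDA" to mean two-class linear discriminant analysis with a pooled covariance, and assumes the balancing step adds the unlabeled points that LDA most confidently assigns to the minority class until both classes have equal size. It also takes "BP-Adaboost" to mean discrete AdaBoost whose weak learners are small back-propagation (BP) neural networks. The function names (`lda_fit`, `balance_with_unlabeled`, `BPNet`, `bp_adaboost`, `ensemble_predict`, `demo`), the network size, learning rate, number of rounds, and the synthetic Gaussian data are all hypothetical choices for illustration, not the thesis's settings or its UCI experiments.

```python
import numpy as np

def lda_fit(X, y):
    """Two-class linear discriminant with a pooled covariance matrix (labels 0/1)."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = centered.T @ centered / (len(X) - 2) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)
    prior1 = y.mean()
    b = -0.5 * (mu0 + mu1) @ w + np.log(prior1 / (1 - prior1))
    return w, b  # LDA assigns x to class 1 exactly when x @ w + b > 0

def balance_with_unlabeled(X_lab, y_lab, X_unl):
    """Pseudo-label the unlabeled points LDA most confidently puts in the minority class."""
    n0, n1 = int(np.sum(y_lab == 0)), int(np.sum(y_lab == 1))
    minority, deficit = (1, n0 - n1) if n1 < n0 else (0, n1 - n0)
    w, b = lda_fit(X_lab, y_lab)
    margin = X_unl @ w + b
    if minority == 0:
        margin = -margin  # positive margin now means "predicted minority class"
    candidates = np.where(margin > 0)[0]
    chosen = candidates[np.argsort(-margin[candidates])][:deficit]
    X_bal = np.vstack([X_lab, X_unl[chosen]])
    y_bal = np.concatenate([y_lab, np.full(len(chosen), minority)])
    return X_bal, y_bal

class BPNet:
    """One-hidden-layer network trained by back-propagation on weighted cross-entropy."""
    def __init__(self, n_in, n_hidden=8, lr=0.5, epochs=300, seed=0):
        rng = np.random.default_rng(seed)
        self.W1, self.b1 = rng.normal(0.0, 0.5, (n_in, n_hidden)), np.zeros(n_hidden)
        self.W2, self.b2 = rng.normal(0.0, 0.5, n_hidden), 0.0
        self.lr, self.epochs = lr, epochs

    def _forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        z = np.clip(H @ self.W2 + self.b2, -30.0, 30.0)
        return H, 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, t, sw):
        """t: 0/1 targets; sw: AdaBoost sample weights summing to 1."""
        for _ in range(self.epochs):
            H, p = self._forward(X)
            d_out = sw * (p - t)                             # gradient w.r.t. output logit
            d_hid = np.outer(d_out, self.W2) * (1.0 - H**2)  # back-propagated through tanh
            self.W2 -= self.lr * (H.T @ d_out)
            self.b2 -= self.lr * d_out.sum()
            self.W1 -= self.lr * (X.T @ d_hid)
            self.b1 -= self.lr * d_hid.sum(axis=0)
        return self

    def predict(self, X):
        return np.where(self._forward(X)[1] > 0.5, 1, -1)

def bp_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with BPNet weak learners; returns [(alpha, net), ...]."""
    ys = 2 * y - 1
    sw = np.full(len(X), 1.0 / len(X))
    ensemble = []
    for m in range(rounds):
        net = BPNet(X.shape[1], seed=m).fit(X, y.astype(float), sw)
        pred = net.predict(X)
        err = float(np.clip(sw[pred != ys].sum(), 1e-10, 1 - 1e-10))
        if err >= 0.5:
            break  # weak learner no better than chance: stop boosting
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, net))
        sw = sw * np.exp(-alpha * ys * pred)  # up-weight misclassified samples
        sw /= sw.sum()
    return ensemble

def ensemble_predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, net in ensemble:
        score += alpha * net.predict(X)  # weighted vote of the weak learners
    return (score > 0).astype(int)

def demo(seed=0):
    """Synthetic two-Gaussian example (illustration only, not the thesis's UCI data)."""
    rng = np.random.default_rng(seed)
    def draw(n0, n1):
        X = np.vstack([rng.normal(0.0, 1.0, (n0, 2)), rng.normal(3.0, 1.0, (n1, 2))])
        return X, np.array([0] * n0 + [1] * n1)
    X_lab, y_lab = draw(40, 8)       # small, imbalanced labeled set
    X_unl, _ = draw(100, 100)        # labels discarded: treated as unlabeled
    X_test, y_test = draw(200, 200)
    X_bal, y_bal = balance_with_unlabeled(X_lab, y_lab, X_unl)
    model = bp_adaboost(X_bal, y_bal)
    accuracy = float(np.mean(ensemble_predict(model, X_test) == y_test))
    return np.bincount(y_bal), accuracy
```

Calling `demo()` returns the class counts after LDA-based balancing and the test accuracy of the boosted ensemble on the synthetic data; the exact accuracy depends on the seed. Choosing only the most confident LDA predictions is one way to limit label noise from pseudo-labeling; the thesis may select or weight unlabeled points differently.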
Keywords/Search Tags: semi-supervised data, unbalanced data, LDA, BP-Adaboost