
Research On Classification Method Of Imbalanced Data Stream

Posted on: 2017-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Liu
GTID: 2348330488472012
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, tens of thousands of data records, or more, are generated every day. These data arrive rapidly and continuously in the form of streams, for example from monitoring systems, network intrusion detection, and credit card fraud detection. How to extract useful information from large-scale, dynamically changing, real-time data streams has become an active research topic in data mining. Data stream classification poses a new challenge for the field: traditional classification algorithms are not suited to it, so new algorithms are needed to handle dynamic data streams. Most research on data stream classification assumes that the data is relatively balanced and stable. In practice, this assumption often does not hold. Based on the characteristics of skewed data streams and concept drift, this thesis proposes two data stream classification models. The specific contents are as follows:

(1) To address the large scale and real-time nature of data streams, an ensemble classification model for imbalanced data streams based on neural networks is proposed. The model consists of three parts: balancing the training data stream, constructing the ensemble classification model, and incrementally updating the classification model with newly arrived data. An improved under-sampling method is used to balance the data stream, and a neural network is used as the base classifier. Three baseline methods are compared with the proposed model on overall performance across ten data sets from the UCI Machine Learning Repository. The experimental results show that the proposed algorithm can effectively handle classification of non-stationary, class-imbalanced data streams.

(2) To address the dynamic and non-stationary nature of data streams, this thesis presents an imbalanced data stream classification model based on a dual weighted online extreme learning machine. The online sequential extreme learning machine is used as the base classifier. By analyzing the characteristics of the data distribution in both time and space, the thesis gives an adaptive dual weighting scheme that tunes the weights at the temporal level and at the spatial level. A probability density function is used to compute the temporal weights, and an incremental probabilistic neural network is used to compute the spatial weights. The whole model is then updated with these dual weights so that the class distribution of the current data is balanced. The experimental results show that the proposed algorithm achieves higher G-mean and F-measure and shows good robustness.
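As an illustration of two ideas mentioned above, the following is a minimal Python sketch, not the thesis's implementation. It assumes a stream processed in chunks of (features, label) pairs, and it uses plain random under-sampling as a stand-in for the "improved under-sampling method", whose details the abstract does not give. The function names (undersample_chunk, g_mean, f_measure) are hypothetical. The metric definitions are the standard ones: G-mean is the geometric mean of sensitivity and specificity, and F-measure is the weighted harmonic mean of precision and recall.

```python
import math
import random


def undersample_chunk(chunk, minority_label=1, seed=0):
    """Balance one chunk by plain random under-sampling of the majority class.

    Assumption: `chunk` is a list of (features, label) pairs. This is a generic
    stand-in, not the improved under-sampling method proposed in the thesis.
    """
    rng = random.Random(seed)
    minority = [s for s in chunk if s[1] == minority_label]
    majority = [s for s in chunk if s[1] != minority_label]
    keep = min(len(majority), len(minority))
    balanced = minority + rng.sample(majority, keep)
    rng.shuffle(balanced)
    return balanced


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-measure of the positive class; beta=1 gives the usual F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy chunk: 3 minority samples (label 1) and 17 majority samples (label 0).
    chunk = [((i,), 1 if i < 3 else 0) for i in range(20)]
    balanced = undersample_chunk(chunk)
    print(len(balanced), sum(1 for _, y in balanced if y == 1))

    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(g_mean(y_true, y_pred), f_measure(y_true, y_pred))
```

Both metrics are commonly preferred over plain accuracy on imbalanced data because a classifier that always predicts the majority class scores zero on each of them.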
Keywords/Search Tags: Data Stream, Class Imbalance, Neural Network, Online Sequential Extreme Learning Machine, Data Mining