
Research On Decomposition Strategy And Ensemble Classifier Adaptive Learning Of Data Stream

Posted on: 2021-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhang
GTID: 2428330614470734
Subject: Computer technology

Abstract/Summary:
In recent years, with the rapid development of smart devices, networks, and computer software and hardware, more and more data streams have been generated, such as e-commerce transaction records, Weibo hot-topic recommendations and discussions, and commentary on news events. Stream data arrives as a sequence and changes over time. This changing data often contains a great deal of valuable information, and the process of extracting it is called data stream mining. Because data stream mining is usually closely tied to practical problems, it has become one of the most popular research fields.

Data stream classification is an important component of data stream mining. The complexity of a classification task tends to grow with the number of classes, and heavy overlap between classes makes it even harder to establish a clear decision boundary. The ensemble classifier is a commonly used model in data stream classification. Most existing ensemble classifiers select base classifiers greedily: they pick the classifiers with the best individual performance from the base classifier pool and build the ensemble from them. This selection principle often traps the ensemble in a local optimum. This paper therefore studies the multi-class classification problem in data streams and the selection of base classifiers in ensemble classifiers. The main research work and results are as follows:

(1) When a decomposition strategy is used to solve the multi-class problem in data stream classification, existing methods usually use either the attribute value information or the distance information of the neighbors, but not both at once, which wastes neighbor information. When dealing with concept drift in the data stream, a classifier can update itself adaptively using implicit or explicit strategies. Since the implicit strategy adapts slowly and the explicit strategy is sensitive to noisy data, users can choose the adaptive update strategy that suits the needs of the classification model.

(2) To better solve the multi-class problem in data streams, this paper proposes a distance weighting algorithm based on a decomposition strategy. When the algorithm uses the One-Versus-One decomposition strategy to predict the class of a test sample, it uses not only the class attribute values of the neighbor samples but also the distances from the neighbor samples to the test sample, which are used to weight the predicted class attribute values. The information contained in the nearest neighbors is thus fully used, and the prediction accuracy of the classifier is further improved.

(3) To address the selection of base classifiers in the ensemble classifier, this paper combines base classifier selection with a genetic algorithm and proposes a novel ensemble classifier. The ensemble classifier can increase the crossover and mutation probabilities when concept drift occurs in the data stream, thereby generating better and more diverse individuals. In addition, the next-generation population in the ensemble classifier consists of the best individuals of the previous generation, new individuals generated by crossover and mutation among those best individuals, and new individuals generated by the roulette strategy.

The proposed algorithms are compared with other classification algorithms on real and synthetic data sets. The experimental results show that the proposed algorithms improve classification accuracy, which indicates that they can handle the data stream classification problem and have a certain practical value.
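The distance weighting idea in item (2) can be illustrated with a small sketch. It is not the thesis's algorithm: the function name, the inverse-distance weight `1 / (d + eps)`, the Euclidean metric, and the choice of `k` are all assumptions made for illustration. Each One-Versus-One sub-problem keeps only the samples of two classes, and each of the `k` nearest neighbors votes for its own class with a weight that shrinks as its distance to the test sample grows.

```python
import math
from collections import defaultdict
from itertools import combinations


def ovo_distance_weighted_predict(train, query, k=3, eps=1e-9):
    """Predict a label for `query` (hypothetical helper, not from the thesis).

    train: list of (feature_vector, label) pairs.
    query: feature vector of the test sample.
    """
    labels = sorted({label for _, label in train})
    votes = defaultdict(float)

    for a, b in combinations(labels, 2):
        # One-Versus-One sub-problem: keep only samples of classes a and b.
        pair = [(math.dist(x, query), y) for x, y in train if y in (a, b)]
        neighbors = sorted(pair, key=lambda t: t[0])[:k]

        # Assumed weighting: each neighbor votes with weight 1 / distance,
        # so class labels and distances are used together.
        score = defaultdict(float)
        for d, y in neighbors:
            score[y] += 1.0 / (d + eps)

        winner = max((a, b), key=lambda c: score[c])
        votes[winner] += 1

    # The class that wins the most pairwise contests is the prediction.
    return max(labels, key=lambda c: votes[c])


train = [((0.0, 0.0), "a"), ((3.0, 0.0), "b"), ((3.0, 1.0), "b")]
prediction = ovo_distance_weighted_predict(train, (0.1, 0.0), k=3)
```

In this toy call, an unweighted majority vote over the three neighbors would favor class "b" (two neighbors against one), while the distance weights let the single very close "a" neighbor decide the outcome.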
Keywords/Search Tags:Data streams, Concept Drift, Distance Weighting, Genetic Algorithm, Ensemble Classifier
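The population update described in item (3) of the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: individuals are assumed to be bit lists marking which base classifiers join the ensemble, `fitness` is a caller-supplied function (for example, ensemble accuracy on recent samples), and the names and values of `n_elite`, `base_pc`, `base_pm`, and `boost` are hypothetical.

```python
import random


def next_generation(population, fitness, drift, rng,
                    n_elite=2, base_pc=0.6, base_pm=0.05, boost=1.5):
    """Build the next population (hypothetical sketch).

    population: list of bit lists (1 = base classifier is in the ensemble).
    fitness:    function mapping an individual to a non-negative score.
    drift:      True while a concept drift is signalled.
    rng:        a random.Random instance.
    """
    # Raise crossover and mutation probabilities while drift is signalled.
    pc = min(1.0, base_pc * boost) if drift else base_pc
    pm = min(1.0, base_pm * boost) if drift else base_pm

    # Part 1: the best individuals of the previous generation survive.
    ranked = sorted(population, key=fitness, reverse=True)
    elites = [ind[:] for ind in ranked[:n_elite]]

    # Part 2: offspring from single-point crossover between elites,
    # followed by bit-flip mutation.
    offspring = []
    n_offspring = (len(population) - n_elite) // 2
    while len(offspring) < n_offspring:
        p1, p2 = rng.sample(elites, 2)
        child = p1[:]
        if rng.random() < pc:
            cut = rng.randrange(1, len(p1))
            child = p1[:cut] + p2[cut:]
        child = [1 - g if rng.random() < pm else g for g in child]
        offspring.append(child)

    # Part 3: remaining slots filled by roulette-wheel selection
    # (assumed here to draw from the whole previous population).
    weights = [max(fitness(ind), 0.0) + 1e-9 for ind in population]
    n_rest = len(population) - n_elite - len(offspring)
    picked = rng.choices(population, weights=weights, k=n_rest)
    roulette = [ind[:] for ind in picked]

    return elites + offspring + roulette


rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(5)] for _ in range(6)]
new_pop = next_generation(pop, fitness=sum, drift=True, rng=rng)
```

Here `sum` stands in for a real fitness function only so the example is self-contained; the population size is preserved and the fittest individual of the previous generation is always carried over.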