Research On Adaptive Learning Strategy And Application For Data Stream In Complex Environment

Posted on: 2020-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2428330575494838
Subject: Computer Science and Technology
Abstract/Summary:
In many applications, such as e-commerce, intrusion monitoring, and IoT environment monitoring, large numbers of data streams are generated at an alarming rate and contain rich, valuable information. Extracting, processing, and further analyzing these massive data is a hot research topic in data mining. Compared with a conventional data environment, a data stream is characterized by high-speed arrival and massive volume, and traditional data mining strategies cannot adapt to it. Adopting a reasonable learning strategy in this complex environment is therefore a central concern of data stream mining.

Classification in data streams faces three main challenges. First, the data distribution of a stream is diverse and unknown, which degrades classifier performance. Second, imbalance in the class distribution of the stream places further demands on the learning strategy. Third, noisy data in the stream interferes, to varying degrees, with the selection strategy of the classification model. This thesis focuses on these problems and studies adaptive learning strategies for data streams in complex environments. The main work includes:

(1) An adaptive ensemble strategy based on depth attribute weighting is proposed to improve adaptability in data stream environments with noise interference. The strategy uses incremental learning and ensemble learning mechanisms. The classification contribution is weighted locally according to the different attribute values, and a dynamic adaptive threshold is designed. In addition, a dual weighting strategy that combines classifier confidence with a classifier precision weight further improves the weight distribution of the base classifiers. Together these measures effectively limit interference from noisy data and irrelevant attributes and improve adaptability to concept drift. Experiments on synthetic datasets and real-environment data show that the ensemble strategy improves classification accuracy and adapts better to concept drift.

(2) An imbalanced learning strategy based on incremental two-layer clustering is proposed to solve the problem of imbalanced data stream classification under concept drift. Incremental processing clusters minority instances rapidly and under-samples majority instances to form a two-layer clustering model, which retains diverse clusters while balancing the data in each cluster and reducing storage space. A multi-threshold strategy is introduced to assist the two-layer clustering model in prediction and to detect possible concept drift, so that existing models can be updated dynamically. Experimental results on synthetic datasets and real data show that the imbalanced learning strategy improves classification accuracy and G-mean (at best by about 4% and 7%, respectively) and adapts better to imbalanced environments with concept drift.
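The abstract reports G-mean alongside accuracy but does not define it. As a minimal sketch, assuming the common binary definition (the geometric mean of the recall on the minority class and the recall on the majority class), the metric can be computed as below. The function names `g_mean` and `accuracy`, the `positive` parameter, and the toy labels are illustrative assumptions, not taken from the thesis.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (binary case).

    Assumption: `positive` marks the minority class; every other label
    is treated as the majority (negative) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Recall on each class; defined as 0.0 when a class has no instances.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy imbalanced window: 2 minority (1) and 8 majority (0) instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                        # ignores the minority class
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]            # finds one minority instance

print(accuracy(y_true, always_majority), g_mean(y_true, always_majority))
print(accuracy(y_true, mixed), g_mean(y_true, mixed))
```

In this toy window, a classifier that always predicts the majority class reaches 80% accuracy but a G-mean of 0, which is why G-mean is often reported next to accuracy for imbalanced streams.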
Keywords/Search Tags: Data Stream, Adaptive, Learning Strategy, Concept Drift, Imbalance Learning