| With advances in information collection and data processing technology, streaming data has become an important object of data mining. Its most distinctive feature is the timestamp, so it is also called time-series data. Because streaming data is dynamic and changeable, the underlying concept distribution changes with time or environment. For example, a change in the distribution of the data source may cause the implicit concept knowledge to diverge from the original data, which is called concept drift. A change in the mechanism that generates the data may cause new pattern classes to emerge in the data, which is called concept evolution. For these different kinds of concept change in non-stationary environments, an effective algorithm for detecting concept drift and concept evolution supports further research and is of great significance for the in-depth analysis and mining of streaming data. This paper studies the detection of concept drift and concept evolution. The main contents cover two aspects: (1) To address true concept drift caused by a stable change in the sample distribution, as well as false concept drift caused by noise and random fluctuation in streaming data, a true and false concept drift detection method based on online performance testing is proposed. The method comprises three core modules: monitoring classification performance and capturing effective fluctuation sites, extracting consistent fluctuation sites by grouped cross-tests, and determining authenticity from the reference sites that follow the drift. First, the method uses grouped cross-testing to analyze fluctuations in the distribution of test performance on the streaming data, which eliminates the influence of false concept drift caused by normal random fluctuation and improves the accuracy of concept drift detection. Second, using the change in test performance at the reference sites that follow the drift, the concept drift site is further 
distinguished to identify whether it is a true or a false concept drift. The method not only detects concept drift accurately but also distinguishes false concept drift effectively. (2) To address concept evolution caused by changes in the pattern classes of streaming data, a concept evolution detection algorithm based on completely random forest is proposed. The algorithm consists of three steps: detecting abnormal samples, marking pattern classes, and updating the model. First, an anomaly detector is constructed from a completely random forest and used to separate normal samples from abnormal ones. Then, a k-nearest neighbor strategy is integrated into the pattern class marking step: the method computes the similarity between the abnormal samples and samples of the known pattern classes, as well as the similarity between the abnormal samples and samples of the new pattern classes, to determine whether each abnormal sample belongs to a novel class or to a known abnormal class. Finally, the model is updated with the results of the pattern class analysis, which improves detection accuracy for new pattern classes. The method not only detects new pattern classes in streaming data promptly and accurately, but also ensures that the updated model can continue to detect new pattern classes. The research in this paper provides an effective approach to locating, analyzing, modeling, and mining concept drift and concept evolution. In addition, it improves model adaptability when the data distribution and label classes of streaming data change, and provides accurate guidance and a model guarantee for the analysis and mining of streaming data in non-stationary environments. |
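
The three modules of method (1) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the abstract gives no formulas, so the function name `detect_drift_sites`, the parameters `window`, `groups`, `drop`, `agree`, and `lag`, and the use of plain accuracy differences as the test statistic are all hypothetical. The input is assumed to be a list of 0/1 flags recording whether an online classifier predicted each incoming sample correctly.

```python
def accuracy(flags):
    """Fraction of correctly classified samples in a slice of 0/1 flags."""
    return sum(flags) / len(flags)


def split_groups(flags, groups):
    """Split a window into `groups` contiguous, equally sized sub-windows."""
    size = len(flags) // groups
    return [flags[i * size:(i + 1) * size] for i in range(groups)]


def detect_drift_sites(correct, window=40, groups=4, drop=0.2, agree=0.8, lag=40):
    """Return (index, "true" | "false") for each consistent fluctuation site.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    sites = []
    step = window // groups
    t = window
    while t + lag + window <= len(correct):
        before, after = correct[t - window:t], correct[t:t + window]

        # Module 1: capture an effective fluctuation site, i.e. a clear
        # drop in windowed accuracy relative to the preceding window.
        if accuracy(before) - accuracy(after) < drop:
            t += step
            continue

        # Module 2: grouped cross-test. Every "before" group is compared
        # with every "after" group; the site is kept only if the drop
        # shows up in at least `agree` of all pairs.
        pairs = [(b, a)
                 for b in split_groups(before, groups)
                 for a in split_groups(after, groups)]
        passed = sum(accuracy(b) - accuracy(a) >= drop for b, a in pairs)
        if passed < agree * len(pairs):
            t += step
            continue

        # Module 3: subsequent reference site. If accuracy is still low
        # `lag` samples later the drift is labeled true, otherwise false.
        later = correct[t + lag:t + lag + window]
        persists = accuracy(before) - accuracy(later) >= drop
        sites.append((t, "true" if persists else "false"))
        t += lag + window  # resume monitoring after the reference window
    return sites


steady = [1] * 200
persistent = steady + [1, 0] * 100            # accuracy stays at 0.5
transient = steady + [1, 0] * 20 + [1] * 160  # accuracy recovers
burst = steady + [0] * 10 + [1] * 190         # short run of errors

print(detect_drift_sites(persistent))  # [(200, 'true')]
print(detect_drift_sites(transient))   # [(200, 'false')]
print(detect_drift_sites(burst))       # []
```

In this toy setting the short error burst is discarded by the grouped cross-test because only one group shows the drop, the transient degradation passes the cross-test but is labeled false once accuracy recovers at the reference window, and the persistent degradation is labeled true. A real detector would more likely rely on a statistical test than on fixed thresholds.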