Research On Classification And Regression Algorithms On Concept Drifting Data Streams And Its Application

Posted on: 2020-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: D Z Chen
Full Text: PDF
GTID: 2428330596476710
Subject: Engineering
Abstract/Summary:
With the rapid development of the digital earth, a growing number of sensors generate huge data streams around the clock. These streams contain information that is valuable for industry and for everyday life. Mining potential patterns from such data quickly and effectively is the main purpose of data mining, and classification and regression are two of its main tasks. Because data streams are high-speed, continuous, potentially infinite and evolve over time, traditional static data mining algorithms struggle to achieve satisfactory results on them. How to construct fast and accurate incremental classification and regression algorithms for data streams is therefore a hot issue in the field.

This thesis proposes a data stream classification algorithm named Selective Prototype Learning (SPL). SPL selects a limited number of the most representative prototypes as the training set, so that they represent the current concept of the data stream, and classifies newly arriving data with a lazy learning method. It updates the weights of the representative prototypes online through error-driven representativeness learning to adapt to gradual concept drift. By saving misclassified instances and monitoring local misclassification, SPL detects abrupt concept drift and updates the model accordingly, which lets it adapt to abrupt drift and discard noisy data. In addition, SPL compresses the representative prototype set with the Fast Condensed Nearest Neighbor (FCNN) rule, which bounds the number of prototypes and keeps the algorithm efficient.

The thesis also proposes a data stream regression algorithm named Selective Instances Regression (SIR). SIR selects representative instances as the training set, builds a model tree from them based on the recursive least squares method, and performs online prediction and model updates on newly arriving data. It updates the representativeness of each instance online according to its prediction performance and relative spatial position to adapt to gradual concept drift. For abrupt concept drift, SIR monitors the model error rate, following the idea of statistical process control, to detect the drift and update the model.

SIR is applied to dynamic discharge simulation in the Qingliu River Basin. The results show a Nash efficiency coefficient (NSE) of 0.77 and a root mean square error (RMSE) of 9.18. Compared with traditional machine learning methods and hydrological models, the proposed method simulates discharge better, especially discharge peaks.

In summary, this thesis proposes a data stream classification algorithm based on selective prototypes and a data stream regression algorithm based on selective instances, and applies data stream mining to discharge simulation and prediction in the field of resources and environment, with good results. The work is distinctly cross-disciplinary, and its results show that data stream mining can provide new ideas and methods for dynamic discharge simulation.
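The two scores reported for the discharge simulation, NSE and RMSE, have standard definitions, and a minimal sketch of how they are usually computed may help when reading the reported values. The function names `nse` and `rmse` and the sample discharge values below are hypothetical and chosen for illustration only; they are not code or data from the thesis.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between an observed and a simulated series."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency.

    1.0 means a perfect fit; 0.0 means the simulation is no better than
    always predicting the mean of the observations; negative values are worse.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    # Sum of squared simulation errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    spread = sum((o - mean_observed) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when the observed series is constant")
    return 1.0 - residual / spread


# Hypothetical discharge values, for illustration only (not thesis data).
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]

print(f"NSE  = {nse(observed, simulated):.3f}")
print(f"RMSE = {rmse(observed, simulated):.3f}")
```

NSE is dimensionless, while RMSE carries the unit of the discharge series, so an RMSE value is only meaningful relative to the magnitude of the observed flows.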
Keywords/Search Tags: data stream classification, data stream regression, concept drift, discharge dynamic simulation