Research on Classification Algorithms for Concept-Drifting Data Streams

Posted on: 2017-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: P P Zhang
GTID: 2278330482497635
Subject: Software engineering
Abstract/Summary:
Information and communication technologies are now so advanced that many applications, such as meteorological monitoring, network security, and e-commerce, produce huge data streams that contain a great deal of valuable information. Traditional static data mining techniques cannot adapt to data streams that are high-speed, continuous, unbounded, and changing, so the search for suitable and efficient data stream mining techniques has become a hot topic in data mining. Classification is an important branch of data mining, and the corresponding problem of data stream classification needs further analysis and is of practical significance. Because data streams are dynamic and changeable, the underlying target concepts or rules may change over time or with the environment; that is, concept drift occurs. In addition, owing to the limited precision of equipment, re-sampling, outdated data, privacy protection, and other factors, the data items in streams from wireless sensor networks, credit card fraud detection, network monitoring, and many other real-world applications often carry varying degrees of uncertainty and are not known precisely.
This thesis studies two problems in data stream classification: concept drift and data uncertainty. Concepts in a data stream tend to recur, historical concepts are related to the current concept, and concepts transition into one another. Based on these characteristics, the ECA-RC algorithm is proposed, which applies ensemble classification theory to handle concept drift in data streams. On the one hand, instead of deleting concepts that have temporarily become invalid during learning, the algorithm stores their essential information and their corresponding base classifiers for later reuse; historical classification information that is used infrequently is deleted periodically so that it does not occupy too much memory. On the other hand, the algorithm predicts the upcoming concept from the transitions between concepts. As a result, the proposed algorithm can improve classification accuracy and efficiency.
Applying traditional data stream classification algorithms to uncertain data items in a stream often gives disappointing results. To make efficient use of the uncertain information in a data stream, an ensemble classification algorithm for uncertain data streams is proposed, which represents each uncertain data item with an interval and a probability distribution function. The algorithm not only handles the uncertainty in the data stream reasonably but also adapts effectively to concept drift. Experimental results demonstrate the effectiveness and robustness of the proposed algorithm.
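The abstract does not give the internals of ECA-RC, so the following is only a minimal sketch of the three ideas it names: keeping the base classifiers of retired concepts, predicting the upcoming concept from observed transitions, and periodically pruning rarely used history. The class name `ConceptPool`, its methods, and the first-order transition counts are all hypothetical choices for illustration, not the thesis's actual design. Drift detection and concept identification are assumed to happen elsewhere.

```python
from collections import Counter, defaultdict


class ConceptPool:
    """Hypothetical store of concepts, their classifiers, and transition counts."""

    def __init__(self):
        self.classifiers = {}                    # concept id -> stored base classifier
        self.usage = Counter()                   # concept id -> times it became active
        self.transitions = defaultdict(Counter)  # previous id -> counts of next ids
        self.current = None

    def activate(self, concept_id, classifier=None):
        """Switch to a concept, reusing its stored classifier if it has been seen."""
        if classifier is not None and concept_id not in self.classifiers:
            self.classifiers[concept_id] = classifier
        if concept_id not in self.classifiers:
            raise KeyError(f"no classifier stored for concept {concept_id!r}")
        if self.current is not None and self.current != concept_id:
            self.transitions[self.current][concept_id] += 1
        self.usage[concept_id] += 1
        self.current = concept_id
        return self.classifiers[concept_id]

    def predict_next(self):
        """Return the concept that most often followed the current one, or None."""
        followers = self.transitions.get(self.current)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def prune(self, min_usage):
        """Drop stored concepts activated fewer than min_usage times."""
        stale = [c for c, n in self.usage.items()
                 if n < min_usage and c != self.current]
        for concept_id in stale:
            del self.classifiers[concept_id]
            del self.usage[concept_id]
            self.transitions.pop(concept_id, None)
            for followers in self.transitions.values():
                followers.pop(concept_id, None)


# Usage: strings stand in for trained base classifiers.
pool = ConceptPool()
for cid in ["A", "B", "A", "B", "C", "A"]:
    pool.activate(cid, classifier=f"clf_{cid}")
likely_next = pool.predict_next()  # concept seen most often after "A"
pool.prune(min_usage=2)            # forgets the rarely used concept "C"
```

Counting only first-order transitions keeps the memory cost proportional to the number of distinct concept pairs observed. A real system might weight recent transitions more heavily or prune on a time window, which this sketch does not attempt.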
Keywords/Search Tags:data stream, ensemble classification, concept drift, history concept, repeatability, uncertain data streams, data stream mining