
Research On Association Rules Mining In Data Streams And Its Application

Posted on: 2012-11-25    Degree: Master    Type: Thesis
Country: China    Candidate: P Chen
GTID: 2178330332978579    Subject: Control theory and control engineering
Abstract/Summary:
Data mining is an important method of data processing, and applying it to data stream environments is a challenging field. Data stream mining can be applied in many areas, such as intrusion detection, sensor networks and the telecommunications industry, so this research has practical significance. Traditional association rules mining algorithms consider only the discrete attributes of a dataset and ignore numerical ones, which is one of the main bottlenecks restricting their use in real applications. For static datasets, researchers have proposed discretization and fuzzy methods to bring numerical attributes into the association rules mining process; however, little or no literature studies or discusses this problem in data stream environments. This thesis addresses the problem and develops a fuzzy association rules mining method and a Real-time Data Mining System Based on Fuzzy Association Rules (RDMS-FAR). The main research results are as follows:

1. To handle the dynamic nature of data streams, a membership function offset index (MFB_measure) is proposed to measure how well the current membership functions fit the current data. Experimental results show that MFB_measure can effectively capture changes in data streams.
2. Because traditional association rules mining algorithms for data streams cannot incorporate numerical attributes into the mining process, a fuzzy association rules mining algorithm called FFI-Stream is proposed. The algorithm uses the MFB_measure index to monitor how well the membership functions fit the data and updates them promptly using a data stream clustering algorithm. Experimental results show that FFI-Stream performs well.
3. Because FFI-Stream performs poorly on datasets with high-dimensional numerical attributes, a novel fuzzy association rules mining algorithm based on a genetic algorithm (GA), called GA-FFI-Stream, is proposed. GA-FFI-Stream maintains a synopsis structure dynamically, so that it suits the limited memory and CPU resources characteristic of data streams. It also uses heuristic information to improve the efficiency of the GA-based extraction of membership functions. Experimental results show that this algorithm overcomes the poor performance of FFI-Stream on high-dimensional data streams.
4. To meet the demands of real projects for data stream mining, the RDMS-FAR system is proposed. The system is built around a fuzzy association rules mining module, from which a classification module is derived. The fuzzy association rules mining module is based on the frameworks of FFI-Stream and GA-FFI-Stream. In the classification module, a boosting algorithm called ruleboost is proposed, whose base classifiers are built from fuzzy association rules. Experimental results show that the system is effective.

Finally, the thesis concludes with a summary and discusses problems that need further research.
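The abstract gives no formulas for MFB_measure or FFI-Stream, so the following is only a minimal Python sketch of the general idea behind results 1 and 2: numerical attributes are mapped to fuzzy sets whose membership functions come from clustering a window of the stream, fuzzy itemset support is computed from those memberships, and the functions are rebuilt when the data drifts. It assumes triangular membership functions, plain 1-D k-means as the clustering algorithm (the abstract does not name one) and the min t-norm for support. The drift score `center_offset`, the helper `maybe_update` and its 0.1 threshold are hypothetical stand-ins, not the thesis's MFB_measure.

```python
from statistics import mean


def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set: left foot a, peak b, right foot c."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (assumed clustering method, k >= 2); returns sorted centers."""
    vs = sorted(values)
    # Start from evenly spaced order statistics of the window.
    centers = [vs[int(i * (len(vs) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vs:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)


def build_mfs(centers):
    """One triangular set per center; neighbouring centers serve as the feet."""
    left = centers[0] - (centers[1] - centers[0])
    right = centers[-1] + (centers[-1] - centers[-2])
    pts = [left] + list(centers) + [right]
    return [(pts[i], pts[i + 1], pts[i + 2]) for i in range(len(centers))]


def fuzzy_support(window, mfs, itemset):
    """Average min-membership of an itemset [(attribute, fuzzy_set), ...] over the window."""
    total = 0.0
    for row in window:
        total += min(triangular(row[attr], *mfs[attr][s]) for attr, s in itemset)
    return total / len(window)


def center_offset(old, new, span):
    """Hypothetical drift score: mean center shift, normalised by the value span."""
    return mean(abs(o - n) for o, n in zip(old, new)) / span


def maybe_update(centers, new_values, threshold=0.1):
    """Hypothetical helper: re-cluster the new window, adopt new centers only on drift."""
    new_centers = kmeans_1d(new_values, len(centers))
    span = (max(new_values) - min(new_values)) or 1.0
    if center_offset(centers, new_centers, span) > threshold:
        return new_centers, True
    return centers, False


if __name__ == "__main__":
    # Hypothetical window of records with two numerical attributes.
    window = [(1, 10), (2, 11), (3, 12), (10, 30), (11, 31), (12, 32)]
    mfs = [build_mfs(kmeans_1d([row[a] for row in window], 2)) for a in range(2)]
    # Fuzzy support of "attribute 0 is LOW and attribute 1 is LOW".
    print(fuzzy_support(window, mfs, [(0, 0), (1, 0)]))
    # A shifted window moves the centers and triggers an update.
    print(maybe_update([2, 11], [21, 22, 23, 30, 31, 32]))
```

Running it prints a support of about 0.47 for the LOW/LOW itemset, then `([22, 31], True)`. Using neighbouring cluster centers as the feet keeps adjacent fuzzy sets overlapping, so every value inside the covered range has nonzero membership in at most two sets.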
Keywords/Search Tags: Data Stream, Data Mining, Fuzzy Association Rules Mining, Membership Function, Clustering Algorithm, Genetic Algorithm, Real-time Data Mining System