Research On Mining Algorithms Over Data Streams

Posted on: 2009-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Chen
Full Text: PDF
GTID: 2178360242976764
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer software and hardware, huge amounts of data are being acquired in many fields, and these data accumulate in the form of data streams. Because data streams are continuous, rapid, transient and unpredictable, they must be processed in real time, and historical data can be kept only as compact synopsis information. Traditional data mining algorithms therefore cannot cope with data streams, and new mining algorithms over data streams have become both a difficult and an active research topic.

This thesis analyzes and summarizes current research on data stream algorithms and presents a general model for evolving data streams based on sliding windows that contain sub-windows. Based on this model, classification, frequent pattern mining and clustering algorithms are implemented separately. The main results are as follows:

1. Current classification algorithms over data streams are analyzed. The SADT algorithm, which offers better flexibility, is designed and implemented by combining the advantages of the popular CVFDT algorithm and the Weighted Classifier Ensembles algorithm. The new algorithm analyzes the incoming data and decides whether to adjust the current classifier or to rebuild it, thereby handling concept drift. The flexibility of SADT is then verified experimentally.

2. Popular frequent pattern mining algorithms are summarized. To address the low space efficiency and coarse time granularity of the classic FP-Stream algorithm, an improved algorithm called DSCFPM, which mines and stores closed frequent patterns, is designed and implemented. The new algorithm shows good space efficiency and scalability.

3. The state of recent research on data stream clustering is reviewed. The new density-based DSWStream algorithm is designed and implemented on the basis of the online/offline model proposed in the CluStream algorithm. The new method does not require the number of clusters to be specified in advance, can find clusters of arbitrary shape, and handles outliers and noise well. Experiments show improved clustering quality and efficiency with DSWStream.

In summary, classification, frequent pattern mining and clustering algorithms are implemented using the sliding window model with sub-windows. Compared with other popular mining algorithms, the algorithms presented here jointly take space consumption, mining speed and mining quality into account, and show better applicability and flexibility.
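
The abstract does not spell out the sliding window model with sub-windows, so the following is only a minimal sketch of the general idea, not the thesis's actual design. It assumes that each sub-window is summarized by a simple item-count synopsis and that the oldest sub-window is expired as a unit once the window is full. The class name `SubWindowedSlidingWindow`, its parameters and the `frequent_items` helper are hypothetical names chosen for illustration.

```python
from collections import Counter, deque


class SubWindowedSlidingWindow:
    """Illustrative sliding window kept as a queue of sub-window synopses.

    Assumption: the synopsis of a sub-window is just a Counter of item
    occurrences. Raw items are never stored, only these counts.
    """

    def __init__(self, num_subwindows, subwindow_size):
        self.num_subwindows = num_subwindows  # closed sub-windows retained
        self.subwindow_size = subwindow_size  # items per sub-window
        self.closed = deque()      # synopses of full sub-windows, oldest first
        self.current = Counter()   # synopsis of the sub-window being filled
        self.current_count = 0     # items seen in the current sub-window
        self.totals = Counter()    # aggregate synopsis over the whole window

    def add(self, item):
        """Consume one stream item and update the synopses."""
        self.current[item] += 1
        self.totals[item] += 1
        self.current_count += 1
        if self.current_count == self.subwindow_size:
            self._close_current()

    def _close_current(self):
        """Seal the current sub-window and expire the oldest if needed."""
        self.closed.append(self.current)
        self.current = Counter()
        self.current_count = 0
        if len(self.closed) > self.num_subwindows:
            expired = self.closed.popleft()
            self.totals.subtract(expired)
            self.totals += Counter()  # drop entries whose count fell to zero

    def size(self):
        """Number of items currently covered by the window."""
        return sum(self.totals.values())

    def frequent_items(self, min_support):
        """Items whose relative frequency in the window is >= min_support."""
        n = self.size()
        if n == 0:
            return {}
        return {x: c for x, c in self.totals.items() if c / n >= min_support}


if __name__ == "__main__":
    window = SubWindowedSlidingWindow(num_subwindows=2, subwindow_size=3)
    for symbol in "aaabbbcccd":
        window.add(symbol)
    print(window.size(), window.frequent_items(0.4))
```

Expiring a whole sub-window at a time keeps the per-item update cost constant and bounds memory by the number of sub-windows rather than by the number of raw items, at the price of a coarser expiry granularity. A classifier, a frequent pattern miner or a clustering method would replace the `Counter` with its own synopsis structure.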
Keywords/Search Tags: Data Mining, Data Streams, Classification, Frequent Pattern, Clustering, Sliding Window, Concept Drift