Research On Clustering Algorithm For Data Stream Based On Density And Constraint

Posted on: 2016-12-04; Degree: Master; Type: Thesis
Country: China; Candidate: J Chen; Full Text: PDF
GTID: 2308330461467279; Subject: Computer software and theory
Abstract/Summary:
Data stream clustering is an important and relatively new area of data mining. Most traditional data stream clustering algorithms are unsupervised. However, many data streams carry prior knowledge that requires the clustering result to meet certain conditions or not violate certain rules. Such requirements are known as constraints. Making good use of these constraints allows us to build an effective semi-supervised clustering algorithm for data streams. After analyzing the characteristics of data streams and the nature of constraints, we studied constrained clustering of data streams. The thesis mainly comprises the following work. First, we studied existing data stream clustering algorithms, analyzed their core theories and techniques, and summarized the advantages and disadvantages of each. Second, we reviewed traditional constraint-based clustering algorithms, including their underlying ideas and implementation procedures, and analyzed how constraints can be used to improve existing clustering algorithms. Third, we proposed C-DDStream, a density-based data stream clustering algorithm that can process instance-level constraints. Following the two-phase data stream clustering framework, the algorithm divides clustering into an online part and an offline part. The online part uses the damped window model, applies the constraints while summarizing the large volume of data objects in the stream into micro-clusters, and extends the instance-level constraints to micro-cluster-level constraints. The offline part uses the micro-cluster-level constraints to guide the clustering procedure.
It treats each micro-cluster as a clustering unit and searches for density-connected regions to produce the clustering results. Finally, we implemented the C-DDStream algorithm on MOA (an open-source machine learning framework) and verified its correctness and effectiveness through experiments. The results show that C-DDStream improves the quality of data stream clustering by exploiting the constraint relations.
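The abstract does not give the details of C-DDStream, so the following is only a minimal Python sketch of the three ideas it names, not the thesis's implementation. It assumes the exponential fading function 2^(-lambda * dt) commonly used with the damped window model, assumes that a point-level constraint is lifted to the pair of micro-clusters that absorbed the two points, and replaces the density-connected search with a simple distance threshold between micro-cluster centers. All function names, parameters (lam, eps) and the sample data are hypothetical.

```python
from itertools import combinations
import math


def decayed_weight(weight, dt, lam=0.25):
    """Damped-window fading (assumed form): weight shrinks by 2^(-lam * dt)."""
    return weight * 2.0 ** (-lam * dt)


def lift_constraints(point_constraints, assignment):
    """Map (point_a, point_b) constraints to (micro_cluster_a, micro_cluster_b)."""
    lifted = set()
    for a, b in point_constraints:
        ma, mb = assignment[a], assignment[b]
        if ma != mb:  # a pair inside one micro-cluster yields no cluster-level pair
            lifted.add((min(ma, mb), max(ma, mb)))
    return lifted


def offline_cluster(centers, eps, must_link, cannot_link):
    """Group micro-clusters whose centers lie within eps, honoring constraints."""
    parent = {m: m for m in centers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    def violates(ra, rb):
        # True if merging groups ra and rb would join a cannot-link pair.
        return any({find(a), find(b)} == {ra, rb} for a, b in cannot_link)

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb and not violates(ra, rb):
            parent[rb] = ra

    for a, b in must_link:  # must-link pairs are merged first
        union(a, b)
    for a, b in combinations(sorted(centers), 2):
        if math.dist(centers[a], centers[b]) <= eps:
            union(a, b)

    groups = {}
    for m in centers:
        groups.setdefault(find(m), set()).add(m)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical data: four micro-cluster centers and four labeled points.
centers = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (10.0, 0.0)}
assignment = {"p0": 0, "p1": 1, "p2": 2, "p3": 3}
must = lift_constraints([("p0", "p3")], assignment)
cannot = lift_constraints([("p1", "p2")], assignment)
clusters = offline_cluster(centers, eps=1.5, must_link=must, cannot_link=cannot)
```

In this toy run the cannot-link constraint keeps micro-cluster 2 apart from its near neighbor 1, while the must-link constraint pulls the distant micro-cluster 3 into the group containing 0 and 1, which illustrates how constraints can change a purely distance-based result.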
Keywords/Search Tags:Data streaming, clustering, density, constraint