Research On Data Stream Clustering Based On Grid And Density

Posted on: 2007-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S M Dan
GTID: 1118360212957644
Subject: Management Science and Engineering
Abstract/Summary:
The information age, characterized by vast amounts of data, has greatly increased the demand for making sense of data while also making that task more challenging. Data are now collected anytime and anywhere. Under these conditions, traditional database-based analysis methods struggle to cope with such huge volumes of data. To address this problem, a new data model, the data stream, has been introduced. Research on data stream mining is attracting growing attention, and clustering analysis of data streams is one of its important aspects. Developing clustering-based methods for analyzing data streams is therefore both significant and challenging.

The main purpose of this thesis is to present an effective clustering-based method for analyzing data streams. To achieve this goal, the thesis studies existing traditional clustering methods and improves their performance. This study found that clustering based on grid and density is the most suitable of these approaches for data stream clustering. The thesis therefore investigates grid-density-based data stream clustering, building on improvements to the traditional grid-and-density clustering method.

The main contributions of the thesis are as follows:

1. To find a methodology suitable for analyzing data streams, traditional grid-density-based clustering methods were studied and a new method for computing cell density was introduced. Most grid-and-density clustering methods compute the density of a cell by counting the data points that fall inside it. This approach loses much of the influence that data points exert on the region around where they reside. This loss of influence makes it more likely that data points are assigned to different clusters even when they are closer to each other than to other points. To overcome this shortcoming, the concept of Contribution is introduced, and a novel method for computing the density of a grid cell is presented based on the idea of an influence function. Contribution is the influence that a cell near a data point receives from that point. Experimental results suggest that the Contribution-based method reduces this loss of influence.

2. The method for judging dense units and the cluster-marking procedure were improved. Furthermore, a hybrid of Clustering based on Grid and Density and Particle Swarm Optimization (CGDP) was proposed. Most grid-and-density clustering methods use a single threshold to judge Dense Units. This method limits the capability...
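The contrast between count-based and Contribution-based cell density, together with the single-threshold dense-unit test and cluster marking that the abstract describes as common practice, can be sketched in Python. The abstract does not give the exact Contribution formula, so this is a minimal sketch under stated assumptions, not the thesis's method: the influence function is assumed to be a Gaussian of the distance from a point to each cell center (bandwidth `SIGMA`), each point is assumed to spread its influence only over its own cell and the eight surrounding cells, and each point's total contribution is normalized to 1. The names `CELL_SIZE`, `SIGMA`, `count_density`, `contribution_density` and `mark_clusters` are hypothetical. For simplicity the sketch processes a fixed batch of 2-D points rather than an evolving stream, and cluster marking simply joins edge-adjacent dense cells.

```python
import math
from collections import defaultdict

# Hypothetical parameters, chosen for illustration only (not from the thesis).
CELL_SIZE = 1.0   # side length of a square grid cell
SIGMA = 0.5       # bandwidth of the assumed Gaussian influence function


def cell_of(point):
    """Map a 2-D point to the integer (ix, iy) index of its grid cell."""
    x, y = point
    return (math.floor(x / CELL_SIZE), math.floor(y / CELL_SIZE))


def count_density(points):
    """Traditional density: the number of points that fall inside each cell."""
    density = defaultdict(float)
    for p in points:
        density[cell_of(p)] += 1.0
    return dict(density)


def contribution_density(points):
    """Influence-based density (assumed form): each point spreads one unit of
    contribution over its own cell and the 8 surrounding cells, weighted by a
    Gaussian of the distance from the point to each cell's center."""
    density = defaultdict(float)
    for p in points:
        ix, iy = cell_of(p)
        weights = {}
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cell = (ix + dx, iy + dy)
                cx = (cell[0] + 0.5) * CELL_SIZE
                cy = (cell[1] + 0.5) * CELL_SIZE
                d2 = (p[0] - cx) ** 2 + (p[1] - cy) ** 2
                weights[cell] = math.exp(-d2 / (2 * SIGMA ** 2))
        total = sum(weights.values())
        for cell, w in weights.items():
            density[cell] += w / total  # shares of one point sum to 1
    return dict(density)


def mark_clusters(density, threshold):
    """Single-threshold dense-unit test followed by cluster marking:
    dense cells that share an edge receive the same cluster label."""
    dense = {c for c, d in density.items() if d >= threshold}
    labels, next_label = {}, 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first flood fill over edge-adjacent dense cells
            i, j = stack.pop()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels
```

For a point at (0.95, 0.5), just inside the right edge of cell (0, 0), counting credits only cell (0, 0), whereas the Contribution-based density also gives cell (1, 0) about 0.35 of that point's influence with these parameters. Two nearby points on opposite sides of a cell boundary therefore still reinforce each other's cells, which is the kind of influence loss the abstract attributes to plain counting.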
Keywords/Search Tags: Clustering Analysis, Data Stream, Particle Swarm Optimizer, Dynamic Environment