Data Stream Clustering Algorithm Based On Active Grid-density

Posted on: 2012-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: W X Zhu
GTID: 2218330368982951
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, more and more data reach people's daily lives in the form of streams. Because data streams are continuous, potentially unbounded, and fast-flowing, a stream mining algorithm can visit each data item only once or a limited number of times. These properties make it difficult for traditional clustering algorithms to meet the demand, so a clustering algorithm suited to data streams is needed. Data stream clustering therefore plays an important role and has attracted the attention of many researchers in China and abroad. One goal of this thesis is to study and implement a data stream clustering algorithm with low complexity and high accuracy. First, we analyze and study the theories and techniques of data stream mining. Then, we analyze clustering and summarize the strengths, weaknesses, and fields of application of several typical traditional clustering algorithms. Finally, we derive the clustering algorithm for data streams. To address these problems, this thesis proposes a new data stream clustering algorithm based on active grid-density. First, the data space is divided into a grid structure made up of many small cubic cells, and the data stream is mapped onto this structure. Then, the concept of density is applied to define grid density, and the density of each grid is judged according to its eigenvector. A density decay technique is used to capture the dynamics of the data stream, and the boundary points of grids are then extracted and deleted. In addition, the concept of activity is used to judge how active each grid density is: inactive grid densities are ignored, and active grid densities are kept for clustering. The algorithm is compared with CluStream.
Finally, we apply the proposed algorithm to a network intrusion detection system and analyze its detection rate and error rate to judge whether the algorithm is feasible. Experiments show that the algorithm can find clusters of arbitrary shape and that its density decay technique removes noise data effectively. Compared with CluStream, the proposed algorithm improves on both time complexity and accuracy. In addition, its application to a network intrusion detection system shows that it produces better clustering results.
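The grid mapping, density decay, and active-grid selection described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class name `GridDensitySketch`, the exponential decay form, and the parameters `cell_size`, `decay`, and `active_threshold` are all assumptions chosen for illustration, and boundary-point extraction is omitted.

```python
from collections import deque
from itertools import product


class GridDensitySketch:
    """Illustrative grid-density stream summary (hypothetical, not the thesis's method)."""

    def __init__(self, cell_size=1.0, decay=0.9, active_threshold=1.5):
        self.cell_size = cell_size                # assumed edge length of each grid cell
        self.decay = decay                        # assumed per-time-step decay factor in (0, 1)
        self.active_threshold = active_threshold  # assumed minimum density of an active cell
        self.cells = {}                           # cell index -> (density, last update time)

    def cell_of(self, point):
        # Map a point to the integer index of the cell that contains it.
        return tuple(int(x // self.cell_size) for x in point)

    def density(self, cell, now):
        # Density of a cell, decayed from its last update time to `now`.
        d, t = self.cells.get(cell, (0.0, now))
        return d * self.decay ** (now - t)

    def insert(self, point, now):
        # Decay the stored density first, then add 1 for the new point.
        cell = self.cell_of(point)
        self.cells[cell] = (self.density(cell, now) + 1.0, now)

    def active_cells(self, now):
        # Keep only cells whose decayed density reaches the threshold.
        return {c for c in self.cells if self.density(c, now) >= self.active_threshold}

    def clusters(self, now):
        # Group active cells that touch (including diagonally) by breadth-first search.
        active = self.active_cells(now)
        seen, result = set(), []
        for start in sorted(active):
            if start in seen:
                continue
            seen.add(start)
            queue, group = deque([start]), set()
            while queue:
                cell = queue.popleft()
                group.add(cell)
                for offset in product((-1, 0, 1), repeat=len(cell)):
                    nb = tuple(c + o for c, o in zip(cell, offset))
                    if nb in active and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            result.append(group)
        return result
```

Because each point is touched once and only per-cell summaries are stored, the sketch reflects the single-pass constraint of stream mining. Grouping adjacent active cells is one simple way to obtain clusters of arbitrary shape, and cells that stop receiving points decay below the threshold and drop out.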
Keywords: Data Stream, Clustering, Activity, Grid-Density