Research Of Optimized Clustering Algorithms Over Data Streams

Posted on:2011-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:B L Cai

Full Text:PDF

GTID:2178330338491279

Subject:Computer application technology

Abstract/Summary:

In recent years, clustering algorithms for data streams have been studied extensively, but many issues remain open. Most existing grid-based stream clustering algorithms lack an effective storage structure for grid cells, cannot accurately cluster data points on cluster edges, and do not handle noise points in the stream effectively. In addition, data streams contain sequence data, but existing algorithms cannot adequately measure the overall similarity of sequences, which reduces clustering quality. Solving these problems is important for optimizing clustering algorithms in stream systems and their applications.

Firstly, an index tree structure, the Pks-tree, is designed to store grid cells. The structure stores the non-empty grid cells and also preserves their relative positions, which improves storage and retrieval efficiency. Based on the Pks-tree, a corresponding clustering algorithm obtains the final results by traversing the storage structure and marking grid cells with cluster labels.

Secondly, a new approach for clustering evolving data streams is proposed, based on grid density and correlation. A new time-based density threshold function is introduced to remove noise points in real time, and a novel correlation-based technique is adopted to improve clustering accuracy. In the initial stage of the algorithm, the data stream is clustered by grid density. As new data records arrive, a novel pruning strategy periodically inspects and removes noise points. Meanwhile, the generated clusters are dynamically adjusted, based on grid density and correlation, to capture changes in the data stream.

Lastly, a new clustering-based method is presented for detecting software vulnerabilities. A new similarity measure is designed to guide the clustering process. Patterns are mined by clustering the set of vulnerability sequences, and a vulnerability-pattern-library (VPL) is constructed from these patterns. A detection mechanism based on the similarity measure is also designed to reduce false positives and prevent false negatives: vulnerabilities are analyzed by computing the similarity between suspected vulnerabilities and the patterns in the VPL.

The feasibility and effectiveness of the proposed algorithms and methods are verified through experiments, in which they are analyzed and compared with classical algorithms and methods.
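The grid-density idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give the time-based density threshold function, the Pks-tree layout, or the correlation measure, so the sketch assumes a plain dictionary of non-empty cells, exponential decay of cell density, and two fixed thresholds. The class name GridDensityClusterer and the parameters cell_size, decay, dense_threshold, and noise_floor are all hypothetical.

```python
import math


class GridDensityClusterer:
    """Illustrative grid-density stream clusterer (assumed design, not the thesis's)."""

    def __init__(self, cell_size=1.0, decay=0.98, dense_threshold=2.0, noise_floor=0.1):
        self.cell_size = cell_size              # edge length of a grid cell
        self.decay = decay                      # per-time-unit decay factor (assumption)
        self.dense_threshold = dense_threshold  # minimum density of a "dense" cell
        self.noise_floor = noise_floor          # cells below this are pruned as noise
        self.cells = {}                         # cell index -> (density, last update time)

    def _cell_of(self, point):
        # Map a point to the integer index of the grid cell that contains it.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def _decayed(self, density, last, now):
        # Density fades exponentially with the time since the last update.
        return density * self.decay ** (now - last)

    def insert(self, point, now):
        # Only non-empty cells are stored; each arrival adds 1 to the decayed density.
        key = self._cell_of(point)
        density, last = self.cells.get(key, (0.0, now))
        self.cells[key] = (self._decayed(density, last, now) + 1.0, now)

    def prune(self, now):
        # Periodic inspection: drop cells whose decayed density fell below the floor.
        stale = [k for k, (d, t) in self.cells.items()
                 if self._decayed(d, t, now) < self.noise_floor]
        for key in stale:
            del self.cells[key]

    def _neighbors(self, cell):
        # Cells that differ by exactly one step along a single dimension.
        for i in range(len(cell)):
            for delta in (-1, 1):
                yield cell[:i] + (cell[i] + delta,) + cell[i + 1:]

    def clusters(self, now):
        # Label connected groups of dense cells; returns {cell index: cluster label}.
        dense = {k for k, (d, t) in self.cells.items()
                 if self._decayed(d, t, now) >= self.dense_threshold}
        labels = {}
        next_label = 0
        for start in sorted(dense):
            if start in labels:
                continue
            labels[start] = next_label
            stack = [start]
            while stack:
                cell = stack.pop()
                for nb in self._neighbors(cell):
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        stack.append(nb)
            next_label += 1
        return labels
```

In this sketch, insert is called once per arriving record, prune is called periodically, and clusters is called whenever a snapshot is needed. The correlation-based adjustment of cluster edges described in the abstract is not modeled here.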

Keywords/Search Tags:

Data streams, Clustering, Vulnerability detecting, Index tree, Grid density, Correlation

Related items

1	Research Of Probability Density Grid-based Clustering For Uncertain Data Streams
2	Density Tree-based Clustering Algorithm For Uncertain Data Streams
3	Research On Data Stream Clustering Algorithm Based On Similarity And Grid Partition Optimization
4	An Incremental Grid Clustering Algorithm Based On Density-dimension-tree
5	Research On Density-based Subspace Clustering Algorithm For Data Streams
6	Research On Density-Based Subspace Clustering Algorithm For Data Streams
7	A Density-Based Clustering Algorithm Over Stream Data
8	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
9	Research On Optimization Of Adaptive Density Partition Clustering Algorithm
10	Density-based And Grid-based Uncertain Data Stream Clustering Algorithm In Vulnerability Detection