Research Of Optimized Clustering Algorithms Over Data Streams

Posted on: 2011-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: B L Cai
Full Text: PDF
GTID: 2178330338491279
Subject: Computer application technology
Abstract/Summary:
In recent years, clustering algorithms for data streams have been studied extensively, but many issues remain open. Most existing grid-based stream clustering algorithms lack an effective storage structure for grid cells, cannot accurately cluster data points at cluster edges, and do not handle noise points in the stream effectively. In addition, data streams contain sequence data, and existing algorithms cannot adequately measure the overall similarity of sequences, which reduces clustering quality. Solving these problems matters for optimizing clustering algorithms in stream systems and their applications.

First, an index tree structure, the Pks-tree, is designed to store grid cells. It stores the non-empty grid cells and also preserves their relative positions, which improves storage and retrieval efficiency. Based on the Pks-tree, a corresponding clustering algorithm obtains the final results by traversing the storage structure and marking grid cells with cluster labels.

Second, a new approach for clustering evolving data streams is proposed, based on grid density and correlation. A new time-based density threshold function is introduced to remove noise points in real time, and a novel correlation-based technique is adopted to improve clustering accuracy. In the initial stage of the algorithm, the data stream is clustered by grid density; as new data records arrive, a pruning strategy periodically inspects and removes noise points. Meanwhile, the generated clusters are dynamically adjusted, based on grid density and correlation, to capture changes in the data stream.

Lastly, a new clustering-based method is presented to detect software vulnerabilities. In this method, a new similarity measure is designed to guide the clustering process. Patterns are mined by clustering the set of vulnerability sequences, and a vulnerability-pattern-library (VPL) is constructed from these patterns. A detection mechanism based on the similarity measure is also designed to reduce false positives and false negatives: suspected vulnerabilities are analyzed by computing their similarity to the patterns in the VPL.

The feasibility and effectiveness of the proposed algorithms and methods are verified through experiments, in which they are analyzed and compared with classical algorithms and methods.
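The grid-density idea in the second contribution can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give the time-based threshold function, the correlation measure, or the Pks-tree layout, so this example assumes a simple exponential decay of cell density, a fixed noise threshold, and a plain dictionary of non-empty cells. The names `GridStream`, `CELL`, `DECAY`, and `DENSE` are hypothetical.

```python
# Sketch only: exponential decay and fixed thresholds are assumptions,
# not the time-based threshold function proposed in the thesis.
CELL = 1.0    # width of a grid cell (assumed)
DECAY = 0.9   # density decay factor per time step (assumed)
DENSE = 1.5   # minimum decayed density for a cell to count as dense (assumed)


class GridStream:
    def __init__(self):
        # Only non-empty cells are stored: cell -> (density, last update time).
        self.density = {}

    def cell_of(self, point):
        """Map a point to the integer coordinates of its grid cell."""
        return tuple(int(x // CELL) for x in point)

    def current(self, cell, now):
        """Density of a stored cell, decayed from its last update to `now`."""
        d, t = self.density[cell]
        return d * DECAY ** (now - t)

    def insert(self, point, now):
        """Add one record: decay the cell's old density, then add 1."""
        cell = self.cell_of(point)
        old = self.current(cell, now) if cell in self.density else 0.0
        self.density[cell] = (old + 1.0, now)

    def prune(self, now, noise=0.1):
        """Periodic inspection: drop cells whose decayed density is below `noise`."""
        for cell in [c for c in self.density if self.current(c, now) < noise]:
            del self.density[cell]

    def clusters(self, now):
        """Label dense cells; cells that share a face get the same label."""
        dense = {c for c in self.density if self.current(c, now) >= DENSE}
        labels, next_label = {}, 0
        for start in sorted(dense):
            if start in labels:
                continue
            labels[start] = next_label
            stack = [start]
            while stack:
                c = stack.pop()
                for i in range(len(c)):
                    for step in (-1, 1):
                        n = c[:i] + (c[i] + step,) + c[i + 1:]
                        if n in dense and n not in labels:
                            labels[n] = next_label
                            stack.append(n)
            next_label += 1
        return labels
```

In this sketch a sparse cell is never labeled, and `prune` removes it from storage once its decayed density drops below the noise threshold. An index tree over the cells would replace the dictionary lookups used here to find neighbors.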
Keywords/Search Tags:Data streams, Clustering, Vulnerability detecting, Index tree, Grid density, Correlation