Density-based and Grid-based Uncertain Data Stream Clustering Algorithm in Vulnerability Detection

Posted on: 2014-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: J T Zhao
Full Text: PDF
GTID: 2268330422966859
Subject: Computer application technology
Abstract/Summary:
The rapid development of information technology brings great convenience to daily life, but it also brings a large number of software vulnerabilities. These vulnerabilities are exploited maliciously to obtain important information for illegal purposes. Clustering, as an unsupervised learning method, can detect vulnerabilities efficiently and prevent the loss of important information. In large software systems the volume of streaming information is huge, and the processed information is usually imprecise. It is therefore of practical significance to apply improved uncertain data stream clustering algorithms to vulnerability detection.

Current clustering algorithms for evolving uncertain data streams are sensitive to user-specified thresholds and unstable in handling noise. In this paper, DUStream is presented, a density-based algorithm for discovering clusters in evolving uncertain data streams. Probability distance is introduced as a new similarity measure that takes both the probability attribute and the distance attribute into account. Probability radius is used as a self-adaptive dynamic threshold to reduce the effect of user-specified input. Experimental results demonstrate the effectiveness and efficiency of the algorithm on artificial and real data sets.

Existing grid-based uncertain data stream clustering algorithms are fast but have low accuracy, and they are sensitive to user-specified thresholds. To solve these problems, a density grid-based uncertain data stream clustering algorithm, UG-Stream, is proposed in this paper. In UG-Stream, a dynamic threshold is defined by taking uncertainty and grid features into account, and dense grids are distinguished by this threshold. Probability variance is defined to describe the distribution of the data points inside a grid. If the distribution within a dense grid can be regarded as uniform according to its probability variance, the grid is classified as a core dense grid. Core dense grids are clustered directly, and the remaining dense grids are merged into existing clusters by probability center distance. Comparative experiments show that UG-Stream is superior to UMicro in both clustering accuracy and clustering rate.

Existing grid-based uncertain data stream clustering algorithms achieve a high clustering rate but have difficulty clustering boundary grids, so their clustering results are usually fuzzy at the boundaries. So far, there are few boundary-grid clustering algorithms for probabilistic data streams. In this paper, UBGStream, an algorithm for clustering the boundary grids of uncertain data streams, is proposed. In this algorithm, probability variance is defined to describe the distribution of data points in a grid. A dense grid is classified as a candidate boundary grid if the distribution of its data points is non-uniform. A spherical structure is built from the statistical information of the candidate boundary grid. Based on the overlap between the spherical structure and the candidate boundary grid, boundary grids can be clustered accurately. Experimental results on synthetic and real data sets show that UBGStream is far superior to DCUStream in clustering accuracy.
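The abstract does not give the formulas behind the dynamic threshold or the probability variance, so the following is only a minimal sketch of the general idea of separating sparse, uniformly filled dense, and non-uniformly filled dense grids. Everything specific in it is an assumption made for illustration: the function name `classify_grids`, the 2-D points with an existence probability `p`, the threshold taken as the mean expected density of non-empty cells, and the uniformity test that compares a probability-weighted variance with the variance of a uniform distribution over a square cell. It is not the thesis's UG-Stream or UBGStream procedure.

```python
from collections import defaultdict
from math import floor


def classify_grids(points, side=1.0, tol=0.5):
    """Label each non-empty grid cell (illustrative sketch only).

    points: iterable of (x, y, p), p being an existence probability in (0, 1].
    side:   edge length of a square grid cell (assumed parameter).
    tol:    relative tolerance for the uniformity test (assumed parameter).
    """
    cells = defaultdict(list)
    for x, y, p in points:
        cells[(floor(x / side), floor(y / side))].append((x, y, p))
    if not cells:
        return {}

    # Assumption: expected density of a cell = sum of existence probabilities.
    density = {c: sum(p for _, _, p in pts) for c, pts in cells.items()}
    # Assumption: dynamic threshold = mean expected density of non-empty cells.
    threshold = sum(density.values()) / len(density)

    # Variance of a uniform distribution over a square cell, summed over both axes.
    uniform_var = 2 * side ** 2 / 12

    labels = {}
    for c, pts in cells.items():
        w = density[c]
        if w < threshold:
            labels[c] = "sparse"
            continue
        # Probability-weighted center and variance of the points in the cell.
        cx = sum(p * x for x, _, p in pts) / w
        cy = sum(p * y for _, y, p in pts) / w
        var = sum(p * ((x - cx) ** 2 + (y - cy) ** 2) for x, y, p in pts) / w
        is_uniform = abs(var - uniform_var) <= tol * uniform_var
        labels[c] = "core dense" if is_uniform else "candidate boundary"
    return labels


if __name__ == "__main__":
    spread = [(0.25, 0.25, 0.9), (0.25, 0.75, 0.9), (0.75, 0.25, 0.9), (0.75, 0.75, 0.9)]
    packed = [(1.05, 0.05, 0.9), (1.06, 0.05, 0.9), (1.05, 0.06, 0.9), (1.06, 0.06, 0.9)]
    lonely = [(3.5, 3.5, 0.3)]
    print(classify_grids(spread + packed + lonely))
```

In this toy input the evenly spread cell is labeled core dense, the cell whose points crowd into one corner is labeled a candidate boundary grid, and the cell holding a single low-probability point is labeled sparse. A streaming implementation would also need incremental updates and decay of old data, which this sketch leaves out.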
Keywords/Search Tags: Uncertain data stream, Cluster, Density-based, Grid-based, Probability similarity measurement