Clustering Uncertain Data Based On Similarity And Virtual Grid

Posted on: 2013-09-16  Degree: Master  Type: Thesis
Country: China  Candidate: M M Cao  Full Text: PDF
GTID: 2248330392454738  Subject: Computer software and theory
Abstract/Summary:
In recent years, with the development of data collection and processing technologies, uncertain data has attracted a lot of attention and has become widespread in applications such as sensor networks, biomedicine, the military, and the economy. Most existing clustering algorithms for uncertain data merely incorporate uncertainty into the concept of distance between objects during clustering. The disadvantage of these algorithms is that they do not fully account for the effect of existential uncertainty and attribute-level uncertainty on clustering results. Furthermore, they cannot find clusters of arbitrary shape in uncertain data.

Firstly, in order to cluster uncertain data that contains non-numeric attributes, and to fully account for the effect of existential uncertainty and attribute-level uncertainty on clustering results, we propose a novel algorithm, UNClique (Clustering Uncertain Data Based on Probability Attribute Value Similarity). The algorithm defines a probability attribute value similarity that handles both numeric and non-numeric attributes. Because uncertainty is integrated into this similarity, it reflects the effect of existential uncertainty and attribute-level uncertainty on clustering results. We also provide a secondary partition algorithm: if a tuple in a low-density cell has its maximum probability attribute value similarity with a high-density neighbor cell, the method merges the tuple into that high-density neighbor cell. The clusters are then formed from the high-density neighbor cells.

Secondly, most existing vulnerability taxonomies classify vulnerabilities by their idiosyncrasies, weaknesses, flaws, faults, and so on. The disadvantage of these taxonomies is that the classification standard is not unified and the classes overlap. To solve this problem, we propose an algorithm, VUNClique (Virtual Grid-based Clustering of Uncertain Data on a vulnerability database). We transform the vulnerability database into an uncertain dataset using an existing vulnerability database preprocessing model. We use the probability attribute value similarity and define a virtual grid structure in which cells are divided into real cells and virtual cells; only the real cells, which contain data objects, are stored in memory. A novel cluster identification algorithm is then provided to cluster the high-density real cells. It can identify clusters of arbitrary shape by traversing the real cells twice.

Finally, the experiments are implemented in the C++ programming language with Microsoft Visual C++ 6.0 and run on a number of synthetic and real data sets.
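The virtual grid idea can be illustrated with a short sketch. It is not the thesis's C++ implementation and it does not reproduce VUNClique: it assumes numeric coordinates, a fixed cell width, a cell density defined as the sum of existence probabilities of its tuples, and it groups adjacent dense cells with a plain breadth-first search rather than the two-pass traversal the abstract mentions. All function names and parameters (`build_real_cells`, `dense_cells`, `find_clusters`, `cell_width`, `min_expected_count`) are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product


def build_real_cells(points, cell_width):
    """Assign each (coords, existence_probability) tuple to a grid cell.

    Only cells that receive at least one tuple get a dictionary entry,
    so empty ("virtual") cells never occupy memory.
    """
    cells = defaultdict(list)
    for coords, prob in points:
        key = tuple(int(c // cell_width) for c in coords)
        cells[key].append((coords, prob))
    return cells


def dense_cells(cells, min_expected_count):
    """Return keys of cells whose expected tuple count reaches the threshold.

    Assumption: density is the sum of existence probabilities in the cell.
    """
    return {
        key
        for key, members in cells.items()
        if sum(prob for _, prob in members) >= min_expected_count
    }


def find_clusters(dense):
    """Group dense cells that touch (including diagonally) into clusters.

    This is a breadth-first search over stored cells only; it stands in
    for the thesis's cluster identification step, which is not specified
    in the abstract.
    """
    if not dense:
        return []
    dim = len(next(iter(dense)))
    offsets = [d for d in product((-1, 0, 1), repeat=dim) if any(d)]
    unvisited = set(dense)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for d in offsets:
                neighbor = tuple(a + b for a, b in zip(cell, d))
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    cluster.add(neighbor)
                    queue.append(neighbor)
        clusters.append(cluster)
    return clusters
```

Because clusters are unions of adjacent dense cells, their outline follows the data rather than a fixed shape, and the cost depends on the number of occupied cells rather than on the size of the full grid.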
Keywords/Search Tags: uncertain data, attribute value similarity, vulnerability database, virtual grid