Research On Clustering Technique Based On Self-similarity And Grid Over Data Stream

Posted on:2013-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:X J Qi

Full Text:PDF

GTID:2218330362963215

Subject:Computer software and theory

Abstract/Summary:

In data mining field, the data stream clustering analysis has become an importantresearch. In the algorithm of clustering over data stream, the traditional clusteringalgorithm has its obvious limitations, which promotes researchers to improve the existingdata mining algorithms and propose the new algorithms. In this paper, the fractal theory isused, the clustering algorithms over data stream based on fractal self-similarity areproposed. Meanwhile, the outlier mining algorithm based on fractal clustering is offered.First of all, a data stream clustering algorithm based on the fractal self-similarity isproposed. To get the fractal dimension, the nested grid data structure is applied, thestatistics of data set can be got by scanning data set once. Then, based on the single fractalinfluence degree, the average fractal influence degree is presented. For each cluster, afterone point adding into, the fractal influence degree is not same. According to the fractalinfluence degree, the data points will be dynamically put into the cluster whose fractalinfluence degree is minimal.Secondly, a high dimensional data stream clustering algorithm based on fractal andgrid is offered. In this algorithm, all the high dimensional data will be stored in the girds.While the request of clustering comes, the feature attributes of high dimensional data canbe got by the fractal method. The fractal dimension of all data set is will be treated as thenumber of the feature attributes. Then, the high dimensional data will be projected into thesubspace formed by the feature attributes. Meanwhile, in the subspace, the grid will be asdata unit, the grid unit will be clustered by fractal method.Finally, an outlier mining algorithm based on fractal clustering over data stream isproposed. In this method, the data space will be divided into many non-intersect grids.The grids will be clustered using the grid based algorithm. The adjacent dense grids willbe regarded as clusters. However, the sparse grids, which are not clustered, will be put intothe existing clusters, and then the outlier degree can be calculated according to the fractaldimensions. If the outlier degree of one grid is greater than a predefined threshold, thedata points in sparse grid are outliers. Otherwise, the sparse grid may be developed to thedense grid, so the grid will be added into the cluster which has the minimal outlier degree. The above algorithms are implemented with C++language. Experimental resultsshow that these algorithms proposed in this paper obtain the higher cluster quality than thecurrent ones, and the anticipated results are realized.

Keywords/Search Tags:

data stream, cluster, fractal, grid, outlier

Related items

1	Data Stream Clustering Algorithm Based On Density And Fractal Dimension
2	Research And Application On Outlier Detection Algorithm For High-dimensional Data Stream
3	Data Stream Clustering And Outlier Detection Based On Grid Coupling
4	Research And Implementation Of Data Stream Cluster Based On Density And Grid
5	Clustering Analysis And Outlier Detection Algorithms On Uncertain Data
6	Research On Data Clustering Based On Grid
7	Based Density Data Stream Cluster Mining Algorithm
8	Study Of Cluster Algorithm Of Data Stream
9	Research On Data Stream Clustering Based On Grid And Density
10	Study On An Analysis Method For Cluster-based Outlier