A High Dimensional Data Stream Clustering Algorithm Of Quick Dimension Reduction

Posted on:2017-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Chen

Full Text:PDF

GTID:2348330521450533

Subject:Computer Science and Technology

Abstract/Summary:

With the explosive growth of data, finding valuable information and turning it into organized knowledge has become more difficult, and data mining emerged in response. As one of the important methods of data mining, cluster analysis is widely used in many fields. With the continuous development of information technology, the data stream has become a new data type and is gradually becoming mainstream, so data stream clustering has become a popular and productive research topic. This thesis improves on existing data stream clustering algorithms, examining their shortcomings and the advantages of the improved algorithms. The work covers the following aspects.

First, previous subspace clustering algorithms for high-dimensional data streams cannot adjust automatically to dynamic changes in the stream and require multiple scans of it. To address this, the thesis proposes a tree-based adaptive subspace clustering algorithm for high-dimensional data streams. The algorithm uses an improved relative entropy to find the dimensions relevant to each region, then establishes the corresponding subspace and clusters within it, so that different regions correspond to different subspaces. Using relative entropy to find regions is simpler and more natural than the approach of the GSCDS algorithm. By using a tree to preserve information about the partitioning process, combined with the idea of backtracking, the algorithm makes subspace clustering adaptive for high-dimensional data streams and avoids re-running the subspace algorithm each time new data arrives. At the same time, a decay factor applied to old data prevents it from having an excessive impact on the clustering results. Experimental results show that the algorithm achieves higher clustering quality with lower time complexity.

Second, existing grid-based data stream clustering algorithms have low precision at cluster edges and need multiple scans of the grid to complete clustering. Drawing on previous research, the thesis proposes an improved grid-based data stream clustering algorithm with two improvements. First, in the initial stage it clusters from the inside out, from point to surface, and completes clustering in a single scan of the grid, which removes the inefficiency caused by the original algorithm's repeated grid scans. Second, it searches for the maximal set of density-connected points in order to separate noise from useful points in edge regions as far as possible, which solves the original algorithm's problem of missing edge points. Finally, experiments show that the proposed algorithms improve accuracy.
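The relative-entropy and decay-factor ideas in the first part of the abstract can be illustrated with a minimal sketch. The abstract does not define the "improved relative entropy" or the exact form of the decay factor, so everything below is an assumption: the sketch uses the standard Kullback-Leibler divergence of a per-dimension histogram from the uniform distribution, weights each point by `2 ** (-decay * age)`, and treats a dimension as relevant when its score exceeds a hypothetical `threshold`. The function names, bin count, and value range are illustrative only.

```python
import math


def decayed_histogram(values, timestamps, now, bins=10, lo=0.0, hi=1.0, decay=0.01):
    """Histogram of one dimension in which older points count less.

    Assumed weighting: each point contributes 2 ** (-decay * age).
    """
    hist = [0.0] * bins
    width = (hi - lo) / bins
    for value, t in zip(values, timestamps):
        idx = int((value - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp values on or outside the range
        hist[idx] += 2.0 ** (-decay * (now - t))
    return hist


def relative_entropy_to_uniform(hist):
    """Standard KL divergence D(p || uniform) of a weighted histogram."""
    total = sum(hist)
    if total == 0:
        return 0.0
    uniform = 1.0 / len(hist)
    score = 0.0
    for weight in hist:
        p = weight / total
        if p > 0:
            score += p * math.log(p / uniform)
    return score


def relevant_dimensions(points, timestamps, now, threshold=0.5, **hist_kwargs):
    """Return (indices of dimensions scoring at least threshold, all scores).

    A high score means the data is concentrated along that dimension,
    which this sketch takes as a sign that the dimension is relevant.
    """
    n_dims = len(points[0])
    scores = []
    for d in range(n_dims):
        column = [p[d] for p in points]
        hist = decayed_histogram(column, timestamps, now, **hist_kwargs)
        scores.append(relative_entropy_to_uniform(hist))
    selected = [d for d, s in enumerate(scores) if s >= threshold]
    return selected, scores


# Dimension 0 is concentrated in a narrow band; dimension 1 is spread evenly.
points = [(0.5 + 0.001 * i, i / 100) for i in range(100)]
timestamps = list(range(100))
selected, scores = relevant_dimensions(points, timestamps, now=100)
print(selected, [round(s, 3) for s in scores])
```

In this toy stream only the concentrated dimension is selected. A real implementation would update the histograms incrementally as points arrive rather than recomputing them, which is closer to the single-pass behaviour the abstract describes.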

Keywords/Search Tags:

data stream clustering, subspace clustering, grid clustering, tree, incremental performance, relative entropy, DBSCAN algorithm

Related items

1	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
2	Research On Adaptive Clustering Algorithm Based On DBSCAN Theory
3	An Incremental Grid Clustering Algorithm Based On Density-dimension-tree
4	Research On Data Stream Clustering Algorithm Based On Sliding Windows And Subspace Partition
5	Research On Dynamic Clustering And Incremental In Data Mining
6	Research On Grid-based MST Data Stream Clustering Algorithm
7	Research On Incremental Clustering Algorithm In Data Mining
8	Research On Data Stream Clustering Algorithm Based On Similarity And Grid Partition Optimization
9	Study On Grid-Based Clustering Algorithms
10	Research Of Data Stream Clustering Methods Based On Grid