A High Dimensional Data Stream Clustering Algorithm Of Quick Dimension Reduction

Posted on: 2017-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Chen
GTID: 2348330521450533
Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of data, finding valuable information and turning it into organized knowledge has become increasingly difficult, and data mining emerged in response. As one of the important methods of data mining, cluster analysis is widely used in many fields. With the continuous development of information technology, the data stream has become a new type of data and is gradually becoming mainstream, so data stream clustering has become a popular and significant research topic. This thesis improves on existing data stream clustering algorithms: it examines their shortcomings and presents the advantages of the improved algorithms. The work covers two main aspects.

First, previous high-dimensional data stream subspace clustering algorithms cannot automatically adjust to dynamic changes in the data stream and require multiple scans of the stream. To address this, the thesis proposes a tree-based adaptive subspace clustering algorithm for high-dimensional data streams. The algorithm uses an improved relative entropy to find the dimensions relevant to each region, then establishes the corresponding subspace and clusters within it, so that different regions correspond to different subspaces. Using relative entropy to find regions is simpler and more natural than the approach of the GSCDS algorithm. By using a tree to preserve information about the partitioning process, together with the idea of backtracking, the algorithm adapts to changes in the high-dimensional data stream and avoids having to re-run the subspace algorithm each time new data arrives. At the same time, applying a decay factor to old data keeps it from having an excessive influence on the clustering results. Experimental results show that the algorithm achieves higher clustering quality with lower time complexity.

Second, existing grid-based data stream clustering algorithms have low precision at cluster edges and need multiple scans of the grid to complete clustering. Drawing on previous research, the thesis proposes an improved grid-based data stream clustering algorithm. The algorithm has two improvements. First, in the initial stage, clusters are grown from the inside out, from point to surface, so that clustering completes in a single scan of the grid; this resolves the inefficiency caused by the original algorithm's repeated grid scans. Second, by searching for the maximal set of density-connected points, the algorithm distinguishes noise from useful points in edge regions, which addresses the original algorithm's loss of edge points. Finally, experiments show that the proposed algorithm improves accuracy.
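
The relative-entropy and decay-factor ideas in the first contribution can be illustrated with a minimal Python sketch. The abstract does not give the thesis's "improved" relative entropy, so this sketch assumes the plain Kullback-Leibler divergence of each dimension's histogram against a uniform distribution; the names `DecayedHistogram` and `dimension_scores`, the bin count, the decay value, and the assumption that values are scaled to [0, 1) are all hypothetical choices made for illustration.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class DecayedHistogram:
    """Histogram over [0, 1) whose old counts fade by a decay factor.

    Hypothetical helper: the thesis's actual data structure is not
    described in the abstract.
    """

    def __init__(self, bins, decay):
        self.counts = [0.0] * bins
        self.decay = decay

    def add(self, x):
        # Fade all existing counts so older points weigh less.
        self.counts = [c * self.decay for c in self.counts]
        # Clamp the index so x == 1.0 still falls in the last bin.
        idx = min(int(x * len(self.counts)), len(self.counts) - 1)
        self.counts[idx] += 1.0

    def distribution(self):
        total = sum(self.counts)
        return [c / total for c in self.counts]


def dimension_scores(points, bins=4, decay=0.99):
    """Score each dimension in one pass over the stream.

    A higher score means the dimension's values are further from
    uniform, i.e. more concentrated and (under this sketch's
    assumption) more relevant for building a subspace.
    """
    dims = len(points[0])
    hists = [DecayedHistogram(bins, decay) for _ in range(dims)]
    for point in points:
        for hist, x in zip(hists, point):
            hist.add(x)
    uniform = [1.0 / bins] * bins
    return [relative_entropy(h.distribution(), uniform) for h in hists]


if __name__ == "__main__":
    # Dimension 0 is concentrated near 0.1; dimension 1 is spread out.
    stream = [(0.1, v) for v in (0.1, 0.35, 0.6, 0.85)] * 10
    print(dimension_scores(stream))
```

In this sketch a dimension whose values all fall into one bin scores log(bins), the maximum possible against a uniform reference, while an evenly spread dimension scores close to zero. Each point is processed once, which mirrors the single-scan goal stated in the abstract.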
Keywords/Search Tags: data stream clustering, subspace clustering, grid clustering, tree, incremental performance, relative entropy, DBSCAN algorithm