Research On Clustering Method Of Datastream Based On Grid And Density

Posted on: 2018-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2348330569486434
Subject: Computer Science and Technology
Abstract/Summary:
Data stream clustering extracts useful information from massive data in real time and has been widely applied in business decision-making, the Internet of Things, financial and securities data analysis, and other fields. Unlike static data, a data stream is real-time, bursty, volatile, unordered, and unbounded, so the full data set cannot be held in memory for analysis. Clustering an unbounded stream within limited memory is therefore difficult, and the growing dimensionality of stream data makes it harder still. For these reasons, research on data stream clustering methods is of great significance. This thesis takes the data stream as its research object and studies two topics: data stream clustering based on grid and density, and clustering of high-dimensional data streams. The main work is as follows. First, existing grid-based and density-based data stream clustering algorithms handle boundary points coarsely, so an improved clustering algorithm based on both grid and density is proposed. It introduces an influence coefficient that expresses the influence of a data point on adjacent grid cells, which improves the efficiency and accuracy of the algorithm. Experimental results show that the algorithm is feasible and identifies clusters more quickly and accurately. Second, a high-dimensional data stream clustering algorithm is proposed to address the problem of high computational complexity. The algorithm takes the dimensions of the stream data into account: it performs attribute reduction and grid partitioning according to the distribution of the data points in each dimension, and generates clustering subspaces according to the overlap between different dimensions. Experimental results show that the algorithm maintains clustering quality while processing data more efficiently. In summary, this thesis proposes improved data stream clustering algorithms that raise the efficiency and accuracy of current data stream clustering, and verifies their feasibility and validity through experiments. The work has both theoretical and practical significance.
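The abstract does not define the influence coefficient, so the following Python sketch only illustrates the general idea of letting a point near a cell boundary contribute partial density to the adjacent grid cell. The linear weight `1 - dist / radius`, the parameters `width` and `radius`, and the function names are all assumptions made for illustration, not the thesis's actual formulation.

```python
import math
from collections import defaultdict


def cell_of(point, width):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(math.floor(x / width) for x in point)


def update_densities(densities, point, width=1.0, radius=0.25):
    """Add one point to the grid.

    The home cell receives full weight 1.0. Each face-adjacent cell whose
    shared boundary lies closer than `radius` to the point receives a
    partial weight (a hypothetical linear influence coefficient).
    """
    home = cell_of(point, width)
    densities[home] += 1.0
    for dim, x in enumerate(point):
        offset = x - home[dim] * width  # position inside the cell, in [0, width)
        # Distance to the lower boundary and to the upper boundary in this dimension.
        for step, dist in ((-1, offset), (1, width - offset)):
            if dist < radius:
                neighbour = list(home)
                neighbour[dim] += step
                densities[tuple(neighbour)] += 1.0 - dist / radius
    return densities


# Usage: a point close to the right edge of cell (0, 0) also raises
# the density of cell (1, 0); a point in the middle of a cell does not.
grid = defaultdict(float)
update_densities(grid, (0.9, 0.5))
update_densities(grid, (2.5, 2.5))
```

A scheme of this kind keeps boundary points from being attributed entirely to one cell, which is the coarse treatment the abstract says existing grid- and density-based algorithms apply.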
Keywords/Search Tags: data stream, grid partitioning, density, high-dimensional data, clustering algorithm