Research On Clustering Method Of Datastream Based On Grid And Density

Posted on:2018-08-19

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2348330569486434

Subject:Computer Science and Technology

Abstract/Summary:

Data stream clustering can extract useful information from massive data in real time, and it is widely used in business decision-making, the Internet of Things, financial and securities data analysis, and other fields. Unlike static data, a data stream is real-time, bursty, volatile, unordered, and unbounded, so the full data set cannot be held in memory for analysis. Clustering an unbounded stream must therefore work within limited memory, and the growing dimensionality of stream data makes clustering harder still. For these reasons, research on data stream clustering methods is of great significance.

This thesis takes the data stream as its research object and studies a data stream clustering algorithm based on grid and density and a clustering algorithm for high-dimensional data streams. The main work is as follows:

First, existing grid-based and density-based stream clustering algorithms handle boundary points coarsely. To address this, an improved clustering algorithm for data streams based on both grid and density is proposed. It introduces an influence coefficient to express the influence of each data point on adjacent grid cells, which improves the efficiency and accuracy of the algorithm. Experimental results show that the algorithm is feasible and identifies clusters more quickly and accurately.

Second, a high-dimensional data stream clustering algorithm is proposed to address the problem of high computational complexity. The algorithm takes the dimensions of the stream data into account: attribute reduction and grid partitioning are performed according to the distribution of the data points in each dimension, and clustering subspaces are generated according to the overlap between different dimensions. Experimental results show that the algorithm maintains clustering quality while processing data more efficiently.

In summary, this thesis proposes improved data stream mining algorithms that raise the mining efficiency and accuracy of current data stream clustering algorithms, and experiments verify their feasibility and validity. The work is of theoretical and practical significance.
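The abstract does not define the influence coefficient, so the following is only an illustrative sketch of the general idea and not the thesis's algorithm. It assumes a uniform grid, an exponential decay of cell density over time (a common choice in density-grid stream clustering), and a hypothetical linear influence function: a point adds weight 1 to its own cell and a weight between 0 and 1 to each adjacent cell that lies within `reach` cell widths of the point. The names `GridDensity`, `cell_size`, `decay`, and `reach` are all assumptions made for this example.

```python
import math
from collections import defaultdict
from itertools import product


class GridDensity:
    """Decayed density grid where points also influence nearby cells."""

    def __init__(self, cell_size=1.0, decay=0.998, reach=0.25):
        self.cell_size = cell_size  # edge length of a grid cell (assumed uniform)
        self.decay = decay          # per-time-unit decay factor in (0, 1]
        self.reach = reach          # influence radius, in cell widths (hypothetical)
        self.density = defaultdict(float)
        self.last_update = {}

    def _add(self, cell, weight, t):
        # Decay the stored density up to time t, then add the new weight.
        dt = t - self.last_update.get(cell, t)
        self.density[cell] = self.density[cell] * self.decay ** dt + weight
        self.last_update[cell] = t

    def insert(self, point, t):
        scaled = [x / self.cell_size for x in point]
        home = tuple(math.floor(s) for s in scaled)
        frac = [s - h for s, h in zip(scaled, home)]  # position inside home cell
        self._add(home, 1.0, t)
        for offset in product((-1, 0, 1), repeat=len(point)):
            if not any(offset):
                continue  # skip the home cell itself
            # Distance (in cell widths) from the point to the neighbouring cell.
            gaps = [f if o == -1 else (1.0 - f if o == 1 else 0.0)
                    for f, o in zip(frac, offset)]
            dist = math.sqrt(sum(g * g for g in gaps))
            coeff = max(0.0, 1.0 - dist / self.reach)  # assumed linear falloff
            if coeff > 0.0:
                neighbour = tuple(h + o for h, o in zip(home, offset))
                self._add(neighbour, coeff, t)

    def dense_cells(self, threshold, t):
        # Cells whose density, decayed to time t, reaches the threshold.
        return {c for c, d in self.density.items()
                if d * self.decay ** (t - self.last_update[c]) >= threshold}
```

For example, after `g = GridDensity(); g.insert((0.9, 0.5), t=0)`, the point's own cell `(0, 0)` receives full weight and the cell `(1, 0)`, whose boundary is close to the point, receives a partial weight. Under these assumptions, a point near a cell edge is no longer counted only for the cell it happens to fall in, which is one way to treat boundary points less coarsely.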

Keywords/Search Tags:

data stream, grid partitioning, density, high dimensional data, clustering algorithm

Related items

1	Research On Data Stream Clustering Algorithm Based On Density Grid
2	Research On Clustering Algorithm Over High Dimensional Data Stream Based On Irregular Grid Data
3	Research On Clustering Algorithm Over High Dimensional Data Stream Based On Grid And Sequence Data
4	A High Dimensional Data Stream Clustering Algorithm Of Quick Dimension Reduction
5	Application Of Grid And Density Based Clustering Algorithm In Data Mining
6	Research On Clustering Algorithm Of High Dimensional Data
7	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
8	Data Stream Clustering Algorithm Based On Active Grid-density
9	The Research On The Algorithms Of Grid-based Data Stream Clustering
10	Research On Data Stream Clustering Algorithm Based On Grid And Density