Research On Dynamic Measurement Based Data Stream Clustering And Its Applications

Posted on: 2022-03-29

Degree: Master

Type: Thesis

Country: China

Candidate: H L Gao

GTID: 2518306560992179

Subject: Software engineering

Abstract/Summary:

With the rapid development of computer software and hardware, more and more data streams are being generated, such as stock price fluctuations, hot topic recommendations on Weibo, and brain waves. The data distribution of a stream changes over time, and the valuable information is often implicit in that distribution. Extracting useful information from large volumes of data to guide human production processes and behavior has become a hot research direction in data analysis. Clustering is one of the main methods of data mining and analysis, and the main purpose of a data stream clustering algorithm is to gain a deep understanding of the characteristics of the data. The streams handled by a clustering algorithm usually differ in speed and structure and are unbounded over time, and these uncertain factors degrade the clustering result. Existing data stream clustering strategies have clear limitations: they require many predefined parameters, cannot handle arbitrary-shaped clusters, process data slowly, and are not robust. To overcome these problems in current data stream clustering algorithms, this thesis presents the following main results.

(1) To better handle arbitrary-shaped clusters and to reduce the number of predefined parameters in density-based data stream clustering, this thesis proposes a data stream clustering algorithm based on the typicality distribution of the data. The algorithm uses a Typicality Distribution Function (TDF) in place of the traditional probability distribution function, which removes the need for users to set a large number of predefined parameters that control the density or radius of the micro-clusters. It exploits the advantages of density-based computation so that the algorithm can handle arbitrary-shaped clusters. Structurally, the algorithm is divided into a micro-cluster part and a macro-cluster part and runs as a single-pass process, so each instance needs to be processed only once. The experimental results show a good clustering effect on static data and on data sets with concept drift. In addition, the algorithm is more robust than some classic algorithms.

(2) To address the slow processing and low accuracy of current common data stream clustering algorithms, this thesis proposes a grid-based data stream clustering algorithm. The algorithm adopts an online/offline structure. In the online phase, it exploits the advantage that a grid algorithm only needs to process grid cells rather than individual data instances, which reduces the time complexity. A multi-dimensional index tree allows the corresponding grid cell to be retrieved quickly, which improves the processing speed. In addition, the algorithm introduces an attenuation function to reflect the evolution of the data stream accurately and to keep the results valid in real time. In the offline phase, the OPTICS algorithm is used to handle arbitrary-shaped clusters, which further improves accuracy. Finally, the proposed algorithm is compared with other algorithms, and its performance is evaluated on different data sets and under different parameter settings. Experiments show that the proposed algorithm outperforms the other algorithms in time efficiency and accuracy on both synthetic and real data sets.
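The online phase described in (2), which maps each arriving instance to a grid cell and attenuates cell densities over time, can be illustrated with a minimal sketch. This is not the thesis's implementation. The class name `DecayedGrid`, the parameters `cell_width` and `decay_rate`, and the exponential form 2^(-decay_rate * elapsed) are assumptions chosen for illustration, since the abstract does not specify the attenuation function. A plain dictionary also stands in for the multi-dimensional index tree mentioned in the abstract.

```python
import math


class DecayedGrid:
    """Hypothetical online grid summary with exponentially attenuated densities."""

    def __init__(self, cell_width, decay_rate):
        self.cell_width = cell_width  # side length of a grid cell (assumed parameter)
        self.decay_rate = decay_rate  # rate in 2 ** (-rate * elapsed) (assumed form)
        self.cells = {}               # cell key -> (density, time of last update)

    def cell_key(self, point):
        # Map a point to the integer coordinates of the cell that contains it.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def density_at(self, key, now):
        # Attenuate the stored density lazily, only when the cell is read.
        if key not in self.cells:
            return 0.0
        density, last_update = self.cells[key]
        return density * 2.0 ** (-self.decay_rate * (now - last_update))

    def insert(self, point, now):
        # Each instance touches exactly one cell and is then discarded.
        key = self.cell_key(point)
        self.cells[key] = (self.density_at(key, now) + 1.0, now)
        return key
```

With `cell_width=1.0` and `decay_rate=1.0`, inserting `(0.2, 0.7)` at time 0 and `(0.9, 0.1)` at time 1 updates the same cell `(0, 0)`. Each update costs constant time per instance, which is the property the online phase relies on. An offline step such as OPTICS would then cluster the surviving dense cells rather than the raw instances.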

Keywords/Search Tags:

Data mining, Data stream, Unsupervised learning, Density clustering, Grid clustering, Concept drift

Related items

1	A Study Of Density-based Clustering And Drifting-concept Detecting For Data Stream
2	The GC3 framework grid density based clustering for classification of streaming data with concept drift
3	Research On Data Stream Clustering Algorithm Based On Density Grid
4	Density-based Clustering Algorithm On Streaming Data
5	Research On Semi-supervised Classification Of Data Stream Based On Clustering
6	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
7	The Research Of Clustering Algorithm Based On Data Stream
8	The Research Of Grid-based Parallel Clustering Algorithm And Clustering For Data Stream
9	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
10	Research On Data Stream Clustering Algorithm Based On Grid And Density