
Research On Dynamic Measurement Based Data Stream Clustering And Its Applications

Posted on: 2022-03-29 | Degree: Master | Type: Thesis
Country: China | Candidate: H L Gao
GTID: 2518306560992179 | Subject: Software engineering
Abstract/Summary:
With the rapid development of computer software and hardware, more and more data streams are being generated, such as stock price fluctuations, trending-topic recommendations on Weibo, and brain waves. The data distribution of a stream changes over time, and the valuable information is often implicit in that distribution. Extracting useful information from large volumes of data to guide production processes and human behavior has become an active research direction in data analysis. Clustering is one of the main methods of data mining and analysis, and the main purpose of a data stream clustering algorithm is to gain a deep understanding of the characteristics of the data. The data streams that clustering algorithms process usually arrive at varying speeds and with varying structures, and they grow without bound over time; these uncertain factors degrade clustering quality. Existing data stream clustering strategies have clear limitations: they require many predefined parameters, cannot handle arbitrarily shaped clusters, process data slowly, and are not robust. To overcome these problems, this thesis makes the following main contributions:

(1) To better handle arbitrarily shaped clusters and to reduce the number of predefined parameters required by density-based data stream clustering algorithms, this thesis proposes a data stream clustering algorithm based on the typicality distribution of data. Instead of a traditional probability distribution function, the algorithm uses the Typicality Distribution Function (TDF), so that users no longer need to predefine a large number of parameters controlling the density or the radius of micro-clusters. It retains the advantages of density calculation, which allows the algorithm to handle arbitrarily shaped clusters. Structurally, the algorithm consists of two parts, micro-clusters and macro-clusters, and it runs in a single pass, so each instance needs to be processed only once. Experimental results show good clustering performance on both static data and data sets with concept drift. In addition, the algorithm is more robust than several classic algorithms.

(2) To address the slow processing speed and low accuracy of common data stream clustering algorithms, this thesis proposes a grid-based data stream clustering algorithm with an online/offline structure. In the online phase, the algorithm only needs to process grid cells rather than individual data instances, which reduces time complexity, and a multi-dimensional index tree allows the grid cell corresponding to an incoming instance to be retrieved quickly, further improving processing speed. The algorithm also introduces a decay (attenuation) function so that cell statistics accurately reflect the evolution of the data stream and the clustering result stays current. In the offline phase, the OPTICS algorithm is applied to find clusters of arbitrary shape, which further improves accuracy. Finally, the proposed algorithm is compared with other algorithms, and its performance is evaluated on different data sets and under different parameter settings. Experiments show that the proposed algorithm outperforms the compared algorithms in both time efficiency and accuracy on synthetic and real data sets.
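The abstract does not give the TDF equations. As a hedged illustration, the sketch below assumes the recursive eccentricity/typicality formulation from the TEDA (Typicality and Eccentricity Data Analytics) framework, in which the eccentricity of a point is xi_k(x) = 1/k + ||x - mu||^2 / (k * sigma^2) and its typicality is 1 - xi_k(x). It illustrates why a single pass suffices: only the sample count, the running mean, and the running mean of squared norms are stored. The class name `TypicalityModel` and the "m-sigma" outlier test with default `m = 3` are assumptions for illustration, not the thesis's implementation.

```python
import numpy as np


class TypicalityModel:
    """Hypothetical sketch of recursive eccentricity/typicality (TEDA-style, assumed).

    Stores only the sample count k, the running mean mu, and the running mean
    of squared norms, so each stream instance is processed once in O(d) time.
    """

    def __init__(self, dim: int):
        self.k = 0
        self.mu = np.zeros(dim)
        self.mean_sq_norm = 0.0  # running mean of ||x||^2

    def update(self, x: np.ndarray) -> None:
        # Recursive updates: mu_k = ((k-1)/k) * mu_{k-1} + x / k, same for ||x||^2
        self.k += 1
        w = 1.0 / self.k
        self.mu = (1.0 - w) * self.mu + w * x
        self.mean_sq_norm = (1.0 - w) * self.mean_sq_norm + w * float(x @ x)

    @property
    def variance(self) -> float:
        # Scalar (total) variance: mean of ||x||^2 minus ||mu||^2
        return self.mean_sq_norm - float(self.mu @ self.mu)

    def eccentricity(self, x: np.ndarray) -> float:
        # xi_k(x) = 1/k + ||x - mu||^2 / (k * sigma^2); needs k >= 2 and sigma^2 > 0
        diff = x - self.mu
        return 1.0 / self.k + float(diff @ diff) / (self.k * self.variance)

    def typicality(self, x: np.ndarray) -> float:
        # Typicality is the complement of eccentricity
        return 1.0 - self.eccentricity(x)

    def is_outlier(self, x: np.ndarray, m: float = 3.0) -> bool:
        # Chebyshev-style test on normalized eccentricity zeta = xi / 2;
        # equivalent to ||x - mu||^2 > m^2 * sigma^2
        zeta = self.eccentricity(x) / 2.0
        return zeta > (m * m + 1.0) / (2.0 * self.k)
```

The recursive value agrees with the batch definition, where eccentricity is twice a point's cumulative squared distance to all samples divided by the sum of those cumulative distances, so no past instance has to be revisited. One plausible use in a micro-cluster setting is to keep one such model per micro-cluster and absorb a point into a micro-cluster for which it is not an outlier; the only tuning knob in this sketch is `m`, not a density or radius threshold.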
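The online phase of contribution (2) can be sketched as follows. This is a minimal, hypothetical sketch that assumes a D-Stream-style exponential decay, in which a cell's density is multiplied by lambda^(t - t_last) before the new point's weight of 1 is added. A plain dictionary keyed by integer cell coordinates is swapped in for the thesis's multi-dimensional index tree; both give fast cell lookup, but the dictionary is not the author's structure. The names `DecayedGrid`, `cell_width`, `decay`, and `threshold` are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


@dataclass
class GridCell:
    density: float = 0.0
    last_update: int = 0  # time stamp of the last density update


class DecayedGrid:
    """Hypothetical online phase of a grid-based stream clusterer."""

    def __init__(self, cell_width: float, decay: float = 0.998):
        self.cell_width = cell_width
        self.decay = decay  # lambda in (0, 1): older points count less
        self.cells: Dict[Cell, GridCell] = {}  # stand-in for an index tree

    def cell_of(self, x: Sequence[float]) -> Cell:
        # Map a point to the integer coordinates of its grid cell
        return tuple(math.floor(v / self.cell_width) for v in x)

    def insert(self, x: Sequence[float], t: int) -> None:
        # Only the cell summary is updated; the raw instance is not stored.
        key = self.cell_of(x)
        cell = self.cells.setdefault(key, GridCell(last_update=t))
        # Lazy decay: D(t) = lambda^(t - t_last) * D(t_last) + 1
        cell.density = cell.density * self.decay ** (t - cell.last_update) + 1.0
        cell.last_update = t

    def density_at(self, key: Cell, t: int) -> float:
        # Decayed density of a cell at time t (0 for cells never seen)
        cell = self.cells.get(key)
        if cell is None:
            return 0.0
        return cell.density * self.decay ** (t - cell.last_update)

    def dense_cell_centers(self, t: int, threshold: float) -> List[List[float]]:
        # Centers of cells whose decayed density reaches the threshold;
        # these summaries are what an offline step would cluster.
        centers = []
        for key in self.cells:
            if self.density_at(key, t) >= threshold:
                centers.append([(i + 0.5) * self.cell_width for i in key])
        return centers
```

Decaying lazily, only when a cell is touched or queried, keeps the per-point cost independent of the number of cells. For the offline phase, if scikit-learn is installed, the returned centers could be passed to `sklearn.cluster.OPTICS(min_samples=...).fit(centers).labels_`; the abstract does not specify how the thesis feeds grid summaries into OPTICS, so that step is left as an assumption.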
Keywords/Search Tags: Data mining, Data stream, Unsupervised learning, Density clustering, Grid clustering, Concept drift