
Research On High-dimensional Data Stream Clustering Method Based On Feedback Control System

Posted on: 2022-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: H H Sun
GTID: 2518306329484654
Subject: Computer application technology
Abstract/Summary:
The goal of high-dimensional stream data clustering is to cluster the objects in a stream continuously, so that users receive clustering results in real time. At present, there are few studies on clustering in high-dimensional streaming settings; by contrast, high-dimensional clustering algorithms based on the batch-processing model are much better studied. In addition, many stream clustering algorithms for non-high-dimensional data depend on a large number of parameters, and high-dimensional stream clustering suffers from low efficiency.

To address these problems, this thesis proposes a new data stream clustering algorithm based on feedback control and adaptive parameters, called FBStream. It is a real-time, unsupervised, parallel stream clustering algorithm with adaptive parameters, which avoids the distortion of results caused by a lack of prior knowledge. FBStream comprises a dimensionality-reduction algorithm, a stream clustering algorithm, and a feedback strategy. Together these provide stable streaming dimensionality reduction for high-dimensional data and perform well compared with existing methods in a distributed stream-processing architecture. FBStream first uses Window Principal Components Analysis (WPCA) for feature extraction. The Feedback Stream Cluster (FBSCluster) algorithm then produces the clustering results and saves data summaries at the same time. Finally, in the Feedback Control (FBC) stage, the clustering results are fed back into the clustering process iteratively, so that the system can adjust the internal parameters of the pipeline in real time and improve both feature extraction and clustering.

The main contributions of this thesis are:
(1) The WPCA algorithm is proposed. Its core idea is to analyze angular similarity and dynamically adjust the data stream, thereby reducing the projection-axis offset caused by plain PCA or SVD. Compared with feature extraction in batch-processing mode, performing principal component analysis within a window improves the efficiency of data iteration.
(2) The FBSCluster algorithm is proposed. Following the data storage characteristics of stream processing systems, the data stream clustering process is separated into two stages. Two iterative clustering models, window clustering and clustering, are used to improve clustering quality. As data accumulate and iterations proceed, the influence of the "curse of dimensionality" and of small-sample bias is reduced.
(3) The FBC algorithm is proposed, which turns FBStream into a closed-loop system. In this stage, the algorithm analyzes the clustering state and the clustering indices in real time, and then sends the results upstream as feedback signals.

Finally, the experimental design and an analysis of the results for FBStream are presented.
Keywords/Search Tags: Feedback control, FBStream, Dimensionality reduction, High-dimensional data stream, Data stream clustering
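
The abstract describes a three-stage closed loop (windowed PCA, stream clustering, feedback) without giving the algorithms themselves. The Python sketch below is only an illustration of that loop shape, not the thesis's WPCA, FBSCluster, or FBC. Everything in it is an assumption: the function names, the sign-flipping rule used as a stand-in for "angular similarity", the radius-based micro-clusters, and the feedback thresholds `low` and `high`.

```python
import numpy as np


def window_pca(window, k, prev_axes=None):
    """Return the top-k principal axes (k x d) of one window of points."""
    centered = window - window.mean(axis=0)
    # Rows of vt are principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:k].copy()
    if prev_axes is not None:
        # Assumed angular-similarity step: flip any axis whose cosine with
        # the previous window's axis is negative, so the projection does
        # not change sign from one window to the next.
        cos = np.sum(axes * prev_axes, axis=1)
        axes[cos < 0] *= -1.0
    return axes


def cluster_window(points, centroids, counts, radius):
    """Absorb each point into the nearest centroid within `radius`,
    otherwise open a new cluster. Returns the mean absorb distance."""
    dists = []
    for p in points:
        if centroids:
            d = [np.linalg.norm(p - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] <= radius:
                counts[j] += 1
                centroids[j] += (p - centroids[j]) / counts[j]  # running mean
                dists.append(d[j])
                continue
        centroids.append(p.copy())
        counts.append(1)
    return float(np.mean(dists)) if dists else 0.0


def feedback(radius, mean_dist, low=0.3, high=0.8):
    """Hypothetical feedback rule: grow the radius when absorbed points sit
    near its boundary, shrink it when they sit close to the centroids."""
    ratio = mean_dist / radius
    if ratio > high:
        return radius * 1.1
    if ratio < low:
        return radius * 0.9
    return radius


def run_pipeline(stream, window_size=50, k=2, radius=1.0):
    """Process the stream window by window: reduce, cluster, feed back."""
    centroids, counts, axes = [], [], None
    for start in range(0, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        axes = window_pca(window, k, axes)
        reduced = window @ axes.T  # project onto the current axes
        mean_dist = cluster_window(reduced, centroids, counts, radius)
        radius = feedback(radius, mean_dist)  # closes the loop
    return centroids, counts, radius
```

The point of the sketch is the data flow: the clustering stage emits a summary statistic, and the feedback stage turns it into a parameter update that takes effect on the next window. A real system would also feed signals back to the dimensionality-reduction stage, which this toy version does not do.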