Research On Uncertain Data Stream Clustering Method Based On Variable Sliding Window

Posted on: 2020-03-31 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Gong
GTID: 2428330575471912 | Subject: Computer technology
Abstract/Summary:
Uncertain data stream clustering is a method for discovering how data items are distributed in uncertain data streams, helping users find valuable information in massive data in real time. Its fundamental goals are to improve the quality of the clustering results, filter out noise and outdated information, and reduce space and time consumption. To obtain high-precision clustering results with less resource consumption, this thesis improves the traditional sliding window technique and, on that basis, proposes a new clustering algorithm. The main research contents are as follows:

1. An improved variable sliding window. In the traditional variable sliding window, the window size cannot be set flexibly. In this thesis, the window size is adjusted dynamically as the speed of the data stream changes, and the window is divided evenly into equal-sized sub-windows, called meta windows, which serve as the basic unit for buffering data. By setting the probability threshold dynamically, the data items in the most recent meta window can be classified while clustering is in progress, and low-probability items are placed directly into an outlier buffer to reduce resource consumption. In addition, the amplitude and frequency of changes in the data flow rate are defined so that the timing of window adjustments can be set appropriately.

2. An uncertain data stream clustering algorithm based on the variable sliding window. Building on the improved sliding window, the thesis proposes a new uncertain data clustering algorithm, VSWC, which modularizes the clustering process to make it clearer and defines new clustering features for uncertain data that describe micro-clusters more comprehensively. First, in the initialization stage, the algorithm uses a new method to generate the initial micro-clusters, which lays the foundation for the later stages. Second, in the optimal cluster search stage, it considers the maximum radius, the minimum number of data items, and the maximum increase in probability density to find the most suitable micro-cluster for each data item. Next, it stores micro-cluster snapshots in a pyramidal time frame and uses an improved attenuation function to eliminate outdated data. Finally, it improves the k-means algorithm with an uncertainty density to respond to user requests.

3. Experimental evaluation. Multiple sets of experiments were run on a variety of data streams, using the KDD CUP99 data set and artificial data sets, to evaluate the utility of VSWC. The experiments show that VSWC has certain advantages over the UMicro and EMicro algorithms in cluster purity, SSQ, time, and memory consumption, and the advantage is more pronounced when the data flow changes rapidly in both velocity and frequency.

Figures: 24; tables: 5; references: 52...
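The meta-window mechanism described in item 1 can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the VSWC implementation: each item is assumed to be a feature vector paired with an existence probability, the number of meta windows per window (`num_meta`) is assumed fixed while the meta-window size follows the stream rate, and the dynamic threshold is taken as a hypothetical fraction `theta` of the mean probability in the newest meta window. The class and method names are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

# Hypothetical item format: (feature vector, existence probability in [0, 1]).
Item = Tuple[Tuple[float, ...], float]


@dataclass
class VariableSlidingWindow:
    """Sketch of a sliding window made of equal-sized meta windows (names are illustrative)."""
    num_meta: int            # meta windows per sliding window
    meta_size: int           # items per meta window; follows the stream rate
    theta: float = 0.5       # hypothetical factor for the dynamic probability threshold
    metas: Deque[List[Item]] = field(default_factory=deque)
    pending: List[Item] = field(default_factory=list)
    outliers: List[Item] = field(default_factory=list)

    def resize(self, items_per_second: float, seconds_per_meta: float) -> None:
        # Faster streams get larger meta windows, so each covers a similar time span.
        self.meta_size = max(1, round(items_per_second * seconds_per_meta))

    def feed(self, item: Item) -> List[Item]:
        """Buffer one item; when a meta window fills, screen it and return the kept items."""
        self.pending.append(item)
        if len(self.pending) < self.meta_size:
            return []
        batch, self.pending = self.pending, []
        return self.push_meta(batch)

    def push_meta(self, batch: List[Item]) -> List[Item]:
        # Hypothetical dynamic threshold, recomputed from this meta window's probabilities.
        mean_p = sum(p for _, p in batch) / len(batch)
        threshold = self.theta * mean_p
        kept = [it for it in batch if it[1] >= threshold]
        # Low-probability items bypass clustering and go to the outlier buffer.
        self.outliers.extend(it for it in batch if it[1] < threshold)
        self.metas.append(kept)
        if len(self.metas) > self.num_meta:
            self.metas.popleft()  # the oldest meta window expires
        return kept
```

For example, with `theta=0.5`, a meta window whose probabilities are 0.9, 0.8, 0.1 and 0.7 has a mean of 0.625 and a threshold of 0.3125, so only the item with probability 0.1 goes to the outlier buffer.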
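The abstract does not give formulas for the amplitude and frequency of rate change mentioned in item 1, so the following Python sketch uses hypothetical definitions: amplitude is the relative change between consecutive rate measurements, and frequency is the share of the last `history` measurements whose amplitude reached `min_amp`. The window is flagged for resizing after one large jump or after many moderate ones, and the change history is cleared after each trigger so that a single burst does not cause repeated resizing. All names and thresholds are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Optional


class RateChangeMonitor:
    """Hypothetical trigger for window resizing based on stream-rate changes."""

    def __init__(self, amp_threshold: float = 0.5, freq_threshold: float = 0.5,
                 min_amp: float = 0.1, history: int = 4) -> None:
        self.amp_threshold = amp_threshold    # one jump this large triggers a resize
        self.freq_threshold = freq_threshold  # share of recent changes that triggers a resize
        self.min_amp = min_amp                # smallest amplitude counted as a "change"
        self.changes: Deque[bool] = deque(maxlen=history)
        self.last_rate: Optional[float] = None

    def observe(self, rate: float) -> bool:
        """Record a rate measurement (items/second); return True if the window should resize."""
        if not self.last_rate:  # first measurement, or the previous rate was zero
            self.last_rate = rate
            return False
        amplitude = abs(rate - self.last_rate) / self.last_rate
        self.last_rate = rate
        self.changes.append(amplitude >= self.min_amp)
        # Dividing by the full history length keeps a few early changes
        # from being over-weighted.
        frequency = sum(self.changes) / self.changes.maxlen
        trigger = amplitude >= self.amp_threshold or frequency >= self.freq_threshold
        if trigger:
            self.changes.clear()  # start counting afresh after an adjustment
        return trigger
```

With the defaults, a jump from 105 to 200 items/second triggers immediately, whereas two consecutive moderate increases (200 to 250, then 250 to 300) trigger only on the second one.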
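Item 2 mentions new clustering features for uncertain data and a search for the most suitable micro-cluster under a maximum radius. The thesis's own feature definition is not given in the abstract, so the Python sketch below substitutes the error-based clustering feature used by UMicro (per-dimension linear sum, squared sum and error-variance sum, plus a count), adds a hypothetical summed existence probability, and assumes independent zero-mean errors with per-dimension standard deviation `psi`. It implements only the maximum-radius test; the minimum-item-count and probability-density-increase criteria are not reproduced, and `assign` is an illustrative name.

```python
import copy
import math
from typing import List, Sequence


class UncertainCF:
    """UMicro-style error-based clustering feature (substituted for the thesis's own feature)."""

    def __init__(self, dim: int) -> None:
        self.n = 0                  # number of absorbed items
        self.prob = 0.0             # hypothetical extra: summed existence probability
        self.cf1 = [0.0] * dim      # per-dimension sum of expected values
        self.cf2 = [0.0] * dim      # per-dimension sum of squared expected values
        self.ef2 = [0.0] * dim      # per-dimension sum of error variances (psi**2)

    def add(self, x: Sequence[float], psi: Sequence[float], p: float) -> None:
        self.n += 1
        self.prob += p
        for j, (xj, ej) in enumerate(zip(x, psi)):
            self.cf1[j] += xj
            self.cf2[j] += xj * xj
            self.ef2[j] += ej * ej

    def expected_sq_dist(self, x: Sequence[float], psi: Sequence[float]) -> float:
        """E[||X - centroid||^2] for an uncertain item X, assuming independent zero-mean errors."""
        n = self.n
        return sum((xj - c1 / n) ** 2 + ej * ej + e2 / (n * n)
                   for xj, ej, c1, e2 in zip(x, psi, self.cf1, self.ef2))

    def radius(self) -> float:
        """Square root of the expected mean squared distance of members to the centroid."""
        n = self.n
        r2 = sum(c2 / n + e2 / n - (c1 / n) ** 2 - e2 / (n * n)
                 for c1, c2, e2 in zip(self.cf1, self.cf2, self.ef2))
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding error


def assign(clusters: List[UncertainCF], x: Sequence[float], psi: Sequence[float],
           p: float, max_radius: float) -> int:
    """Absorb the item into the closest micro-cluster whose radius stays within
    max_radius; otherwise open a new micro-cluster. Returns the cluster index."""
    order = sorted(range(len(clusters)), key=lambda i: clusters[i].expected_sq_dist(x, psi))
    for i in order:
        trial = copy.deepcopy(clusters[i])
        trial.add(x, psi, p)
        if trial.radius() <= max_radius:
            clusters[i] = trial
            return i
    fresh = UncertainCF(len(x))
    fresh.add(x, psi, p)
    clusters.append(fresh)
    return len(clusters) - 1
```

Because every field is an additive sum, absorbing an item or merging two micro-clusters only adds vectors, which is what makes this kind of feature suitable for one-pass stream processing.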
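For the snapshot and attenuation steps in item 2, the sketch below uses two standard building blocks rather than the thesis's improved versions, which the abstract does not describe: the exponential fading function 2^(-λ·age) used in DenStream, and CluStream's pyramidal time frame, in which a snapshot taken at time t is filed under the largest order i such that α^i divides t, and only the newest α^l + 1 snapshots of each order are kept. Parameter values and names are illustrative.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List, Tuple


def decay_factor(age: float, lam: float = 0.25) -> float:
    """Exponential fading weight 2**(-lam * age), as used in DenStream."""
    return 2.0 ** (-lam * age)


class PyramidalTimeFrame:
    """CluStream-style snapshot store: a snapshot at time t has order i, the largest
    i with alpha**i dividing t; only the newest alpha**level + 1 snapshots per order are kept."""

    def __init__(self, alpha: int = 2, level: int = 1) -> None:
        self.alpha = alpha
        self.capacity = alpha ** level + 1  # "level" plays the role of CluStream's l
        self.orders: Dict[int, Deque[Tuple[int, Any]]] = defaultdict(
            lambda: deque(maxlen=self.capacity))

    def order_of(self, t: int) -> int:
        if t < 1:
            raise ValueError("snapshot times start at 1")
        i = 0
        while t % self.alpha ** (i + 1) == 0:
            i += 1
        return i

    def store(self, t: int, snapshot: Any) -> None:
        # deque(maxlen=...) silently drops the oldest snapshot of this order.
        self.orders[self.order_of(t)].append((t, snapshot))

    def stored_times(self) -> List[int]:
        return sorted(t for snaps in self.orders.values() for t, _ in snaps)
```

To fade a micro-cluster between updates, each additive field (count and the linear and squared sums) can be multiplied by `decay_factor(elapsed)`. Storing snapshots at times 1 to 16 with `alpha=2, level=1` leaves recent times densely covered and older times sparsely covered, which is the point of the pyramidal scheme.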
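The final step of item 2 answers user requests with k-means improved by an uncertainty density. One plausible reading, assumed here and not confirmed by the abstract, is a weighted k-means over micro-cluster centroids in which each centroid's weight is its uncertainty density. How that density is computed is not stated, so the weights are simply passed in; seeding with the k heaviest centroids is likewise an illustrative assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Weighted k-means over micro-cluster centroids.

    `weights` stands in for each micro-cluster's uncertainty density (hypothetical).
    Seeds are the k heaviest centroids, an illustrative choice.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    seeds = sorted(range(len(points)), key=lambda i: -weights[i])[:k]
    centers: List[Point] = [tuple(points[i]) for i in seeds]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each centroid joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: weighted mean of each cluster's members.
        new_centers: List[Point] = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total == 0:
                new_centers.append(centers[c])  # empty cluster keeps its old center
                continue
            new_centers.append(tuple(
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(len(points[0]))))
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels
```

Weighting pulls each macro-cluster center toward its denser, more reliable micro-clusters: with points (0, 0) and (0, 1) weighted 1 and 3, their shared center settles at (0, 0.75) rather than the unweighted midpoint.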
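The two quality measures named in item 3 can be computed as sketched below. Purity is shown in one common form, the average over clusters of the fraction of members that carry the cluster's majority label (some papers instead divide the total majority count by the number of points); SSQ is the sum of squared distances from each point to its nearest cluster center. Which variants the thesis uses is not stated in the abstract.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def cluster_purity(assigned: Sequence[int], truth: Sequence[Hashable]) -> float:
    """Average over clusters of the fraction of members carrying the cluster's majority label."""
    members: Dict[int, List[Hashable]] = {}
    for c, y in zip(assigned, truth):
        members.setdefault(c, []).append(y)
    fractions = [Counter(ys).most_common(1)[0][1] / len(ys) for ys in members.values()]
    return sum(fractions) / len(fractions)


def ssq(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]) -> float:
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)
```

For instance, clusters labelled `a, a, b` and `c, c` have purities 2/3 and 1, so the average purity is 5/6; lower SSQ indicates tighter clusters.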
Keywords/Search Tags: uncertain data flow, clustering, variable sliding window, meta window, clustering feature