Research On Hierarchical Compression Of Data Stream Based On Synchronization

Posted on: 2018-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tan
GTID: 2348330512984879
Subject: Engineering
Abstract/Summary:
Data streams have become the dominant form of massive data and are ubiquitous in sensor data, social media data, and traffic trajectory data. Mining and analyzing data streams efficiently has therefore become a research focus. Because data streams are real-time, massive, continuous, and evolving, they are much more difficult to mine and analyze than traditional static data. With massive amounts of real-time data arriving, the performance of mining tasks is constrained by limited time and space resources, so compressing the data effectively becomes a key problem. Conventional data compression methods include sampling, projection, dimensionality reduction, and clustering. Almost all of these methods concentrate on the massive and real-time properties of data streams and fail to capture their evolution. Clustering and compressing concept-drifting data streams is therefore a key issue of both theoretical and practical significance.

In light of the drawbacks of traditional data compression algorithms, our work focuses on data compression based on synchronization clustering of concept-drifting data streams. Specifically, to handle the evolution of data streams, we propose a synchronization clustering method for concept-drifting data streams. We introduce the concept of a micro-cluster based on synchronization clustering, which not only preserves the intrinsic structure of clusters automatically but also supports clustering at different scales. Based on this synchronization clustering method, we propose a data compression algorithm. Thanks to the advantages of synchronization clustering, the compression algorithm supports a wide range of compression ratios and can reconstruct data from the compressed data using parameter estimation. Our model comprises two parts: the synchronization clustering algorithm and the compression algorithm built on it. The main contributions are listed below.

Firstly, we propose a synchronization-based clustering model for concept-drifting data streams. Unlike traditional clustering methods based on cluster features, our algorithm obtains a compact encapsulation of clusters and a high-quality clustering result, owing to the merits of the proposed SyncTree structure. More importantly, the micro-clusters can handle clusters of different shapes and can trace and describe the evolution of the clustering effectively. In addition, to meet the requirement of scalable storage capacity, SyncTree supports clustering at different scales. Experiments demonstrate that SyncTree outperforms prevailing clustering algorithms, especially in its capacity to process evolving data streams.

Secondly, we propose a data compression algorithm based on synchronization clustering. Unlike traditional compression algorithms, our model preserves the intrinsic structural information of the data. The data are compressed in tree form, distributed from dense areas to sparse areas, and the compression resolution is adjusted automatically according to the compression requirements. The generative power exponential model estimates the model parameters, making reconstruction of the data possible. Experiments show that our compression algorithm achieves a high compression ratio and a low reconstruction error.
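
The synchronization idea behind the clustering step can be illustrated with a minimal sketch. It assumes a Kuramoto-style update in which every point drifts toward its neighbours within a radius, a formulation commonly used in synchronization-based clustering. The abstract does not give the update rule or the layout of SyncTree, so the function names `synchronize` and `micro_clusters`, the parameters `eps`, `tol`, and `merge_tol`, and the (count, centre) summary are all illustrative assumptions rather than the thesis's implementation.

```python
import math


def synchronize(points, eps=0.5, max_iter=50, tol=1e-4):
    """Assumed Kuramoto-style dynamics: each point drifts toward its eps-neighbours."""
    pts = [list(p) for p in points]
    for _ in range(max_iter):
        new_pts = []
        moved = 0.0
        for p in pts:
            # Neighbours within distance eps (the point itself is included).
            nbrs = [q for q in pts if math.dist(p, q) <= eps]
            # Shift each coordinate by the mean of sin(q - p) over the neighbours.
            new_p = [
                pi + sum(math.sin(q[d] - pi) for q in nbrs) / len(nbrs)
                for d, pi in enumerate(p)
            ]
            moved = max(moved, math.dist(p, new_p))
            new_pts.append(new_p)
        pts = new_pts
        if moved < tol:  # positions have stopped changing
            break
    return pts


def micro_clusters(points, eps=0.5, merge_tol=1e-2):
    """Group points whose synchronized positions coincide and summarize each group."""
    synced = synchronize(points, eps)
    groups = []  # (representative synchronized position, member indices)
    for i, s in enumerate(synced):
        for rep, members in groups:
            if math.dist(rep, s) <= merge_tol:
                members.append(i)
                break
        else:
            groups.append((s, [i]))
    dim = len(points[0])
    summaries = []
    for _, members in groups:
        # Hypothetical summary: member count and centre of the original points.
        centre = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        summaries.append({"count": len(members), "centre": centre})
    return summaries


if __name__ == "__main__":
    stream_window = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    for summary in micro_clusters(stream_window):
        print(summary)
```

In this sketch the summaries stand in for compressed data: a larger `eps` merges more points into fewer summaries, which loosely mirrors the idea of clustering at different scales. Reconstructing points from such summaries would need a fitted generative model, which is not shown here.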
Keywords/Search Tags: Data Stream Compression, Synchronization-based Clustering, Concept Drift, Parameter Estimation