Data Stream Algorithm With Non-uniform Grid And Its Application In Traceability System

Posted on: 2020-10-21 | Degree: Master | Type: Thesis
Country: China | Candidate: J T Xu
GTID: 2428330614465629 | Subject: Computer technology
Abstract/Summary:
With the spread of product traceability systems, the amount of traceability data has increased dramatically. As a typical data stream, traceability data has considerable research significance and application value. Data stream mining has become one of the research hotspots in data mining, and processing a data stream rapidly within limited memory while still obtaining high-quality clusters has become an important direction of that research. Density-grid-based data stream clustering is efficient and can form clusters of arbitrary shape, but it has shortcomings such as difficult parameter setting and low clustering precision. To overcome these shortcomings, this thesis improves the density-grid-based data stream clustering algorithm, designs and implements a parallel version of it, and finally applies it to a traceability system. The main work is summarized as follows:

(1) The NCD-Stream algorithm is proposed on the basis of the D-Stream algorithm. First, NCD-Stream sets adaptive parameters using a weighted average density and dynamically adjusts the threshold according to the number of grid clusters. Then, to improve clustering precision, sparse grids located at cluster boundaries are divided non-uniformly. A disjoint-set (union-find) structure is also used to optimize the merging of clusters and improve efficiency. In addition, the pyramid time model is used to store time snapshots, which provide references for analyzing how grid clusters evolve. Comparative experiments show that NCD-Stream achieves better performance and higher efficiency than the algorithms it was compared with.

(2) The DNCD-Stream algorithm, adapted to distributed environments, is proposed on the basis of NCD-Stream. First, the whole data space is divided into multiple grid blocks, and local clustering is performed on these blocks in parallel. Then, the status of the grids located on each block's boundary is judged to accomplish global clustering. Experiments on Spark Streaming show that as the degree of parallelism increases, efficiency improves further while clustering quality is maintained.

(3) The DNCD-Stream algorithm is applied to a traceability system, and a traceability data stream processing system is designed and implemented.
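The density-grid idea that D-Stream-style algorithms build on can be sketched as follows: each record is mapped to a grid cell, each cell keeps a time-decayed density, and cells are labelled dense, transitional or sparse against thresholds. The abstract does not give NCD-Stream's weighted-average-density formula, so this sketch substitutes a simple stand-in: thresholds are multiples of the mean decayed density of the non-empty cells. The cell size, the decay factor `decay` and the multipliers `c_dense` and `c_sparse` are all hypothetical choices, not values from the thesis.

```python
import math


class DensityGrid:
    """Sparse grid of time-decayed cell densities (D-Stream-style sketch)."""

    def __init__(self, cell_size, decay=0.998):
        self.cell_size = cell_size    # hypothetical uniform cell width
        self.decay = decay            # hypothetical decay factor (lambda)
        self.density = {}             # cell -> density at its last update
        self.last_update = {}         # cell -> time of its last update

    def cell_of(self, point):
        return tuple(math.floor(x / self.cell_size) for x in point)

    def add(self, point, t):
        """Decay the cell's old density to time t, then count the new record with weight 1."""
        cell = self.cell_of(point)
        dt = t - self.last_update.get(cell, t)
        self.density[cell] = self.density.get(cell, 0.0) * self.decay ** dt + 1.0
        self.last_update[cell] = t
        return cell

    def density_at(self, cell, t):
        """Density decayed forward to time t without adding a record."""
        dt = t - self.last_update.get(cell, t)
        return self.density.get(cell, 0.0) * self.decay ** dt

    def adaptive_thresholds(self, t, c_dense, c_sparse):
        """Stand-in for an adaptive rule: multiples of the mean decayed density."""
        densities = [self.density_at(c, t) for c in self.density]
        if not densities:
            return math.inf, 0.0
        avg = sum(densities) / len(densities)
        return c_dense * avg, c_sparse * avg

    def classify(self, t, c_dense=2.0, c_sparse=0.5):
        dense_th, sparse_th = self.adaptive_thresholds(t, c_dense, c_sparse)
        labels = {}
        for cell in self.density:
            d = self.density_at(cell, t)
            if d >= dense_th:
                labels[cell] = "dense"
            elif d <= sparse_th:
                labels[cell] = "sparse"
            else:
                labels[cell] = "transitional"
        return labels
```

Because the thresholds scale with the current mean density, they follow the stream's overall volume instead of requiring a fixed absolute value, which is the kind of parameter-setting difficulty the abstract describes.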
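The abstract says cluster merging is optimized with a disjoint set. A minimal sketch of that technique, assuming dense cells are tuples of integer grid coordinates and that two dense cells belong to the same cluster when they are face-adjacent (the neighbourhood rule and the helper names `grid_neighbors` and `cluster_dense_cells` are hypothetical), looks like this:

```python
class DisjointSet:
    """Union-find with path compression and union by rank."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # attach the shallower tree under the deeper one
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def grid_neighbors(cell):
    """Hypothetical neighbourhood rule: +/-1 along exactly one dimension."""
    for d in range(len(cell)):
        for delta in (-1, 1):
            yield cell[:d] + (cell[d] + delta,) + cell[d + 1:]


def cluster_dense_cells(dense_cells):
    """Group dense cells into clusters by unioning adjacent dense cells."""
    cells = set(dense_cells)
    ds = DisjointSet()
    for c in cells:
        ds.add(c)
    for c in cells:
        for n in grid_neighbors(c):
            if n in cells:
                ds.union(c, n)
    groups = {}
    for c in cells:
        groups.setdefault(ds.find(c), set()).add(c)
    return list(groups.values())
```

Compared with relabelling whole clusters each time two of them touch, each union here is nearly constant amortized time, which is the efficiency argument for using a disjoint set.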
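The pyramid time model mentioned in the abstract is commonly described as follows (as in CluStream): a snapshot taken at clock time t is stored at order i, where i is the largest integer such that alpha^i divides t, and each order keeps only its most recent alpha^l + 1 snapshots. The sketch below follows that common description; the parameters `alpha` and `l`, the snapshot payload, and the lookup helper `nearest_before` are assumptions, since the abstract does not give NCD-Stream's settings.

```python
from collections import deque


class PyramidTimeFrame:
    """Pyramidal snapshot store: order i keeps at most alpha**l + 1 snapshots."""

    def __init__(self, alpha=2, l=1):
        self.alpha = alpha
        self.capacity = alpha ** l + 1
        self.orders = {}  # order -> deque of (time, snapshot), oldest first

    def order_of(self, t):
        """Largest i such that alpha**i divides t (t must be a positive integer)."""
        if t <= 0:
            raise ValueError("clock times must be positive integers")
        i = 0
        while t % (self.alpha ** (i + 1)) == 0:
            i += 1
        return i

    def store(self, t, snapshot):
        order = self.order_of(t)
        queue = self.orders.setdefault(order, deque(maxlen=self.capacity))
        queue.append((t, snapshot))  # a full deque(maxlen=...) drops its oldest entry

    def stored_times(self):
        return sorted(t for q in self.orders.values() for t, _ in q)

    def nearest_before(self, t):
        """Most recent stored (time, snapshot) taken at or before t, or None."""
        best = None
        for q in self.orders.values():
            for ts, snap in q:
                if ts <= t and (best is None or ts > best[0]):
                    best = (ts, snap)
        return best
```

Recent times are kept at fine granularity and older times at progressively coarser granularity, so the store stays small while still allowing a cluster snapshot near any past time to be compared with the current one.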
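The two-phase distributed scheme described for DNCD-Stream (local clustering per grid block, then a global pass over block boundaries) can be illustrated with a single-process sketch. Assumptions: dense cells are integer coordinate tuples, blocks are hypercubes of `block_size` cells, and the boundary check is simplified to "a dense cell whose face-adjacent dense neighbour lies in another block joins that neighbour's cluster". The thesis runs this on Spark Streaming; here a plain dictionary comprehension stands in for the parallel map, and all function names are hypothetical.

```python
from collections import defaultdict


def block_of(cell, block_size):
    return tuple(c // block_size for c in cell)


def neighbors(cell):
    """Face-adjacent cells (+/-1 along one dimension)."""
    for d in range(len(cell)):
        for delta in (-1, 1):
            n = list(cell)
            n[d] += delta
            yield tuple(n)


def local_cluster(cells):
    """Label connected dense cells inside one block (depth-first search)."""
    cells = set(cells)
    labels, next_id = {}, 0
    for start in cells:
        if start in labels:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            c = stack.pop()
            for n in neighbors(c):
                if n in cells and n not in labels:
                    labels[n] = next_id
                    stack.append(n)
        next_id += 1
    return labels


def global_cluster(dense_cells, block_size):
    blocks = defaultdict(list)
    for cell in dense_cells:
        blocks[block_of(cell, block_size)].append(cell)
    # Local phase: blocks are independent, so this map could run on separate workers.
    local = {b: local_cluster(cells) for b, cells in blocks.items()}
    # A cluster's global id is (block, local id) until boundary merging joins ids.
    label = {cell: (b, lid) for b, labs in local.items() for cell, lid in labs.items()}
    parent = {l: l for l in set(label.values())}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Global phase: only adjacencies that cross a block boundary need checking.
    for cell in label:
        for n in neighbors(cell):
            if n in label and block_of(n, block_size) != block_of(cell, block_size):
                ra, rb = find(label[cell]), find(label[n])
                if ra != rb:
                    parent[ra] = rb
    clusters = defaultdict(set)
    for cell, l in label.items():
        clusters[find(l)].add(cell)
    return list(clusters.values())
```

The global phase only merges cluster labels instead of re-clustering the whole grid, which is what makes the local phase worth parallelizing.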
Keywords/Search Tags: Data Stream, Density Grid, Clustering, Distributed, Traceability System