
Research And Application Of Distributed Data Stream Clustering Algorithm

Posted on: 2018-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: X G Wan
GTID: 2348330536979953
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology and services, the data stream model has attracted the attention of the data mining community. A data stream has two distinctive properties: a large amount of data arrives continuously within a short time, and the data distribution changes dynamically over time. Processing such data quickly within limited storage space to extract useful information therefore poses new challenges for data mining and its applications. Based on the data stream model, this thesis studies how to improve the performance of data stream clustering by refining an existing algorithm and by distributing and parallelizing it on Storm, and how to apply the resulting algorithm.

To improve clustering accuracy, the thesis designs a data stream clustering algorithm based on centroid distance and density grids, named CDD-Stream. It improves the D-Stream algorithm in three respects: algorithm parameter adjustment, the grid cluster formation strategy, and historical data analysis. Experimental results show that, compared with the D-Stream and NDD-Stream algorithms, CDD-Stream achieves better clustering timeliness and higher clustering quality on data stream objects.

For distribution and parallelization, the thesis designs a distributed data stream clustering algorithm, DCD-Stream (Distributed Centroid Distance D-Stream), by parallelizing the grid update step of CDD-Stream. Comparative experiments on Storm show that DCD-Stream achieves comparable clustering quality and better clustering timeliness than CDD-Stream on data stream objects.

For application, the thesis applies DCD-Stream to intrusion detection and designs a Storm-based intrusion detection system model (S-IDS). Experiments on the KDD-CUP99 dataset show that DCD-Stream has higher accuracy and better timeliness than D-Stream, which verifies the distributed operation, real-time performance, and accuracy of DCD-Stream in the S-IDS system. The research in this thesis is in line with the mainstream direction of data stream mining and is both advanced and practical.
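The abstract does not give the details of CDD-Stream or DCD-Stream, so the following is only a minimal Python sketch of the general density-grid idea that D-Stream-style algorithms build on: each arriving point is mapped to a grid cell, and the cell's density decays over time so that old data gradually loses influence. The class and method names (`DensityGrid`, `insert`, `dense_cells`), the decay factor, and the threshold are assumptions made for illustration and are not taken from the thesis. Grouping adjacent dense cells into clusters, the centroid-distance strategy, and the Storm parallelization are omitted.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Decayed density of one grid cell and the time it was last updated."""
    density: float = 0.0
    last_update: int = 0


class DensityGrid:
    """Hypothetical helper: a grid of cells whose densities decay over time."""

    def __init__(self, cell_width: float, decay: float = 0.998):
        self.cell_width = cell_width  # side length of a cell (assumed uniform)
        self.decay = decay            # decay factor in (0, 1), an assumed value
        self.cells = {}               # maps cell index tuple -> Cell

    def key(self, point):
        """Map a point to the integer index of the cell that contains it."""
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point, t: int):
        """Decay the cell's density to time t, then count the new point."""
        k = self.key(point)
        cell = self.cells.setdefault(k, Cell(0.0, t))
        cell.density = cell.density * self.decay ** (t - cell.last_update) + 1.0
        cell.last_update = t
        return k

    def density(self, k, t: int) -> float:
        """Density of cell k as seen at time t, without modifying the cell."""
        cell = self.cells.get(k)
        if cell is None:
            return 0.0
        return cell.density * self.decay ** (t - cell.last_update)

    def dense_cells(self, t: int, threshold: float):
        """Cells whose decayed density at time t reaches the threshold."""
        return {k for k in self.cells if self.density(k, t) >= threshold}
```

Because each cell stores only a density and a timestamp, memory use depends on the number of occupied cells rather than on the number of points seen. Updates to different cells are also independent of one another, which is one plausible reason a grid update step lends itself to parallelization.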
Keywords/Search Tags: data stream, clustering, distributed, Storm, Intrusion Detection System (IDS)