Research On The Privacy Preserving Classification Algorithm For Stream Data

Posted on: 2018-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2348330536979637
Subject: Computer software and theory
Abstract/Summary:
With the continuous development and spread of cloud computing and information-sharing technologies, data streams have become a new and widespread form of data, generated in application fields such as sensor networks, Web application services, network traffic monitoring, and intrusion detection. Data streams are real-time, rapidly changing, potentially infinite, and subject to concept drift, and these characteristics pose major challenges for traditional privacy-preserving classification methods. This thesis takes privacy preservation in data stream classification mining as its research focus and designs a more efficient privacy-preserving classification algorithm for data streams. The main work is as follows.

Firstly, building on the traditional data stream classification algorithms VFDT and VFDTc, a fast decision tree classification algorithm for data streams with continuous attributes, based on a red-black tree and named VFDT_RBT (Very Fast Decision Tree Based on Red Black Tree), is designed and implemented. The algorithm uses a red-black tree to speed up the computation of the information gain of continuous attributes, and it improves classification accuracy by applying the Hoeffding inequality and by allowing continuous attributes to occur repeatedly in the tree. Experimental results show that VFDT_RBT has advantages in both time efficiency and classification accuracy.

Secondly, to address privacy preservation in data stream mining applications, a decision-tree-based privacy-preserving data stream classification algorithm named PPFDT (Privacy Preserving Fast Decision Tree) is designed and implemented on top of VFDT_RBT. PPFDT uses stochastic perturbation to protect privacy while still building the decision tree quickly. Experimental results show that PPFDT achieves accuracy close to that of VFDT_RBT and also achieves higher efficiency.

Finally, to meet the real-time processing requirements of data streams and to solve the node load problem of the privacy-preserving data stream classification algorithm, PPFDT is distributed and parallelized on the stream computing platform Storm. A parallel privacy-preserving data stream classification algorithm based on the fast decision tree, named PPFDT_P (Parallelized Privacy Preserving Fast Decision Tree), is designed and implemented on the Storm platform. Experimental results show that PPFDT_P achieves high throughput and real-time performance on large-scale data, along with good scalability and parallel efficiency.

In brief, this thesis studies the problem of privacy disclosure in data stream classification mining, designs the corresponding algorithms, and parallelizes them on the real-time stream computation platform Storm. The results have some theoretical value and good practicality.
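The split decision behind VFDT-family trees such as VFDT_RBT can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation: the names `ContinuousAttributeObserver`, `should_split`, and `simulate` are hypothetical, a `bisect`-maintained sorted list stands in for the red-black tree (it gives the same in-order traversal of attribute values, but O(n) rather than O(log n) insertion), and `delta = 1e-7` and `tie_threshold = 0.05` are assumed illustrative values. For each continuous attribute, per-value class counts are kept in sorted order so that a single in-order sweep evaluates the information gain of every candidate threshold. A leaf is split when the best attribute's gain exceeds the runner-up's by more than the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where R = log2(number of classes) is the range of information gain.

```python
import bisect
import math
import random
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy, in bits, of a mapping from class label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


class ContinuousAttributeObserver:
    """Hypothetical per-attribute summary: class counts per distinct value, in sorted order.

    Assumption: a bisect-maintained sorted list stands in for the red-black tree of
    VFDT_RBT. Both support in-order traversal, but list insertion is O(n), not O(log n).
    """

    def __init__(self):
        self.values = []                    # distinct attribute values, kept sorted
        self.counts = defaultdict(Counter)  # value -> Counter of class labels

    def update(self, value, label):
        if value not in self.counts:        # membership test does not create a key
            bisect.insort(self.values, value)
        self.counts[value][label] += 1

    def best_split(self):
        """Return (information gain, threshold) of the best binary split 'x <= threshold'."""
        total = Counter()
        for v in self.values:
            total.update(self.counts[v])
        n = sum(total.values())
        base = entropy(total)
        left, best = Counter(), (0.0, None)
        for v in self.values[:-1]:          # a threshold at the maximum separates nothing
            left.update(self.counts[v])     # in-order sweep accumulates the left branch
            right = total - left
            n_left = sum(left.values())
            gain = base - (n_left / n) * entropy(left) - ((n - n_left) / n) * entropy(right)
            if gain > best[0]:
                best = (gain, v)
        return best


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(observers, n_classes, n_seen, delta=1e-7, tie_threshold=0.05):
    """VFDT-style test: split if the best gain beats the runner-up by more than epsilon."""
    gains = sorted((obs.best_split()[0] for obs in observers.values()), reverse=True)
    best = gains[0]
    second = gains[1] if len(gains) > 1 else 0.0
    eps = hoeffding_bound(math.log2(n_classes), delta, n_seen)
    # Guard against splitting on a zero-gain attribute; ties are broken once eps is small.
    return best > 0 and (best - second > eps or eps < tie_threshold)


def simulate(n_examples, seed=0):
    """Hypothetical stream: class 'b' iff x > 0.5; attribute 'noise' is irrelevant."""
    rng = random.Random(seed)
    observers = {"x": ContinuousAttributeObserver(), "noise": ContinuousAttributeObserver()}
    for _ in range(n_examples):
        x, noise = rng.random(), rng.random()
        label = "b" if x > 0.5 else "a"
        observers["x"].update(x, label)
        observers["noise"].update(noise, label)
    return observers


if __name__ == "__main__":
    for n in (5, 200):
        obs = simulate(n)
        print(n, should_split(obs, n_classes=2, n_seen=n), obs["x"].best_split())
```

With 5 examples and two classes, ε is about 1.27, larger than any possible gain difference (gains lie in [0, 1]), so no split is made. With 200 examples ε is about 0.2, and the informative attribute `x` wins with a threshold close to 0.5. The sketch covers only the split test at a single leaf; it does not model PPFDT's perturbation or the Storm parallelization.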
Keywords/Search Tags: data stream, classification, privacy preserving, parallelization, Storm