Differential Private Histogram Publication For Data Stream

Posted on: 2017-01-18 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Liu
GTID: 2308330503453789 | Subject: Software engineering
Abstract/Summary:
In recent years, as information technology and the Internet have developed rapidly, cases of user information leakage during information sharing and dissemination have drawn public attention to privacy protection. In the era of big data, data must be shared to realize its full potential value, so privacy protection for individuals, enterprises and institutions has become increasingly important. Differential privacy, a strict privacy protection model, has attracted considerable attention and research in many fields. It assumes that the attacker has maximal background knowledge and protects user privacy by adding a small amount of noise to the original data set, which gives it the advantages of adding little noise and a low risk of information leakage.
Existing differentially private publication techniques mainly target static data sets and binary data streams. In real-world applications, however, data arrives as a stream and is diverse, and existing methods do not perform well in practice. This paper therefore presents an efficient differentially private histogram publication algorithm for non-uniformly distributed numerical streams.
Firstly, privacy and differential privacy are described in detail, the data stream processing model and histogram-related techniques are explained, and the Spark Streaming framework on the cloud platform is explored, with a focus on its stream processing and batch processing techniques.
Secondly, this paper presents a dynamic differentially private histogram publication algorithm, called DDPA, oriented toward non-uniformly distributed numerical streams.
Based on the sliding window model, the similarity of the data distributions at two adjacent timestamps is used to allocate the privacy budget dynamically, so that the total budget of each window does not exceed the privacy budget ε, and a grouping and merging strategy is then used to compute a locally optimal histogram quickly. A comparison of the utility of the published data with that of other similar algorithms shows experimentally that the proposed algorithm is effective and feasible.
Then, based on a Spark cluster and the Spark Streaming framework, the EMD similarity method in DDPA is modified and improved so that the algorithm can run on the Spark cloud computing platform and meet the needs of big data stream processing. From the perspective of big data, dynamic differentially private histogram publication is implemented on the Spark cluster by combining the stream processing and batch processing capabilities of Spark Streaming.
Finally, the utility of the data published by the improved algorithm is compared with that of other similar algorithms, and the experimental results show that the algorithm is effective and feasible. From the perspective of big data applications, this work improves the practical value of differentially private histogram publication for data streams and has some reference value.
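The abstract does not give DDPA's allocation rule, so the following is only a minimal sketch of the general idea it describes: compare the current histogram with the last release, spend privacy budget only when the distribution has changed, and keep the budget spent in any window of w consecutive timestamps at or below ε. It assumes that EMD means the earth mover's distance, that each count has sensitivity 1, and that noise is drawn from the Laplace distribution. The names `WindowedPublisher` and `threshold` and the "spend half of what remains" rule are hypothetical choices for illustration, not the thesis's method. The sketch also ignores the privacy cost of the similarity test itself and omits the grouping and merging step.

```python
import random
from collections import deque


def emd_1d(h1, h2):
    """Earth mover's distance between two equal-length 1-D histograms,
    each normalised to total mass 1 (negative noisy counts clamped to 0)."""
    h1 = [max(x, 0.0) for x in h1]
    h2 = [max(x, 0.0) for x in h2]
    n1, n2 = sum(h1) or 1.0, sum(h2) or 1.0
    carried, dist = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a / n1 - b / n2  # mass that must move past this bin
        dist += abs(carried)
    return dist


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class WindowedPublisher:
    """Hypothetical publisher: at most `epsilon` is spent in any `window` timestamps."""

    def __init__(self, epsilon, window, threshold, seed=None):
        self.epsilon = epsilon
        self.threshold = threshold  # assumed EMD cut-off for "distribution unchanged"
        self.spent = deque(maxlen=window - 1)  # budgets of the previous w-1 timestamps
        self.last = None  # most recent published (noisy) histogram
        self.rng = random.Random(seed)

    def publish(self, hist):
        """Return (released histogram, budget spent at this timestamp)."""
        remaining = self.epsilon - sum(self.spent)
        eps_t = remaining / 2.0  # illustrative rule: spend half of what is left
        unchanged = (self.last is not None
                     and emd_1d(hist, self.last) < self.threshold)
        if self.last is not None and (unchanged or eps_t <= 0.0):
            # Re-publish the previous release and spend nothing.
            self.spent.append(0.0)
            return self.last, 0.0
        self.spent.append(eps_t)
        # Sensitivity 1 per count, so the Laplace scale is 1 / eps_t.
        self.last = [c + laplace(1.0 / eps_t, self.rng) for c in hist]
        return self.last, eps_t
```

A caller would create one publisher per stream, for example `WindowedPublisher(epsilon=1.0, window=3, threshold=0.05)`, and call `publish` once per timestamp with that timestamp's bin counts. Because each spend is at most the budget still unused by the previous w-1 timestamps, every window of w consecutive timestamps stays within ε.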
Keywords/Search Tags: data stream, differential private, histogram publication, Spark Streaming