
Research On Histogram Publication In Data Stream Based On Sliding Window

Posted on: 2022-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Mo
GTID: 2518306743463504
Subject: Computer Science and Technology
Abstract/Summary:
Information technology has developed rapidly in recent years, and data streams have become increasingly common in network applications. A data stream arrives quickly and continuously, and its total volume is very large, so stream processing algorithms are generally required to compute their results in a single pass. Processing these streams online and publishing statistics about the data in real time has great commercial value. However, data streams contain a large amount of personal information, and releasing the data directly would leak users' privacy. Two examples illustrate the risk: (1) in a real-time traffic information system, the destinations that individuals plan are exposed; (2) in a medical analysis data set, patients' disease information is leaked. Such location and health data are valuable and sensitive. How to publish data in real time while hiding users' private information has therefore become an important research problem in many applications.

Although many histogram publishing methods exist for static data sets, they cannot process data streams efficiently because they must cache all data in the sliding window. Some histogram publishing methods are designed specifically for streaming data, but applying them to the sliding window model raises two problems: (1) they do not consider whether histogram publishing is correlated with the approximate statistics of the sliding window; (2) they ignore the effect of data stream approximation techniques on privacy protection, and simply store the histogram generated from all data in the current window and add noise to it directly.

To address these two problems, this thesis makes the following two contributions.

1: This thesis proposes HPA-SW (Histogram Publishing Algorithm for Sliding Window Model), a differentially private histogram publishing algorithm for data streams based on correlation distance. The algorithm first applies approximate statistics to divide the sliding window into k sub-blocks, so that the approximation error can be controlled by adjusting k; it then uses a similarity measure to compute the distance between the statistics at adjacent moments; finally, it optimizes the allocation of the privacy budget by comparing this similarity distance with a threshold. Theoretical analysis and experiments show that the algorithm processes stream data efficiently and achieves publishing errors acceptable to users. In experiments, HPA-SW reduces the loss of data usability by 50% compared with the best existing algorithm.

2: To address the low data usability and privacy leakage encountered in practical applications, this thesis proposes an optimized algorithm, AHPM-SW (Adaptive Histogram Publishing Method for Sliding Window). AHPM-SW first uses approximate counting over the data stream to predict the distribution of the data in the sliding window at the next moment; it then selects the value to publish by comparing the estimated value with the true value; finally, it sorts the histogram bins, clusters them, and optimizes the error of the data within each bucket. Theoretical analysis and experiments show that AHPM-SW achieves reasonable data usability. Results on a standard data set show that AHPM-SW reduces the average publishing error by about 77% compared with the best existing group-based histogram publishing algorithm.
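The HPA-SW pipeline summarized above (k sub-blocks, a distance between the statistics at adjacent moments, and a threshold test that decides whether to spend privacy budget) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions: the names `SubBlockWindow`, `ThresholdPublisher`, and `laplace` are hypothetical; the thesis's correlation-based similarity measure is replaced by a mean absolute difference, a swapped-in measure chosen because its sensitivity is easy to bound; the per-step budget is split evenly between the distance test and publication; and each stream element is assumed to change one bin count by 1. No formal privacy analysis for overlapping windows is attempted.

```python
import random
from collections import deque

# Hypothetical sketch: the distance measure (mean absolute difference) and the
# even budget split are assumptions, not HPA-SW's actual design.

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

class SubBlockWindow:
    """Approximate sliding-window histogram kept as k sealed sub-blocks plus one
    partially filled block. Expired data is dropped a whole sub-block at a time,
    so the window boundary is off by at most window_size // k elements."""

    def __init__(self, window_size, k, num_bins):
        self.block_size = max(1, window_size // k)
        self.k = k
        self.num_bins = num_bins
        self.blocks = deque()
        self.current = [0] * num_bins
        self.filled = 0

    def add(self, bin_index):
        self.current[bin_index] += 1
        self.filled += 1
        if self.filled == self.block_size:
            self.blocks.append(self.current)
            if len(self.blocks) > self.k:
                self.blocks.popleft()  # expire the oldest sub-block whole
            self.current = [0] * self.num_bins
            self.filled = 0

    def histogram(self):
        hist = list(self.current)
        for block in self.blocks:
            for i, count in enumerate(block):
                hist[i] += count
        return hist

class ThresholdPublisher:
    """Reuses the previous release while a noisy distance to it stays at or
    below `threshold`; otherwise spends budget on a fresh noisy histogram."""

    def __init__(self, eps_per_step, threshold, rng):
        self.eps_dist = eps_per_step / 2  # budget for the distance test
        self.eps_pub = eps_per_step / 2   # budget for a fresh release
        self.threshold = threshold
        self.rng = rng
        self.last = None
        self.spent = 0.0

    def step(self, hist):
        d = len(hist)
        if self.last is not None:
            # Changing one count by 1 moves this distance by at most 1/d.
            dist = sum(abs(c - r) for c, r in zip(hist, self.last)) / d
            dist += laplace(1 / (d * self.eps_dist), self.rng)
            self.spent += self.eps_dist
            if dist <= self.threshold:
                return self.last, False  # reuse: no publication budget spent
        self.last = [c + laplace(1 / self.eps_pub, self.rng) for c in hist]
        self.spent += self.eps_pub
        return self.last, True
```

For example, `SubBlockWindow(window_size=100, k=10, num_bins=4)` drops expired data ten elements at a time; raising k tightens the window boundary at the cost of storing more sub-blocks, which mirrors the role k plays in HPA-SW.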
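The final AHPM-SW step (sorting the histogram bins, clustering them, and optimizing the error within each bucket) is related to the group-based publishing the abstract compares against. The Python sketch below illustrates only that step, under assumptions made here rather than taken from the thesis: half the budget yields noisy counts used to choose the groups; a greedy rule merges neighbouring sorted bins while the estimated error (deviation from the group mean plus the expected Laplace error) decreases; and the other half releases one noisy sum per group, spread evenly over its bins. The prediction and value-selection steps are not modeled, and `greedy_groups`, `group_cost`, and `publish_grouped` are hypothetical names.

```python
import random

# Hypothetical sketch of sort -> cluster -> per-group averaging; the greedy
# merge rule and the even budget split are assumptions, not AHPM-SW's design.

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def group_cost(values, b):
    # Estimated error of publishing `values` as one group: deviation of each
    # bin from the group mean plus the expected total Laplace error b, which
    # the group's bins share after averaging.
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) + b

def greedy_groups(estimates, b):
    """Sort bins by estimated count, then merge each bin into the preceding
    group while doing so lowers the estimated error. Returns index lists."""
    if not estimates:
        return []
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    groups = [[order[0]]]
    for i in order[1:]:
        current = [estimates[j] for j in groups[-1]]
        if group_cost(current + [estimates[i]], b) < group_cost(current, b) + b:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

def publish_grouped(hist, epsilon, rng):
    eps1 = eps2 = epsilon / 2
    # Phase 1 (eps1): noisy counts used only to choose the grouping.
    estimates = [c + laplace(1 / eps1, rng) for c in hist]
    groups = greedy_groups(estimates, 1 / eps2)
    # Phase 2 (eps2): groups are disjoint, so each noisy group sum has
    # sensitivity 1; the sum is spread evenly over the group's bins.
    released = [0.0] * len(hist)
    for group in groups:
        noisy_sum = sum(hist[i] for i in group) + laplace(1 / eps2, rng)
        for i in group:
            released[i] = noisy_sum / len(group)
    return released
```

Merging trades a small approximation error (bins in a group lose their individual counts) for less noise per bin; sorting first places similar counts next to each other, so the merges that pay off are adjacent.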
Keywords/Search Tags: differential privacy, data stream, correlation coefficient, histogram publishing, approximate statistics, adaptive publishing