Differential privacy, a privacy protection model whose protection strength is specified by precise, quantitative parameters, ensures the indistinguishability of adjacent datasets by introducing noise. However, a strong privacy protection mechanism inevitably introduces large-scale noise, so the utility of the published data cannot be guaranteed. Analyzing the trade-off between the protection strength of a privacy mechanism and the utility of the published data is therefore of both theoretical and practical significance. Based on quantitative information flow, this thesis adopts the sanitized dataset publishing model within the non-interactive dataset publishing mechanism and uses mutual information to quantify the information exchanged between the different stages of private data publishing, which yields a way to quantify both the information leakage and the utility of publishing mechanisms. Based on this quantification, we explore the specific relationship between leakage and utility under different distributions of the original dataset. The main contributions of this work are as follows:

(1) A method for quantifying data utility based on mutual information is proposed. Based on the synthetic dataset publishing model within the non-interactive dataset publishing mechanism, the differential privacy protection mechanism is modeled as a communication channel, and tools such as information entropy and mutual information are used to quantify the information flow between the original dataset and the synthetic dataset. A quantitative index of the data utility of the release mechanism, based on mutual information, is proposed. Analysis shows that, compared with utility quantification using a distance function, this utility measure is theoretically less demanding, more versatile, and more effective.

(2) The amount of information leakage is derived for the case in which the release mechanism is a discrete memoryless channel. Conditional mutual information is used to define the information leakage of the synthetic dataset release mechanism under the assumption that the attacker's background knowledge is maximal. Under the further assumption that the release mechanism is a discrete memoryless channel, the mathematical form of the information leakage is obtained.

(3) Based on the above, the trade-off between security and utility is studied under different distributions of the source dataset. By analogy with the rate-distortion function, we propose a security-utility function to study the mathematical relationship between the amount of information leakage and the utility, both when the entries of the source are independent and identically distributed and when they are not. Using common communication channels, we also obtain the optimal release mechanism under specific conditions.

Finally, simulation experiments that introduce channel noise with different characteristics into the dataset are carried out to verify the theoretical results. The experimental results are consistent with the theoretical results.
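As a minimal illustration of viewing a privacy mechanism as a communication channel, the sketch below computes the mutual information between the input and output of a discrete memoryless channel. It is not the thesis's construction: binary randomized response on a single bit is assumed here only because it is a simple mechanism that satisfies ε-differential privacy, and the function names `mutual_information` and `randomized_response` are hypothetical.

```python
import math


def mutual_information(prior, channel):
    """Return I(X;Y) in bits.

    prior[x] is P(X = x); channel[x][y] is P(Y = y | X = x),
    so each row of `channel` must sum to 1.
    """
    n_out = len(channel[0])
    # Output marginal: P(y) = sum_x P(x) * P(y | x)
    p_y = [
        sum(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(n_out)
    ]
    mi = 0.0
    for x, p_x in enumerate(prior):
        for y in range(n_out):
            joint = p_x * channel[x][y]
            if joint > 0:  # terms with zero probability contribute nothing
                mi += joint * math.log2(channel[x][y] / p_y[y])
    return mi


def randomized_response(epsilon):
    """Assumed toy mechanism: report the true bit with probability
    e^eps / (1 + e^eps), otherwise report the flipped bit."""
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [[keep, 1.0 - keep], [1.0 - keep, keep]]


if __name__ == "__main__":
    prior = [0.5, 0.5]  # assumed uniform distribution of the original bit
    for eps in (0.1, 1.0, 3.0):
        bits = mutual_information(prior, randomized_response(eps))
        print(f"epsilon={eps}: I(X;Y) = {bits:.4f} bits")
```

In this toy setting a smaller ε makes the channel noisier and the mutual information between the original and published bit smaller, which is the tension between protection strength and information flow that the abstract describes.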