Research On Differentially Private Mechanisms In The Utilization Of Crowdsourced Preference Data

Posted on: 2021-01-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Yan
Full Text: PDF
GTID: 1368330614472332
Subject: Information security
Abstract/Summary:
Crowdsourcing is a form of human computation that addresses computer-hard tasks by harnessing the distributed knowledge and intelligence of crowd agents. Its goal is to combine human intelligence and machine intelligence to achieve strong problem-solving ability. In the data management process of a crowdsourcing platform, it is usually necessary to control the quality of completed crowdsourcing tasks, and answer aggregation and task assignment are two important actions for doing so. Answer aggregation arises because agents perform a task and provide their own answers, which vary in quality, so the platform needs to aggregate these answers efficiently and accurately to obtain practical results; voting mechanisms are one type of implementation that can be leveraged. Task assignment arises because agents differ in their level of task completion and in their task selection preferences, so the platform needs to match tasks with agents efficiently and reasonably to obtain higher-quality crowdsourced answers; recommendation mechanisms are one type of implementation that can be leveraged.

Since both actions require agents to provide their preference data to the crowdsourcing platform, and these preference data often involve sensitive information, having the platform directly collect and analyze the data and publish the results poses a significant privacy risk. In recent years, differential privacy (DP) protection theory and its local model have provided solutions for many data processing scenarios, but research on crowdsourced preference data utilization scenarios is still incomplete. How to effectively protect the privacy of agents' preference data while maintaining acceptable data utility is important for crowdsourced data management. In this thesis, we focus on crowdsourcing answer aggregation and task assignment under local differential privacy protection, and propose the following solutions for three types of preference data (single-value, set, and rank):

1. For the crowdsourcing answer aggregation scenario under privacy protection and the characteristics of single-value preference data in weighted voting games, a single-value aggregation protocol based on the local model of DP is proposed. Previous studies have used cryptographic techniques such as homomorphic encryption to enhance the security of weighted voting, which requires assumptions about collusion and the introduction of additional trusted data curators. The protocol LDP-WeVote proposed in this thesis enables agents to perturb their vote weight data and vote intention data locally via the GRR mechanism or the Laplace mechanism, respectively, and the data curator later estimates the voting result based on the perturbed data, which protects the numerical privacy of both types of single-value preference data. We conduct experiments on synthetic datasets to observe the performance of the two solutions of the LDP-WeVote protocol on metrics such as the mean square error of the intermediate results and the accuracy of the final voting results. The results show that the solution LDP-WeVote:GRR generally outperforms the solution LDP-WeVote:Lap.

2. For the crowdsourcing answer aggregation scenario under privacy protection and the characteristics of rank preference data in rank aggregation, a rank aggregation protocol based on the local model of DP is proposed. Previous studies have proposed solutions based on the central model of DP, which does not account for untrusted data curators. The protocol LDP-KwikSort proposed in this thesis enables agents to perturb their rank preference data locally via the RR mechanism or the Laplace mechanism, and the data curator later estimates the aggregate ranking based on the perturbed data, which protects the privacy of the ranking relationship of pairwise alternatives in the preference data. We conduct experiments on real datasets and on synthetic datasets generated from the Mallows model, and observe the performance of the two solutions of the LDP-KwikSort protocol and the comparison solutions on metrics such as the average Kendall tau distance for the aggregate ranking and the error rate of some solutions on intermediate results. The results show that the solution LDP-KwikSort:RR usually outperforms the solution LDP-KwikSort:Lap, and that their performance is approximately optimal when the number of queries is set to K=?/2; this theoretical result is verified experimentally.

3. For the crowdsourcing task assignment scenario under privacy protection and the characteristics of set preference data in set similarity measurement, a set similarity estimation protocol based on the local model of DP is proposed, which can assist crowdsourcing task assignment based on the recommendation mechanism. Previous studies have proposed solutions based on the central or distributed models of DP coupled with cryptographic techniques, without considering untrusted data curators. The protocol LDP-MinHash proposed in this thesis enables agents to perturb their set preference data locally via the exponential mechanism or the GRR mechanism to generate MinHash signatures, and the data curator later estimates the Jaccard similarity of the sets based on the perturbed signatures, which protects the privacy of the existence of set elements in the preference data. We also provide a theoretical analysis of the association between the internal randomness of the MH-JSE algorithm and its differential privacy properties, and propose to characterize this association using the conditional ?-set operation differential privacy definition. We conduct experiments on real datasets and synthetic datasets to observe the performance of the two solutions of the LDP-MinHash protocol and the comparison solution on metrics such as the mean square error and F1 measure of the output Jaccard similarity, as well as the error rate on intermediate results. The results show that the solution LDP-MinHash:GRR generally outperforms the solution LDP-MinHash:Exp.

4. For the crowdsourcing task assignment scenario under privacy protection and the characteristics of rank preference data in rank similarity measurement, a rank similarity estimation protocol based on the local model of DP is proposed, which can assist crowdsourcing task assignment based on the recommendation mechanism. Previous studies have proposed solutions based on the noise overlay approach, which neither build on differential privacy models nor account for untrusted data curators. The protocol LDP-WTAHash proposed in this thesis enables agents to perturb their rank preference data locally via the Laplace mechanism or the GRR mechanism to generate WTAHash signatures, and the data curator later estimates the pairwise-order similarity of the rankings based on the perturbed signatures, which protects the privacy of the sorted position of elements in the preference data. We conduct experiments on real datasets and on synthetic datasets generated from the Mallows model, and observe the performance of the two solutions of the LDP-WTAHash protocol and the comparison solution on metrics such as the mean square error and F1 measure of the output pairwise-order similarity, as well as the error rate on intermediate results. The results show that the solution LDP-WTAHash:GRR generally outperforms the solution LDP-WTAHash:Lap.
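
Several of the protocols above perturb data locally with the GRR mechanism. The abstract does not give its details, so the following is only a minimal sketch of generalized randomized response in its commonly used textbook form, not the thesis's LDP-WeVote, LDP-MinHash, or LDP-WTAHash protocols. It assumes a finite answer domain of size d and a privacy budget epsilon; the function names `grr_perturb` and `grr_estimate`, the vote labels, and the parameter values are hypothetical and chosen for illustration. Each agent reports the true value with probability p = e^epsilon / (e^epsilon + d - 1) and otherwise a uniformly chosen different value, and the data curator debiases the observed frequencies.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Agent side: report the true value with probability p, else another value.

    Textbook generalized randomized response (an assumption, not the thesis's code).
    """
    if value not in domain or len(domain) < 2 or epsilon <= 0:
        raise ValueError("need value in domain, |domain| >= 2, and epsilon > 0")
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    # Otherwise pick uniformly among the d - 1 other values,
    # so each of them is reported with probability q = 1 / (e^epsilon + d - 1).
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Curator side: unbiased frequency estimates from perturbed reports."""
    n = len(reports)
    d = len(domain)
    e = math.exp(epsilon)
    p = e / (e + d - 1)
    q = 1.0 / (e + d - 1)
    counts = Counter(reports)
    # E[count_v / n] = f_v * p + (1 - f_v) * q, solved for f_v.
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    domain = ["A", "B", "C", "D"]  # hypothetical vote options
    true_votes = ["A"] * 5000 + ["B"] * 3000 + ["C"] * 1500 + ["D"] * 500
    reports = [grr_perturb(v, domain, 1.0, rng) for v in true_votes]
    for option, freq in grr_estimate(reports, domain, 1.0).items():
        print(option, round(freq, 3))
```

In this sketch the curator only ever sees the perturbed reports, which matches the untrusted-curator setting the abstract describes. A smaller epsilon pushes p toward 1/d and makes the estimates noisier, which is the privacy and utility trade-off the experiments measure.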
Keywords/Search Tags: Crowdsourced data management, answer aggregation, task assignment, local model of differential privacy, randomized algorithms