With the development and popularization of information technologies such as the Internet, the Internet of Things, and cloud computing, the volume of global data has grown explosively. In the era of big data, the value of personal information has received increasing attention. While the enormous economic and social benefits contained in data are being explored, how to scientifically balance the exploitation of data value against the protection of personal privacy is a key issue that the academic community has continued to study in recent years. One feasible way to resolve the conflict between big data and personal privacy is to establish a data trading platform, and the key to data trading is pricing data reasonably. Because of the "information asymmetry" between data providers and data collectors, the collector cannot observe each individual's degree of privacy concern, so existing data trading mechanisms assume that the probability distribution of this degree of concern is known to the collector. In practice, however, the assumption that the "probability distribution is known" rarely holds, so it is necessary to study data trading mechanisms in which the probability distribution is unknown.

Firstly, aiming to balance data value and privacy security, this paper models and analyzes the interaction between data providers and data collectors in data trading scenarios and proposes a user role-based analysis method. The problems that data providers and data collectors each face in privacy data trading are analyzed and sorted out. After the trading scenarios and model requirements are clarified, a dynamic pricing model for privacy data based on multi-state selection is designed. The model is built from three components: privacy pricing, multi-state selection, and time-varying benefits. To protect the privacy and security of data providers during trading, the model uses k-anonymity to anonymize the
purchased data. How to reflect the impact of anonymization on data value is one of the key issues this model must address.

Secondly, to find the optimal pricing strategy for data trading when the probability distribution is unknown, this paper builds on the dynamic pricing model for privacy data based on multi-state selection and, drawing on reinforcement learning, proposes the PriSARSA and PriQ-learning algorithms based on an empirical matrix. With PriSARSA and PriQ-learning, the data collector interacts repeatedly with data providers, accumulates bidding experience and knowledge from the reward signals it receives, and keeps improving its action strategy to obtain higher returns, ultimately maximizing the cumulative total revenue. To evaluate the performance of the proposed pricing strategy algorithms, simulation experiments are carried out. The simulation results show that the proposed empirical-matrix-based pricing strategy learning algorithms not only solve the pricing decision problem with an unknown probability distribution but also make optimal pricing decisions according to the "empirical matrix", bringing high profits to data collectors.

Finally, because the iterative solution process of reinforcement learning is time-consuming, heuristic learning can be used to improve the strategy algorithm: adding heuristic functions accelerates learning. Building on the empirical-matrix-based pricing strategy algorithms, this paper improves and optimizes PriQ-learning and proposes a heuristic function-based pricing strategy algorithm. The heuristic PriQ-learning algorithm adds a heuristic reward function and a heuristic strategy selection function to the original strategy algorithm to guide the pricing behavior of data collectors, speed up the learning of the pricing strategy, and improve the performance of
the algorithm. Simulation tests comparing a variety of pricing strategy algorithms verify the high performance of the proposed algorithm, and the results show that heuristic PriQ-learning helps data collectors obtain higher returns within a limited privacy data trading process.
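To make the learning loop concrete, the following is a minimal Python sketch of tabular Q-learning and SARSA applied to posted-price data purchasing when the providers' privacy-cost distribution is unknown to the collector. It is not the thesis's PriSARSA or PriQ-learning: the environment (k-anonymity levels as states, a discrete price menu as actions, the `data_value` and `privacy_cost` functions) is entirely hypothetical, the collector is assumed to know its own per-record value, and the Q-table only stands in for what the abstract calls the "empirical matrix", whose exact definition is not given here. The optional heuristic follows the heuristically accelerated Q-learning idea of adding a weighted bonus H(s, a) to action selection; only a heuristic selection function is sketched, not the heuristic reward function described above.

```python
import random

# Hypothetical setting (not taken from the thesis): each round one data provider
# arrives with a k-anonymity level (the state); the collector posts a price (the
# action); the provider sells only if the price covers a privacy cost drawn from
# a distribution the collector never sees.
K_LEVELS = [2, 5, 10]                 # hypothetical anonymity levels
PRICES = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical price menu


def data_value(k):
    """Assumed value of one record to the collector; larger k lowers it."""
    return 8.0 / (1.0 + 0.1 * k)


def privacy_cost(k, rng):
    """Hidden provider cost; the learner only observes accept/reject."""
    return rng.uniform(1.0, 6.0) * 2.0 / (1.0 + 0.1 * k)


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def heuristic_row(q_row, tries, accepts, value, eta=0.5):
    """HAQL-style heuristic: boost the price with the best smoothed empirical margin."""
    est = [(acc + 1) / (n + 2) * (value - p) for p, n, acc in zip(PRICES, tries, accepts)]
    hint = argmax(est)
    h = [0.0] * len(q_row)
    h[hint] = max(q_row) - q_row[hint] + eta
    return h


def train(rounds=20000, method="q", xi0=0.0, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning (method="q") or SARSA (method="sarsa") pricing sketch.

    xi0 > 0 adds a heuristic bonus to action selection whose weight decays
    linearly to zero over the first quarter of training.
    """
    rng = random.Random(seed)
    n_s, n_a = len(K_LEVELS), len(PRICES)
    q = [[0.0] * n_a for _ in range(n_s)]
    tries = [[0] * n_a for _ in range(n_s)]
    accepts = [[0] * n_a for _ in range(n_s)]
    total = 0.0

    def act(s, t):
        # Epsilon-greedy over Q + xi * H; with xi = 0 this is plain Q-based selection.
        if rng.random() < eps:
            return rng.randrange(n_a)
        xi = xi0 * max(0.0, 1.0 - t / (rounds / 4))
        h = heuristic_row(q[s], tries[s], accepts[s], data_value(K_LEVELS[s]))
        return argmax([qv + xi * hv for qv, hv in zip(q[s], h)])

    s = rng.randrange(n_s)
    a = act(s, 0)
    for t in range(rounds):
        k = K_LEVELS[s]
        sold = PRICES[a] >= privacy_cost(k, rng)
        r = data_value(k) - PRICES[a] if sold else 0.0
        tries[s][a] += 1
        accepts[s][a] += int(sold)
        total += r
        s2 = rng.randrange(n_s)  # next provider's anonymity level (i.i.d. here)
        a2 = act(s2, t + 1)
        # SARSA bootstraps on the action actually taken next; Q-learning on the best one.
        nxt = q[s2][a2] if method == "sarsa" else max(q[s2])
        q[s][a] += alpha * (r + gamma * nxt - q[s][a])
        s, a = s2, a2
    return q, total


if __name__ == "__main__":
    for name, method, xi0 in [("Q-learning", "q", 0.0), ("SARSA", "sarsa", 0.0),
                              ("heuristic Q", "q", 1.0)]:
        q, total = train(method=method, xi0=xi0)
        greedy = [PRICES[argmax(row)] for row in q]
        print(f"{name:12s} total reward={total:9.1f} greedy price per k={greedy}")
```

Running the module prints the cumulative reward and the greedy price per anonymity level for each variant; the numbers depend only on the assumed environment and seed and say nothing about the results reported in this paper. Because states are drawn independently in this toy setting, setting `gamma=0.0` reduces the update to a contextual-bandit estimate of the expected immediate reward per (state, price) pair.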