| Federated learning trains a global model solely by exchanging model parameters or intermediate results, so user data never leaves the local device. It protects user data privacy while still unlocking the value of the data, and has become the preferred machine learning technique in distributed scenarios. However, federated learning is itself sensitive to data, and low-quality data degrades model performance. With the development of digital technology, data volumes keep growing while data quality is uneven, which makes data quality assessment important in federated learning. At the same time, the efficiency of dataset selection has become a common bottleneck for many machine learning models and AI applications. Existing data quality evaluation and selection methods suffer from insufficient evaluation dimensions, unreliable evaluation results, and low selection efficiency. Credible data quality assessment methods and efficient data selection methods have therefore become a research focus. The main results of this paper are as follows: (1) A task-related data quality assessment method is proposed. Existing data quality assessment methods lack a task-relevance dimension, so high-scoring data may not meet the task requirements, which leads to slow convergence and poor training results. To address task relevance, this paper proposes a self-applicable privacy-preserving intersection algorithm (SAP), which solves the poor applicability of any single privacy-preserving algorithm caused by the unbalanced allocation of user resources in federated learning scenarios. Based on the SAP algorithm, it further proposes a method for evaluating the statistical homogeneity and content diversity of task relevance, which addresses the fact that existing data quality evaluation methods ignore the task
relevance dimension. Experimental results show that, compared with existing methods, the proposed data quality evaluation method (statistical homogeneity and content diversity) yields more reasonable evaluation results, and models trained on high-scoring datasets converge faster and perform better. (2) A blockchain-based data quality assessment scheme with verifiable results is proposed. Existing data quality evaluation schemes rely on a third-party platform or let the server lead the evaluation process, which makes the evaluation process opaque and the evaluation results unverifiable. To address this, this paper proposes a blockchain-based data quality evaluation scheme with verifiable results. The scheme builds on the decentralized, tamper-proof, and traceable characteristics of the blockchain, so that no third party is involved in the data quality evaluation process and the computation is safe and reliable. This paper proposes algorithms such as low-quality data identification and trusted verification, and a self-applicable privacy-preserving intersection verification algorithm, which provide secure storage of and access to data during the data quality evaluation process, as well as credible and verifiable evaluation results. Experimental results show that the scheme meets practical requirements in terms of efficiency and cost. (3) An efficient data selection scheme based on task correlation is proposed. Existing schemes for selecting federated learning training data compute influence functions, statistical homogeneity, and other evaluation dimensions, which causes redundant computation and low selection efficiency. To address this, this paper proposes an efficient data selection scheme based on task similarity. Using privacy-preserving intersection, the scheme calculates the similarity between the new model and the historical models in the evaluation information base, and selects the data
that meets the task requirements, reducing the redundant computation caused by repeated evaluation. This paper proposes a data selection algorithm based on privacy-preserving intersection, a data selection algorithm based on extended privacy-preserving intersection, and an evaluation selection algorithm based on data quality, which together improve the efficiency of data selection. Experimental results show that the proposed scheme is more efficient than existing data selection schemes while still ensuring reasonable data selection. |