Research And Application Of CTR Prediction Algorithm Based On Federated Learning

Posted on: 2022-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gong
GTID: 2518306560992239
Subject: Software engineering
Abstract/Summary:
As one of the core technologies of search engines, recommendation systems, and Internet advertising, click-through rate (CTR) prediction has long been a popular research topic in both academia and industry. Traditional CTR prediction models require data to be collected in one place for training, but from a privacy-protection perspective it is better not to collect users' data at all. Federated learning enables machine learning on participants' data without that data leaving the participants' local environment. Applying federated learning to CTR prediction therefore helps protect privacy in search engines, recommender systems, and the Internet advertising industry. The research goal of this thesis is the application of federated learning to CTR prediction. Three main problems currently exist: the lack of methods for generating non-independent and identically distributed (non-IID) datasets in binary classification scenarios, the limited efficiency and effectiveness of existing models, and the lack of applications in the field of CTR prediction. To address these problems, this thesis carries out the following work.

(1) A method for generating non-IID datasets is proposed. The label ratio of the data is proposed as a measure of how far a dataset departs from the IID assumption in binary classification, and the dataset is divided into client subsets whose overall degree of non-IID is controllable through this index. A heuristic algorithm is designed for the NP-hard problem that arises during client subset generation, and it generates the client subsets in a reasonable time.

(2) An improved algorithm, Refined Federated Averaging (RFedAvg), is proposed. First, to address the unfairness and reduced model effectiveness caused by some clients rarely participating in training because of network communication constraints, this thesis simulates two cases, extreme constraints and ordinary constraints, and proposes two correction strategies at the client selection stage: general correction and strict correction. The experimental results show that the correction strategies are effective. Second, in model aggregation, four inter-distribution distance measures are introduced into weighted federated aggregation: the distance between each client's data distribution and the overall data distribution is converted into that client's model weight during training. Finally, the two improvements are combined into the RFedAvg algorithm, which is compared with currently popular models. The experiments show that the proposed model converges faster and performs better.

(3) A prototype federated recommendation system is implemented based on the RFedAvg algorithm. Two key technical issues were solved during implementation: interoperation between Python and Java, and remote calls from the central server to the clients. The system uses the RFedAvg algorithm as the recall module and ranks the recalled results by music popularity, which protects user data while providing the recommendation function.
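The distance-weighted aggregation described in (2) can be illustrated with a minimal Python sketch. The abstract does not name the four distance measures or the rule that converts a distance into a weight, so everything specific here is an assumption: the distance is the absolute difference between a client's positive-label ratio and the global ratio (the total variation distance between two Bernoulli label distributions), and the weight is the client's sample count scaled by `exp(-distance / temperature)`. The names `distance_weights`, `aggregate`, and `temperature` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def distance_weights(pos_counts, sizes, temperature=1.0):
    """Turn label-distribution distances into aggregation weights.

    Assumption (not from the thesis): distance is |client ratio - global ratio|
    and weight is size * exp(-distance / temperature), normalized to sum to 1.
    """
    pos_counts = np.asarray(pos_counts, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    # Global positive-label ratio, computed from per-client counts only,
    # so no raw samples need to leave the clients.
    global_ratio = pos_counts.sum() / sizes.sum()
    client_ratios = pos_counts / sizes
    distances = np.abs(client_ratios - global_ratio)
    # Clients whose label distribution is closer to the global one get more weight.
    raw = sizes * np.exp(-distances / temperature)
    return raw / raw.sum()


def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (FedAvg-style aggregation)."""
    return np.average(np.stack(client_params), axis=0, weights=weights)


# Hypothetical example: three equally sized clients, the third has no clicks.
weights = distance_weights(pos_counts=[5, 5, 0], sizes=[10, 10, 10])
global_params = aggregate(
    [np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.array([4.0, 8.0])],
    weights,
)
```

With equal client sizes, the third client is furthest from the global label ratio and so contributes least to the averaged parameters. Setting a very large `temperature` makes the distance term negligible, which recovers plain sample-count weighting as in standard FedAvg.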
Keywords/Search Tags: Click-through rate prediction, Federated learning, Non-independent and identically distributed (non-IID) data, Client selection, Weighted federated aggregation