
Online Logistic-SCAD Regression And Privacy Protection

Posted on: 2023-04-18  Degree: Master  Type: Thesis
Country: China  Candidate: C C Han
GTID: 2568306845454294  Subject: Statistics
Abstract/Summary:
With the rapid development of the Internet and information technology, the digital economy, built on data as a key factor of production, is booming. It relies not only on the collection, storage, and analysis of massive data, but also on fast updates and real-time services. Although traditional machine learning algorithms are widely used in modern data analysis, their strong performance depends on complete data sets and an offline environment. When data are updated rapidly, repeatedly retraining the model costs time and memory, so these algorithms cannot meet the needs of practical applications in the digital economy. In recent years, research on online learning has produced algorithms that perform well on real-time data. Online learning makes up for the inability of offline learning to process streaming data and has been widely used in search recommendation and other fields. At the same time, when companies collect and use sensitive data such as medical information and personal preferences, users' privacy is at risk of disclosure. Motivated by these two points, this thesis studies online logistic regression and its privacy protection. Specifically, it consists of the following two parts:

(1) To learn quickly and efficiently from large volumes of streaming data, we study online Logistic-SCAD regression and propose an online logistic regression algorithm with the SCAD penalty. Combining online gradient descent with the truncated gradient method makes the algorithm both effective and sparse. We further derive a regret bound for the algorithm, which establishes its effectiveness theoretically. Experiments show that the proposed algorithm achieves better classification performance than two other online sparse algorithms.

(2) To reduce the risk of privacy disclosure during online learning, we study online Logistic-SCAD regression based on differential privacy and propose the DP-OGDT algorithm, which protects data privacy by adding Gaussian noise to the gradient. Using theoretical properties of online learning, we derive an expected regret bound for the algorithm and show that it has good theoretical properties. Finally, experiments demonstrate the utility of DP-OGDT under different privacy budgets.
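The abstract refers to the SCAD penalty and to regret bounds without stating them. For reference, the standard forms are given below, assuming labels y_t in {-1, +1}, a penalty applied coordinate-wise to a d-dimensional weight vector, and the usual choice a > 2 (Fan and Li suggest a = 3.7). Whether the thesis includes the penalty inside the regret, and its exact notation, are assumptions here; for DP-OGDT the corresponding quantity would be the expectation E[R_T] over the injected Gaussian noise.

```latex
% Logistic loss on the example (x_t, y_t) received at round t
\ell_t(w) = \log\bigl(1 + \exp(-y_t\, w^\top x_t)\bigr)

% SCAD penalty (Fan and Li, 2001) and its derivative for \theta \ge 0
p_\lambda(|\theta|) =
\begin{cases}
  \lambda |\theta|, & |\theta| \le \lambda,\\[2pt]
  \dfrac{2a\lambda|\theta| - \theta^2 - \lambda^2}{2(a-1)}, & \lambda < |\theta| \le a\lambda,\\[2pt]
  \dfrac{(a+1)\lambda^2}{2}, & |\theta| > a\lambda,
\end{cases}
\qquad
p'_\lambda(\theta) = \lambda\Bigl\{\mathbb{1}(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\,\mathbb{1}(\theta > \lambda)\Bigr\}

% Regret of the online iterates w_1, \dots, w_T against the best fixed w in hindsight
R_T = \sum_{t=1}^{T}\Bigl[\ell_t(w_t) + \sum_{j=1}^{d} p_\lambda(|w_{t,j}|)\Bigr]
    - \min_{w}\sum_{t=1}^{T}\Bigl[\ell_t(w) + \sum_{j=1}^{d} p_\lambda(|w_j|)\Bigr]
```

The three pieces of the penalty meet continuously at |θ| = λ (value λ²) and at |θ| = aλ (value (a+1)λ²/2), and the derivative is zero beyond aλ, which is why SCAD leaves large coefficients essentially unpenalized.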
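To make part (1) concrete, here is a minimal sketch of how an online gradient step and a truncation step can be combined with the SCAD penalty. It is not the thesis's algorithm: the truncation rule (shrink each coordinate by η_t·p'_λ(|v_j|) toward zero without crossing it, applied every round, in the spirit of the truncated gradient method of Langford, Li and Zhang), the step size η_t = η0/√t, the class name `OnlineLogisticSCAD`, the helper `demo`, and all hyperparameters are assumptions chosen for illustration, and the data stream is synthetic.

```python
import math
import random


def scad_derivative(z, lam, a=3.7):
    """SCAD penalty derivative p'_lambda(z) for z >= 0 (Fan and Li, 2001); requires a > 2."""
    if z <= lam:
        return lam
    return max(a * lam - z, 0.0) / (a - 1.0)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticSCAD:
    """Hypothetical sketch: an online gradient step on the logistic loss, then a
    SCAD-driven truncation that shrinks each weight toward zero without crossing it."""

    def __init__(self, dim, eta0=0.5, lam=0.2, a=3.7):
        self.w = [0.0] * dim
        self.eta0, self.lam, self.a = eta0, lam, a
        self.t = 0

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Process one example with label y in {-1, +1}; return the loss suffered before updating."""
        self.t += 1
        eta = self.eta0 / math.sqrt(self.t)                    # assumed step size eta0 / sqrt(t)
        m = y * self.score(x)
        loss = math.log1p(math.exp(-abs(m))) + max(-m, 0.0)    # stable log(1 + exp(-m))
        coef = -y * sigmoid(-m)                                 # d loss / d score
        new_w = []
        for wi, xi in zip(self.w, x):
            v = wi - eta * coef * xi                            # plain online gradient step
            shrink = eta * scad_derivative(abs(v), self.lam, self.a)
            new_w.append(math.copysign(max(abs(v) - shrink, 0.0), v))  # truncate at zero
        self.w = new_w
        return loss


def make_stream(n, w_true, rng):
    """Synthetic stream: Gaussian features, labels drawn from a logistic model."""
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        p = sigmoid(sum(wi * xi for wi, xi in zip(w_true, x)))
        yield x, (1 if rng.random() < p else -1)


def demo(seed=0, n_train=5000, n_test=2000):
    rng = random.Random(seed)
    w_true = [2.0, -2.0] + [0.0] * 8                           # two informative features
    model = OnlineLogisticSCAD(dim=len(w_true))
    cumulative_loss = sum(model.update(x, y) for x, y in make_stream(n_train, w_true, rng))
    correct = sum((1 if model.score(x) >= 0 else -1) == y
                  for x, y in make_stream(n_test, w_true, rng))
    return model.w, correct / n_test, cumulative_loss


if __name__ == "__main__":
    w, acc, cum_loss = demo()
    print("weights:", [round(wj, 3) for wj in w])
    print("held-out accuracy:", round(acc, 3))
    print("average online loss:", round(cum_loss / 5000, 3))
```

The design choice worth noting: with an L1-style truncation every nonzero weight is shrunk by the same amount each round, whereas here the shrinkage comes from the SCAD derivative, so weights with magnitude above aλ are not shrunk at all and only small, likely irrelevant weights are pushed to zero. The cumulative loss returned by `demo` is the first sum in the regret R_T (without the penalty term).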
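Part (2) protects privacy by adding Gaussian noise to the gradient. A minimal sketch of that idea follows, under assumptions that are not taken from the thesis: the per-round gradient is first clipped to L2 norm at most C, so replacing one example changes it by at most 2C; the noise scale uses the classic Gaussian-mechanism bound σ = 2C·√(2 ln(1.25/δ))/ε, which holds only for 0 < ε < 1; and the names `DPOnlineLogisticSCAD`, `clip_gradient`, and `gaussian_sigma` are hypothetical. Because each example is used in exactly one round and the truncation step is post-processing, the sketch targets a per-example (ε, δ) guarantee; the thesis's own calibration and privacy accounting may differ.

```python
import math
import random


def scad_derivative(z, lam, a=3.7):
    """SCAD penalty derivative p'_lambda(z) for z >= 0 (Fan and Li, 2001); requires a > 2."""
    if z <= lam:
        return lam
    return max(a * lam - z, 0.0) / (a - 1.0)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clip_gradient(g, clip):
    """Rescale g so that its L2 norm is at most `clip` (bounds the per-round sensitivity)."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = min(1.0, clip / norm) if norm > 0.0 else 1.0
    return [scale * gi for gi in g]


def gaussian_sigma(clip, epsilon, delta):
    """Classic Gaussian-mechanism noise scale for L2 sensitivity 2 * clip.
    This calibration is only valid for 0 < epsilon < 1 and 0 < delta < 1."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return 2.0 * clip * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


class DPOnlineLogisticSCAD:
    """Hypothetical DP-OGDT-style sketch: clip, add Gaussian noise, step, truncate."""

    def __init__(self, dim, epsilon, delta=1e-5, clip=1.0, eta0=0.5, lam=0.2, a=3.7, seed=None):
        self.w = [0.0] * dim
        self.clip, self.eta0, self.lam, self.a = clip, eta0, lam, a
        self.sigma = gaussian_sigma(clip, epsilon, delta)
        self.rng = random.Random(seed)
        self.t = 0

    def update(self, x, y):
        """Process one example (label y in {-1, +1}); each example is used exactly once."""
        self.t += 1
        eta = self.eta0 / math.sqrt(self.t)
        m = y * sum(wi * xi for wi, xi in zip(self.w, x))
        g = clip_gradient([-y * sigmoid(-m) * xi for xi in x], self.clip)
        noisy = [gi + self.rng.gauss(0.0, self.sigma) for gi in g]    # Gaussian mechanism
        new_w = []
        for wi, gi in zip(self.w, noisy):
            v = wi - eta * gi
            shrink = eta * scad_derivative(abs(v), self.lam, self.a)
            new_w.append(math.copysign(max(abs(v) - shrink, 0.0), v))  # post-processing
        self.w = new_w


if __name__ == "__main__":
    model = DPOnlineLogisticSCAD(dim=3, epsilon=0.5, seed=1)
    model.update([1.0, -0.5, 2.0], 1)
    print("noise scale sigma:", round(model.sigma, 2))
    print("weights after one private round:", model.w)
```

Under this calibration, ε = 0.5 and δ = 1e-5 give a per-coordinate noise standard deviation of roughly 19 times the clipping bound C, which illustrates why the utility of a private online learner has to be examined across privacy budgets.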
Keywords: Online learning, Differential privacy, Logistic regression, SCAD, Sparsity