
Im-MOM: A Robust MOM Classification Algorithm via Median Interval for More Information

Posted on: 2022-11-06    Degree: Master    Type: Thesis
Country: China    Candidate: F Y Han    Full Text: PDF
GTID: 2480306776992319    Subject: Insurance
Abstract/Summary:
In the era of big data, supervised classification tasks are everywhere. When the data are collected accurately, classical methods such as logistic regression and support vector machines give good classification schemes. In practice, however, outliers frequently arise from recording errors, data contamination and other causes, and traditional methods tend to perform poorly on data containing outliers, so exploring robust classification methods is crucial in statistical learning. Focusing on robust classification and building on the general MOM (median-of-means) method, this paper proposes the Im-MOM method, which screens the sample data by determining an appropriate median interval. In robust classification, the MOM algorithm has been proved to be a robust learning procedure; it is noted that, to guarantee robustness, the number of blocks in MOM is usually set fairly large, yet experiments show that once the number of blocks exceeds a certain level the robustness barely changes while the accuracy drops. This motivates maximizing the prediction accuracy of the model while maintaining a given level of robustness. Based on these observations, the core idea of the proposed Im-MOM method is to use a median interval to select more of the sample data, so as to make fuller use of the effective data information.

For the proposed Im-MOM method, this paper first considers a natural measure of estimation accuracy in statistical learning theory, the excess risk, and proves an upper bound on the excess risk under a condition on the number of outliers. Second, the theoretical computational complexities of the Im-MOM and MOM methods are given. Finally, to ensure that the Im-MOM algorithm converges, the convergence of the Im-MOM gradient descent algorithm is also discussed.

In the numerical simulations, the paper first experiments on a toy data set, applying the Im-MOM method to empirical risk functions built from various loss functions. Compared with the general MOM method, Im-MOM is verified to resist the interference of outliers on model learning effectively, and its classification predictions are more robust and accurate; in terms of parameter estimation, the Im-MOM estimates have smaller variance and their empirical mean is closer to the true parameters. The Im-MOM method is also applied to the HTRU2 data set, where the stability and prediction accuracy of the algorithm are again significantly improved. Second, to test the effectiveness of Im-MOM relative to MOM under different outlier ratios, simulations with various outlier ratios are run for different loss functions, and both robustness and accuracy improve. Finally, to verify that the method generalizes across different sources of outliers, simulation experiments with several outlier-generation rules are conducted; the results show that Im-MOM effectively reduces the interference of outliers and significantly improves prediction accuracy.
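The abstract gives no pseudocode, but the block-selection idea can be illustrated concretely. Below is a minimal sketch in Python: a classical MOM gradient step keeps only the block whose empirical loss is the median, while an Im-MOM-style step keeps every block whose loss rank falls inside a small window around the median and averages their gradients, thereby using more of the sample. The function names, the logistic loss, and the interval rule (a rank window of width interval_width) are illustrative assumptions, not the thesis's exact specification.

import numpy as np

def logistic_loss_and_grad(theta, X, y):
    # Logistic loss and gradient on one block; labels y are in {-1, +1}.
    margins = y * (X @ theta)
    loss = np.mean(np.logaddexp(0.0, -margins))           # log(1 + exp(-m)), numerically stable
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))      # equals -y * sigmoid(-m)
    grad = X.T @ coef / len(y)
    return loss, grad

def block_selected_gradient(theta, X, y, n_blocks, interval_width=0, rng=None):
    # interval_width = 0 -> classical MOM: keep only the median-loss block.
    # interval_width = w -> Im-MOM-style rule (illustrative): keep every block whose
    #                       loss rank is within w of the median, average their gradients.
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(y))
    blocks = np.array_split(idx, n_blocks)
    losses, grads = [], []
    for b in blocks:
        l, g = logistic_loss_and_grad(theta, X[b], y[b])
        losses.append(l)
        grads.append(g)
    order = np.argsort(losses)
    med = n_blocks // 2
    keep = order[max(0, med - interval_width): med + interval_width + 1]
    return np.mean([grads[k] for k in keep], axis=0)

# Toy data with a small fraction of gross outliers in the features.
rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
theta_true = np.ones(d)
y = np.sign(X @ theta_true + 0.1 * rng.normal(size=n))
X[:10] += 50.0                                            # contaminate 2% of the rows

theta = np.zeros(d)
for _ in range(300):                                      # plain gradient descent on the kept blocks
    theta -= 0.5 * block_selected_gradient(theta, X, y, n_blocks=21, interval_width=2, rng=rng)
print("estimated direction:", np.round(theta / np.linalg.norm(theta), 3))

Setting interval_width=0 recovers the classical MOM gradient step, so the same routine can be used to compare the two selection rules under the same number of blocks.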
Keywords/Search Tags: Robust Statistics, Classification Learning, MOM Algorithm, Outlier Processing, Median Interval