Feature Screening and Selection Under Measurement Error

Posted on: 2021-03-11  Degree: Master  Type: Thesis
Country: China  Candidate: X Sun  Full Text: PDF
GTID: 2370330647452625  Subject: Mathematics
Abstract/Summary:
In the Internet-based digital age, the growing availability of data computing, collection, storage, and processing technologies, together with new hardware in modern infrastructure, has made more personalized services possible. It has also produced large-scale, ultra-high-dimensional, complex data sets with diverse structures and strong correlations. Detailed and comprehensive data also carry more redundant information. Because the dimension (number of attributes) of the data grows exponentially with the amount of data, many traditional models perform poorly when there are too many variables, and their parameters may not be estimable at all. When the data also contain measurement errors or outliers, inference and analysis become harder still, which poses major challenges for policy makers and researchers. Ultra-high-dimensional data are generally processed under a sparsity assumption. This thesis proposes two methods within the measurement error framework: MEDCS, a feature screening method for ultra-high-dimensional data with measurement error, and the MEM quantile kernel selection likelihood algorithm, a variable selection method for high-dimensional data. The former is a screening procedure built on a distribution-function-based correlation coefficient for ultra-high-dimensional data with additive measurement errors. Theoretical analysis, multiple simulation experiments, and a text analysis of Sina Weibo data verify the sure screening property and the finite-sample performance of MEDCS. The method not only addresses the difficulty of correcting for additive measurement errors in ultra-high-dimensional data, but also handles covariates whose features contain outliers or follow heavy-tailed distributions. The latter combines the measurement error framework with quantile nonparametric kernel regression. By adding "pseudo" Gaussian measurement errors to each variable, a nonparametric estimate of the unknown function under a given measurement error distribution is obtained. Optimizing the quantile objective function assigns the largest error to the covariate with the smallest correlation coefficient, which achieves both variable selection and parameter estimation. Monte Carlo simulation and real data from the PUMA 560 robotic arm are also used to verify the finite-sample performance of the MEM quantile kernel estimation selection likelihood method. Finally, the method is shown to have the oracle property required for variable selection.
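The abstract does not give the MEDCS statistic itself, so the following is only a minimal sketch of the general idea of distribution-function-based marginal screening, not the thesis's method. It replaces each observed covariate with its empirical distribution function value, which limits the influence of outliers and heavy tails, then ranks covariates by absolute correlation with the response. The function names (ecdf_transform, abs_correlation, screen_features), the simulated data, and the noise scale 0.3 are all assumptions made for illustration, and no correction for the measurement error is attempted here.

```python
import math
import random


def ecdf_transform(values):
    """Replace each value by its empirical CDF value, rank / n (assumes no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u


def abs_correlation(a, b):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    denom = math.sqrt(var_a * var_b)
    return abs(cov) / denom if denom > 0 else 0.0


def screen_features(columns, y, keep):
    """Score each covariate column and return the indices of the top `keep`."""
    scores = [abs_correlation(ecdf_transform(col), y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


# Hypothetical simulation: heavy-tailed covariates observed with additive error.
rng = random.Random(0)
n, p = 100, 300


def heavy_tailed():
    # Student t with 3 degrees of freedom: N(0,1) / sqrt(chi2_3 / 3).
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)


columns_true = [[heavy_tailed() for _ in range(n)] for _ in range(p)]
# Sparse model: only the first two covariates affect the response.
y = [2.0 * columns_true[0][i] - 1.5 * columns_true[1][i] + rng.gauss(0.0, 1.0)
     for i in range(n)]
# Observed covariates W = X + U, with U an additive Gaussian measurement error.
columns_obs = [[x + rng.gauss(0.0, 0.3) for x in col] for col in columns_true]

selected, scores = screen_features(columns_obs, y, keep=10)
print("Retained covariate indices:", selected)
```

Under the sparsity assumption mentioned in the abstract, a screening step of this kind is meant to cut the number of covariates down to a manageable size before a finer variable selection method is applied to the retained ones.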
Keywords/Search Tags: Feature Screening, Additive Measurement Error, Feature Selection, Measurement Error Likelihood, Nonparametric Kernel Regression