A Robust Prediction Method For High-dimensional Semi-supervised Data Based On Model Averaging

Posted on:2024-03-02

Degree:Master

Type:Thesis

Country:China

Candidate:F Yang

GTID:2530306908983329

Subject:Statistics

Abstract/Summary:

With the development of big data, the ways of collecting data have diversified, and the data structures used for modeling and analysis have become increasingly varied. When collecting the response variable is costly or difficult, only a small fraction of the samples in a data set are labeled and the majority are unlabeled. The sample available for supervised learning is then very small and most unlabeled samples go unused, which motivates the study of high-dimensional semi-supervised data and the extension of existing research results to this new data setting. This thesis considers this situation and improves supervised-learning-based models by exploiting information from the unlabeled samples.

Fitted models that describe the data-generating mechanism are obtained by machine learning and are widely used in economic analysis, biomedicine, text analysis, image analysis, and other fields. In applications, the main concern is the predictive power of the model. Many studies have therefore addressed model selection, that is, choosing the best model according to its predictive performance. However, the single model chosen by model selection is subject to uncertainty and risks producing undesirable results. To reduce the uncertainty introduced by model selection and to improve prediction performance, some scholars have proposed model averaging methods. As the dimensionality of data has grown, many screening methods and model averaging methods have been developed for high-dimensional data. Existing studies, however, analyze complete cases (i.e., labeled data), and less attention has been paid to settings with a large amount of unlabeled data. In particular, when unlabeled data are plentiful and labeled data are insufficient, the predictive performance of existing methods can deteriorate. How to use the information in unlabeled data to improve prediction with existing model averaging methods is therefore an issue worth studying.

The main work of this thesis is a prediction method based on sequential model averaging for robust prediction of high-dimensional semi-supervised data. The method has two steps. First, univariate model averaging is performed using the semi-supervised samples (both labeled and unlabeled): the weights of the candidate models are determined by the extended BIC criterion, and the regression coefficients of the candidate models are estimated using the semi-supervised data. Second, sequential model averaging is performed, with each step replacing the response variable with the residuals obtained from the previous regression step.

The thesis makes three contributions. First, most existing semi-supervised learning methods are designed for classification, and there are relatively few studies of semi-supervised regression. The proposed method uses the information in unlabeled samples for regression prediction, i.e., it applies when the response variable is continuous. Second, univariate model averaging makes it possible to determine each candidate model and its weight in a low-dimensional framework, which keeps the computation feasible for high-dimensional (even ultra-high-dimensional) regression. Third, by using a sequential screening procedure for univariate model averaging, the method can effectively adjust the weights and avoid overfitting in the model averaging stage.

Simulation experiments compare the prediction performance of the proposed method with commonly used model selection and model averaging methods for high-dimensional regression, including settings with outlier interference and model misspecification. In these experiments the proposed method shows more robust prediction performance.
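The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the estimator, so several details here are assumptions made only for illustration. In particular, the function names sequential_model_average and predict, the parameters steps and gamma, the use of pooled (labeled plus unlabeled) predictor means and variances in the univariate slopes, and the exponential form of the EBIC-based weights are all hypothetical choices.

```python
import numpy as np


def sequential_model_average(X_lab, y, X_unlab, steps=5, gamma=0.5):
    """Illustrative sequential univariate model averaging (assumptions only)."""
    X_all = np.vstack([X_lab, X_unlab])
    n, p = X_lab.shape
    # Assumption: unlabeled samples enter through pooled predictor moments.
    mu = X_all.mean(axis=0)
    var = X_all.var(axis=0)
    Z = X_lab - mu                       # labeled predictors, pooled-centered
    intercept = 0.0
    coef = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(steps):
        # Absorb the current residual mean into the intercept.
        shift = resid.mean()
        intercept += shift
        resid = resid - shift
        # One-predictor slopes: labeled cross-product over pooled variance.
        beta = Z.T @ resid / (n * var)
        rss = ((resid[:, None] - Z * beta) ** 2).sum(axis=0)
        # EBIC-type score for a one-predictor model (assumed form).
        ebic = n * np.log(rss / n) + np.log(n) + 2.0 * gamma * np.log(p)
        # Assumed weighting: proportional to exp(-EBIC / 2), normalized.
        w = np.exp(-0.5 * (ebic - ebic.min()))
        w /= w.sum()
        coef += w * beta
        # Sequential step: the residuals become the next response.
        resid = resid - Z @ (w * beta)
    return intercept, mu, coef


def predict(X_new, intercept, mu, coef):
    """Prediction from the accumulated averaged coefficients."""
    return intercept + (X_new - mu) @ coef
```

Because every candidate model contains a single predictor, each round works with p one-dimensional fits rather than one p-dimensional fit, which is the computational point the abstract makes for high-dimensional regression.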

Keywords/Search Tags:

High-dimensional, Semi-supervised learning, Model averaging, Robust forecasting

Related items

1	A High-dimensional VAR Model Averaging Method And Application Based On Factor Analysis
2	Research On 3D Left Atrium Segmentation Algorithm Based On Semi-supervised Learning
3	Research On Presentation Learning Methods Of Semi-supervised Network Based On Deep Learning
4	Research And Implementation Of Time Series Classification Based On Semi-supervised Learning
5	Research For Community Detection Algorithm Based On Semi-supervised Learning
6	The Classification Of Quantum Correlations Based On Semi-Supervised Machine Learning
7	Semi-supervised Learning Based Smart Soft Sensor Modeling
8	A Study On High-Resolution Remote Sensing Image Change Detection Method Based On Dual-perspective Change Context And Semi-supervised Learning
9	Research And Application Of Feature Selection Methods In Ultra-high-dimensional Classification Data
10	Semantic Segmentation Method Of High-resolution Remote Sensing Images Based On Self-supervised Learning