Regression Analysis Based On Missing Values For Compositional Data

Posted on: 2020-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
GTID: 2370330578473088
Subject: Applied Statistics
Abstract/Summary:
Complex multidimensional data are now used in many scientific fields, and compositional data are one such type. Compositional data have special properties: they describe the proportions that the parts of a whole take under a given factor, so they carry relative information between the parts rather than information about absolute magnitudes. Compositional data analysis was originally developed to study the proportions of chemical components in rocks; in recent years it has been applied widely to social, economic and technological problems. When information is collected through social surveys, some questions may go unanswered for various reasons. The resulting missing data affect the results of any subsequent analysis, so missing values must be handled before the data are analysed. This thesis studies regression-based imputation methods for missing values in compositional data, namely partial least squares (PLS) regression and sparse partial least squares (SPLS) regression.

(1) A PLS regression imputation method is proposed for missing values in compositional data that exhibit multicollinearity. The most typical problem in processing high-dimensional data is multicollinearity among the independent variables, and among statistical methods PLS handles this problem well: during modelling it extracts new synthetic variables with strong explanatory power for the dependent variable, and then refits the regression model on these synthetic variables. For compositional data with relatively few independent variables that are multicollinear, a PLS imputation method is therefore proposed.

(2) An SPLS regression imputation method is proposed for missing values in higher-dimensional compositional data. In a PLS model, every new component is built from all of the original variables. When the number of independent variables is large, this degrades the imputation results and also makes it harder to identify the important predictors. SPLS, an improvement on PLS, shrinks the estimated coefficients so that small coefficients become exactly zero and the corresponding variables drop out of the model. Building on the PLS method, this thesis therefore proposes an SPLS imputation method.

The results of this thesis extend the available methods for imputing missing values in compositional data and offer a new approach to the missing-data problems encountered in many fields.
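The abstract gives no algorithmic detail, so the following Python/NumPy block is only a minimal sketch under stated assumptions, not the thesis's method. It assumes (1) that imputation is carried out in pivot (isometric log-ratio) coordinates, since the abstract names no transformation; (2) that the part with missing values is recovered by regressing its pivot coordinate on the remaining coordinates with a single-response PLS fitted by NIPALS, iterating until the imputed values stabilise; and (3) that sparsity is obtained by soft-thresholding each PLS weight vector, a simplified stand-in for sparse PLS. The function names `pls1_fit`, `pivot_coords` and `impute_pls` and the parameter `lam` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp=2, lam=0.0):
    """Fit PLS1 (single response) by NIPALS; return (coefficients, intercept).

    lam in [0, 1) soft-thresholds each weight vector relative to its largest
    entry (hypothetical simplified sparse PLS); lam = 0 gives ordinary PLS.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                # centred predictors / response
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f                              # covariance direction with y
        if lam > 0:                              # small weights become exactly 0
            w = np.sign(w) * np.maximum(np.abs(w) - lam * np.abs(w).max(), 0.0)
        norm = np.linalg.norm(w)
        if norm < 1e-12:                         # nothing left to explain
            break
        w = w / norm
        t = E @ w                                # latent component (score)
        tt = t @ t
        if tt < 1e-12:
            break
        p = E.T @ t / tt                         # X loading
        c = (f @ t) / tt                         # y loading
        E = E - np.outer(t, p)                   # deflate X
        f = f - c * t                            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)       # B = W (P'W)^{-1} q
    return beta, y_mean - x_mean @ beta


def pivot_coords(X):
    """Pivot (ilr) coordinates; column 0 isolates part 0 against all others."""
    L = np.log(X)
    D = X.shape[1]
    Z = np.empty((X.shape[0], D - 1))
    for i in range(D - 1):
        Z[:, i] = np.sqrt((D - i - 1) / (D - i)) * (L[:, i] - L[:, i + 1:].mean(axis=1))
    return Z


def impute_pls(X, n_comp=2, lam=0.0, n_iter=20, tol=1e-8):
    """Iteratively impute NaN parts of a positive composition matrix (D >= 3)."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    D = X.shape[1]
    for j in range(D):                           # start from observed geometric means
        X[miss[:, j], j] = np.exp(np.log(X[~miss[:, j], j]).mean())
    for _ in range(n_iter):
        X_old = X.copy()
        for j in np.flatnonzero(miss.any(axis=0)):
            order = [j] + [k for k in range(D) if k != j]   # move part j first
            Z = pivot_coords(X[:, order])
            obs = ~miss[:, j]
            beta, b0 = pls1_fit(Z[obs, 1:], Z[obs, 0], n_comp, lam)
            z1 = Z[~obs, 1:] @ beta + b0                     # predicted coordinate
            # invert z1 = sqrt((D-1)/D) * ln(x_j / gmean(others)); others fixed
            others = np.delete(X[~obs], j, axis=1)
            gmean = np.exp(np.log(others).mean(axis=1))
            X[~obs, j] = gmean * np.exp(z1 * np.sqrt(D / (D - 1)))
        if np.max(np.abs(np.log(X) - np.log(X_old))) < tol:
            break
    return X / X.sum(axis=1, keepdims=True)                  # close rows to 1
```

With `lam=0` the sketch performs ordinary PLS imputation, for example `impute_pls(X, n_comp=2)` on a NumPy array in which `np.nan` marks missing parts; a value such as `lam=0.5` zeroes every weight smaller than half the largest one, so weakly related coordinates drop out of each component. Only the missing entries change during the iterations, and the final closure rescales each row to sum to 1, which preserves the ratios among the observed parts. The sketch requires at least three parts, positive observed values, and some observed rows for every part.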
Keywords/Search Tags: Compositional data, Missing data, Partial least squares regression, Sparse partial least squares regression, High-dimensional data, Regression analysis