Research On Feature Screening Of Ultra-high Dimensional Longitudinal Data

Posted on: 2021-01-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Di | Full Text: PDF
GTID: 2428330647452624 | Subject: Mathematics
Abstract/Summary:
In real-world production and daily life, ultra-high dimensional data are increasingly common, for example gene data in disease research and data in economics and finance. Although such data can be obtained through many channels, only a small part of the massive data is useful, which makes research difficult. Longitudinal data in the ultra-high dimensional setting are even harder to study with general statistical methods because of their intra-group correlation, so studying feature selection for ultra-high dimensional longitudinal data is of great practical significance. Based on the sparsity assumption for high dimensional longitudinal data, this thesis studies feature selection for generalized linear models. The details are as follows.

Chapter 1 introduces the background and significance of the research. A review of domestic and international research methods leads to the main innovations and main contents of the thesis. Chapters 2 and 3, based on the score test from statistical inference and assuming that the true value of the parameters is 0, construct feature screening indexes from the rank regression coefficient and the C statistic respectively, denoted LRSIS and LCSIS. Chapter 4 further considers the intra-group correlation structure of longitudinal data: it adds the inverse of the covariance to LCSIS, uses the quadratic inference function to avoid estimating the working covariance matrix, and converts the estimating equation for the generalized linear model parameters into a feature screening index, denoted QIFLCSIS. The sure screening properties of these three screening methods are proved theoretically, showing that they select the truly important variables with probability 1. Monte Carlo simulation is used to compare the screening performance of the extended method MSIS and the three proposed methods on high dimensional longitudinal data. Chapter 5 presents a real-data analysis using the screening methods proposed in the thesis. Bootstrap sampling is used to draw samples for prediction. The results show that a prediction accuracy of 0.7 can be achieved by selecting 25 of 1080 variables, so the goal of dimension reduction is achieved. Chapter 6 summarizes the shortcomings of the methods discussed and the prospects for future work.

The innovations of this thesis are:
1. Combining the score test with index construction. This idea can be used in most parametric models and greatly enriches the methods available for screening ultra-high dimensional longitudinal data.
2. The indexes established here are essentially based on the ranks of the variables rather than on the variables themselves, so they are robust and not affected by outliers. In addition, because all three indexes are variants of U-statistics, they take a simpler form and are faster to compute.
3. The quadratic inference function (QIF) is introduced to handle the working covariance matrix without estimating nuisance parameters, which greatly improves the accuracy of the screening results.
Keywords/Search Tags:ultra-high dimensional longitudinal data, rank regression, score equation, C statistic, quadratic inference function
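
To make the idea of rank-based marginal screening concrete, here is a minimal sketch in Python. It is not the LRSIS, LCSIS, or QIFLCSIS index from the thesis, whose exact formulas are not given in the abstract. Instead it swaps in a plain Spearman rank correlation computed on observations pooled over subjects and time points, and it ignores intra-group correlation. The function names (`ranks`, `rank_screen`, `simulate`), the simulated model, and all sizes are assumptions chosen for illustration only.

```python
import numpy as np


def ranks(a):
    """Ranks 1..N along axis 0 (ties broken by position; fine for continuous data)."""
    return np.argsort(np.argsort(a, axis=0), axis=0) + 1


def rank_screen(X, y, d):
    """Keep the d covariates with the largest absolute Spearman correlation with y.

    X : (N, p) array, observations stacked over all subjects and time points.
    y : (N,) response vector.
    Returns (indices of the kept covariates, screening index for all p covariates).
    """
    rx = ranks(X).astype(float)
    ry = ranks(y).astype(float)
    rx -= rx.mean(axis=0)
    ry -= ry.mean()
    num = rx.T @ ry
    den = np.sqrt((rx ** 2).sum(axis=0) * (ry ** 2).sum())
    omega = np.abs(num / den)          # marginal screening index per covariate
    keep = np.argsort(-omega)[:d]      # largest index values first
    return keep, omega


def simulate(seed=0, n=100, m=5, p=1000):
    """Hypothetical longitudinal data: n subjects, m visits each, p covariates.

    Only covariates 0, 1, 2 are active. A subject-level random intercept induces
    intra-group correlation, and the noise is heavy-tailed (t with 3 df).
    """
    rng = np.random.default_rng(seed)
    N = n * m
    X = rng.standard_normal((N, p))
    b = np.repeat(rng.standard_normal(n), m)   # shared within each subject
    noise = rng.standard_t(df=3, size=N)
    y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + 2.0 * X[:, 2] + b + noise
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    keep, omega = rank_screen(X, y, d=25)
    print("kept covariates:", sorted(keep.tolist()))
```

Because the index depends on the data only through ranks, a few extreme responses change it very little, which is the robustness argument made in innovation 2. A method that accounts for intra-group correlation, as QIFLCSIS is described as doing, would need to weight observations within each subject rather than pool them.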