
Selection And Application Of High-Dimensional Complex Group Variables

Posted on: 2021-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhong
GTID: 2480306110964429
Subject: Statistics
Abstract/Summary:
High-dimensional data is a hallmark of the big-data era. Its defining characteristic is that the number of dimensions is much larger than the sample size. High-dimensional complex data arise throughout the sciences. In practical research and applications, the data are complicated, large in volume, and diverse in structure. Some variables have complex group structure, as in gene sequence data, satellite data, and financial data. Effective selection of the relevant group variables is therefore a prerequisite for accurate data analysis. This thesis studies methods for complex group variable selection, their theoretical properties, and their practical application in logistic and other models. The specific research contents and results are:

(1) The logistic model is widely used with complex group variables. The group MCP method is applied to the logistic model, and the oracle property of group MCP is proved under regularity conditions. In numerical simulations the method is compared with the group Lasso method. The results show that group MCP achieves higher screening accuracy in selecting complex group variables, which reflects its good properties for group variable selection.

(2) The selection of complex group variables usually takes the form of a penalty function. The basic principles and algorithms of different group variable selection methods are compared. The results show that the Composite MCP group penalty method is superior to the other three group penalty methods in terms of prediction and variable selection. The four group variable selection methods are applied to advertising data from a company selling network office software. The Composite MCP method is verified to perform best in the advertising conversion study, and the group structures and individual variables that affect advertising conversion are selected through comparison, providing a reasonable basis for choosing an effective advertising strategy.

(3) Traditional high-dimensional group variable selection methods may suffer from low accuracy and weak algorithmic stability when processing ultra-high-dimensional data. In that case, the ultra-high-dimensional data must first be reduced to ordinary high-dimensional data, after which group MCP and other methods are used for variable selection. Because ultra-high-dimensional data often have group structure, an ultra-high-dimensional screening method under the linear model is studied and extended to the additive model.
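
The abstract refers to the group MCP penalty without stating its form. As a rough illustration only, the sketch below evaluates one common formulation: the minimax concave penalty (MCP) applied to the Euclidean norm of each coefficient group, with the tuning parameter scaled by the square root of the group size. The function names (`mcp`, `group_mcp_penalty`), the default `gamma=3.0`, and the square-root scaling are assumptions made for this example and are not taken from the thesis, whose exact definition may differ.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at |t| (elementwise).

    Equals lam*|t| - t^2/(2*gamma) while |t| <= gamma*lam,
    and the constant gamma*lam^2/2 beyond that point.
    """
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def group_mcp_penalty(beta, groups, lam, gamma=3.0):
    """Sum of MCP applied to each group's L2 norm (assumed formulation).

    beta   : coefficient vector
    groups : group label for each coefficient, same length as beta
    lam    : base tuning parameter, scaled by sqrt(group size) per group
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    total = 0.0
    for g in np.unique(groups):
        b = beta[groups == g]
        lam_g = np.sqrt(b.size) * lam  # assumed group-size scaling
        total += float(mcp(np.linalg.norm(b), lam_g, gamma))
    return total

# Hypothetical usage: two groups of two coefficients each.
penalty = group_mcp_penalty([3.0, 4.0, 0.0, 0.0], [0, 0, 1, 1], lam=1.0)
```

Unlike the group Lasso penalty, which grows linearly in the group norm, this penalty flattens out once a group norm exceeds `gamma * lam_g`, so large groups are not shrunk further. That flattening is the usual motivation for concave penalties of this kind.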
Keywords/Search Tags:high-dimensional data, complex group variable, logistic model, MCP method, ultra-high-dimensional screening