Model Selection For High-Dimensional Multinomial Logistic Regression Models

Posted on: 2018-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhu
Full Text: PDF
GTID: 2370330590977834
Subject: Statistics
Abstract/Summary:
High-dimensional data analysis has become an active topic in many fields, such as genomics, economics, and health sciences. For instance, with the advances in genomics, microarray technology makes the expression profiles of tens of thousands of genes (features) available simultaneously, while only tens or hundreds of samples are studied because of limits on cost and time. This gives rise to the small-n-large-p situation. Multinomial logistic regression, also known as multiclass logistic regression, is a multiclass classification method that builds a model to predict the probabilities of multiple outcomes. A common theoretical assumption, which also agrees with practice, is that only a small portion of the features contributes significantly to prediction. Under this sparsity assumption, shrinkage can not only improve statistical accuracy but also enhance model interpretability and greatly reduce computational complexity. Group lasso is an extension of lasso. The sparse group lasso (SGL) penalty combines the lasso and group lasso penalties, yielding solutions that are sparse both within and among groups. Nevertheless, lasso methods need theoretical support for selecting the "best" subset of features. Hence, this thesis proposes combining these penalized methods with the Extended Bayesian Information Criterion (EBIC) for model selection in multinomial logistic regression models. The model selection consistency of the proposed approach is established. Its performance is evaluated by numerical simulations and demonstrated on an Amazon review author classification data set.
Keywords/Search Tags: high-dimensional data, small-n-large-p, multiclass classification, multinomial logistic regression, model selection
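
The ingredients named in the abstract (a sparse group lasso penalty, the multinomial log-likelihood, and EBIC) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes that each feature's coefficients across the K classes form one group, that the group term is weighted by the square root of the group size, and that EBIC takes the commonly used form -2 log L + df log n + 2γ log C(p, s). The function names and the parameters `lam`, `alpha`, and `gamma` are hypothetical choices made for illustration.

```python
import math
import numpy as np


def sgl_penalty(B, lam, alpha):
    """Sparse group lasso penalty for a (p x K) coefficient matrix B.

    Assumption: each row of B (one feature across all K classes) is a group.
    alpha = 1 gives the plain lasso, alpha = 0 the plain group lasso.
    """
    B = np.asarray(B, dtype=float)
    l1_term = np.abs(B).sum()
    group_term = math.sqrt(B.shape[1]) * np.linalg.norm(B, axis=1).sum()
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


def multinomial_loglik(X, y, B):
    """Log-likelihood of a softmax model with class scores X @ B.

    y holds integer class labels in 0..K-1.
    """
    scores = np.asarray(X, dtype=float) @ np.asarray(B, dtype=float)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(log_probs[np.arange(len(y)), np.asarray(y)].sum())


def log_binom(p, k):
    """Natural log of the binomial coefficient C(p, k)."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(loglik, n, p, n_selected, df, gamma=1.0):
    """EBIC under the assumed form: -2 logL + df*log(n) + 2*gamma*log C(p, s).

    n_selected is the number of selected features s; df is the number of
    free parameters in the fitted model. gamma = 0 recovers ordinary BIC.
    """
    return -2.0 * loglik + df * math.log(n) + 2.0 * gamma * log_binom(p, n_selected)
```

In use, one would fit the penalized model over a grid of `lam` values, compute `ebic` for the support selected at each value, and keep the model with the smallest score. The extra `log_binom` term is what penalizes the very large number of candidate models when p is much larger than n.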