Three Group Lasso Regularized Regression Models And Its Application To High-dimensional Data Analysis

Posted on: 2021-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: M M Chang
GTID: 2480306197494254
Subject: Operational Research and Cybernetics
Abstract/Summary:
With the development of biomedicine, experiments have produced large amounts of microarray data. The "small sample, ultra-high dimension" character of these data poses great challenges to traditional statistical learning methods. In this thesis, we combine clustering from machine learning with sparse regression and develop three biologically interpretable group lasso regularized regression models. Applied to microarray data, all three achieve good classification and gene selection performance. The main innovations of this thesis are as follows.

(1) Group lasso penalty methods for binary-cancer microarray data analysis face several challenges, e.g., robustness, the need to divide genes into groups in advance, and adaptive variable selection within each group. To address them, we propose the robust adaptive sparse group lasso (RARSGL) and develop a fast solving algorithm. First, the real data matrix is decomposed by robust principal component analysis into a clean component and a noise component. Then, weighted gene co-expression network analysis is applied to the tumor data containing only clean information to group genes in advance, which yields more reliable gene groups. In addition, two weight matrices are constructed, one from a gene significance criterion based on conditional mutual information and one from the noise information, and both are introduced into the penalty term to select genes adaptively. Including the noise information further enhances the robustness of the model. Finally, results on colon cancer and prostate cancer gene expression data verify that RARSGL has good classification performance and population gene selection performance.

(2) Group lasso penalty methods for multi-cancer microarray data analysis face further challenges, e.g., dividing genes into groups in advance and biological interpretability. To address them, we propose the robust adaptive multinomial regression with group lasso penalty (RAMRSGL). Using an overlapping clustering strategy, affinity propagation (AP) clustering is applied to each cancer subtype to explore its group structure, and the groups of all subtypes are then merged. In addition, data-driven weights based on noise are added to the multinomial sparse group lasso penalty, which is combined with the multinomial log-likelihood function to perform multi-classification and adaptive group gene selection simultaneously. Experimental results on acute leukemia data verify the effectiveness of the proposed method.

(3) To select the key genes of each cancer subtype while performing multi-classification, we propose a new grouping strategy and a new classification and gene selection strategy. A similarity score is introduced to compute the measurement matrix of AP clustering, which makes the resulting gene groups more biologically significant. After grouping the genes, we propose the AP clustering-based sparse group lasso (AP-SGL) to construct binary classifiers that automatically select classification-related genes within groups under a 'one-versus-rest' strategy. We then adopt a voting strategy to integrate the binary classifiers for multi-classification. The proposed method achieves improved accuracies on the minority classes SQ and COID and selects three possible key pathogenic genes.
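As a rough illustration of the kind of penalty the three models share, the following sketch implements a generic weighted sparse group lasso penalty and its proximal step in NumPy. It is not the thesis's exact formulation: the function names, the `group_w` and `coef_w` weight arguments, the mixing parameter `alpha`, and the square-root group-size scaling are assumptions chosen for illustration, and how the weights are derived (from conditional mutual information or from noise) is not shown.

```python
import numpy as np

def sgl_penalty(beta, groups, lam, alpha, group_w=None, coef_w=None):
    """Weighted sparse group lasso penalty (illustrative form, not the thesis's).

    lam * (alpha * sum_j v_j |beta_j|
           + (1 - alpha) * sum_g w_g * sqrt(p_g) * ||beta_g||_2)
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    # Hypothetical adaptive weights; default to 1 (plain sparse group lasso).
    group_w = np.ones(len(labels)) if group_w is None else np.asarray(group_w, dtype=float)
    coef_w = np.ones_like(beta) if coef_w is None else np.asarray(coef_w, dtype=float)

    l1_term = np.sum(coef_w * np.abs(beta))
    group_term = 0.0
    for g, w in zip(labels, group_w):
        idx = groups == g
        group_term += w * np.sqrt(idx.sum()) * np.linalg.norm(beta[idx])
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)

def sgl_prox(z, groups, step, lam, alpha, group_w=None, coef_w=None):
    """One proximal step for the penalty above.

    Elementwise soft-thresholding followed by groupwise shrinkage,
    the usual two-stage update for sparse group lasso.
    """
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    group_w = np.ones(len(labels)) if group_w is None else np.asarray(group_w, dtype=float)
    coef_w = np.ones_like(z) if coef_w is None else np.asarray(coef_w, dtype=float)

    # Stage 1: soft-threshold each coefficient (within-group sparsity).
    u = np.sign(z) * np.maximum(np.abs(z) - step * lam * alpha * coef_w, 0.0)

    # Stage 2: shrink each group toward zero (group-level sparsity).
    out = np.zeros_like(u)
    for g, w in zip(labels, group_w):
        idx = groups == g
        norm = np.linalg.norm(u[idx])
        thresh = step * lam * (1.0 - alpha) * w * np.sqrt(idx.sum())
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * u[idx]
    return out
```

In a proximal gradient solver, `sgl_prox` would be applied to the result of a gradient step on the logistic or multinomial log-likelihood. Larger entries in `coef_w` or `group_w` push the corresponding gene or gene group toward zero more strongly, which is the general mechanism by which adaptive weights steer selection.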
Keywords/Search Tags: Sparse group lasso, regression, microarray classification, key gene selection