
Two Sparse Group Lasso Models and Their Application to Bioinformation Mining

Posted on: 2017-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Wang
Full Text: PDF
GTID: 2310330488964593
Subject: Operational Research and Cybernetics
Abstract/Summary:
Because sparse regression models can both predict and estimate unknown parameters in high-dimensional data, they have attracted widespread attention in statistics, machine learning, bioinformatics, and other fields. However, applying sparse regression to bioinformation mining for complex diseases and complex biological processes raises the problems of grouped gene selection and biological interpretability. By combining network analysis methods from systems biology with the group lasso method from statistical machine learning, this paper proposes two sparse group lasso models, develops corresponding fast solving algorithms, and applies them to microarray data classification and gene selection. The main contributions of this paper are as follows:

(1) For binary classification of microarray data, we propose an adaptive sparse group lasso model with weighted gene co-expression network analysis and develop a corresponding algorithm. The main innovation is a grouping strategy that corresponds to biological pathways, obtained by matching the gene groups in the group lasso to the modules of the weighted gene co-expression network. A second innovation is an adaptive gene selection strategy, obtained by assessing gene significance and constructing weights that carry biological meaning. Applying the model and its solving algorithm to gene expression data on hepatocyte proliferation, we screen the groups of genes related to hepatocyte proliferation. Compared with five other models, the proposed model obtains the highest classification accuracy and the most stable gene selection performance.

(2) For multi-class classification of microarray data, we propose a multinomial sparse overlapping group lasso model with weighted gene co-expression network analysis and develop its solving algorithm. The main innovation is an overlapping grouping strategy: weighted gene co-expression network analysis is used to divide the genes of high-dimensional multi-class microarray data into overlapping groups. Applying the model and its solving algorithm to the diagnosis of lung cancer and the associated grouped gene selection, we screen the groups of genes that are highly associated with lung cancer. Compared with three other models, the proposed model achieves better classification performance and the most stable gene selection performance.
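The abstract does not give the penalty or the solving algorithm explicitly, so the following is only a minimal sketch of how a sparse group lasso penalty with adaptive per-gene weights is commonly handled, not the thesis's own algorithm. It assumes non-overlapping groups (as in contribution (1)) and a penalty of the form lam1 * sum_j w_j |b_j| + lam2 * sum_g sqrt(p_g) ||b_g||_2. The function names, the sqrt(p_g) group scaling, and the parameters `lam1`, `lam2`, `step`, and `weights` are all illustrative assumptions. The proximal operator of this penalty can be computed by soft-thresholding each coefficient and then shrinking each group as a whole:

```python
import math


def soft_threshold(x, t):
    """Scalar soft-thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def prox_sparse_group_lasso(beta, groups, lam1, lam2, step=1.0, weights=None):
    """Proximal operator of
        step * (lam1 * sum_j w_j |b_j| + lam2 * sum_g sqrt(p_g) ||b_g||_2)
    for non-overlapping groups.

    beta    : list of coefficients (one per gene)
    groups  : list of index lists, e.g. one list per co-expression module
    weights : hypothetical adaptive per-gene weights w_j (default: all 1.0)
    """
    if weights is None:
        weights = [1.0] * len(beta)
    out = [0.0] * len(beta)
    for idx in groups:
        # Step 1: within-group sparsity via weighted soft-thresholding.
        z = [soft_threshold(beta[j], step * lam1 * weights[j]) for j in idx]
        # Step 2: group-level shrinkage of the thresholded block.
        norm = math.sqrt(sum(v * v for v in z))
        thresh = step * lam2 * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0.0 else 0.0
        for j, v in zip(idx, z):
            out[j] = scale * v
    return out


# Two groups of two genes each; the second group has only weak signals.
shrunk = prox_sparse_group_lasso(
    [3.0, 4.0, 0.5, -0.2], [[0, 1], [2, 3]], lam1=1.0, lam2=1.0
)
```

In this example the second group is removed entirely, while the first group is kept with both coefficients shrunk. Inside a proximal gradient loop for a logistic loss, this operator would be applied after each gradient step. The overlapping-group model in contribution (2) needs a different treatment (for example, duplicating the genes shared between groups), which this sketch does not cover.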
Keywords/Search Tags: Group lasso, weighted gene co-expression network, bioinformation mining, microarray data classification, gene selection