Two Adaptive Sparse Learning Machines and Their Application to High-Dimensional Data Mining

Posted on: 2018-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: W P Dong
GTID: 2348330515460478
Subject: Operational Research and Cybernetics
Abstract/Summary:
With the accumulation of modern high-dimensional data, traditional statistical learning methods, represented by the support vector machine, cannot select variables well. Developing new adaptive sparse learning machines offers a new approach to high-dimensional data mining. In this thesis, we combine methods from statistics, systems biology, and information theory to develop two adaptive sparse learning models with biological interpretability, together with their solving algorithms. We apply them to high-dimensional data analysis and obtain better classification and gene selection performance. The main innovations are as follows:

(1) Group lasso penalty methods face several challenges in binary classification of high-dimensional data, e.g., dividing variables into groups in advance, adaptive variable selection within each group, and biological interpretability. To address them, we study a variable grouping strategy based on network analysis and a new adaptive penalty mechanism, and propose an adaptive sparse group lasso that combines network analysis with information-theoretic methods. First, we link the modules identified by network analysis to the variable groups in the group lasso, using weighted gene co-expression network analysis to identify modules with good biological interaction. Second, we use information-theoretic measures such as conditional mutual information to construct a criterion for evaluating the significance of variables within each group, along with biologically meaningful weight coefficients, and add these weights at suitable positions in the penalty term so that variables can be selected adaptively.
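The abstract does not state the exact form of the penalty, so the following is only a minimal sketch under an assumed, commonly used form of the adaptive sparse group lasso: lam * ((1 - alpha) * sum over groups of w_g * ||beta_g||_2 + alpha * sum over variables of xi_j * |beta_j|). The function name, the parameter names, and the example weights are hypothetical. In the thesis, the within-group weights are derived from conditional mutual information and the groups from network modules; here both are simply passed in as plain numbers and labels.

```python
import math


def adaptive_sgl_penalty(beta, groups, group_weights, feature_weights,
                         lam=1.0, alpha=0.5):
    """Evaluate an assumed adaptive sparse group lasso penalty.

    beta            -- list of coefficients, one per variable
    groups          -- list of group labels, one per variable
    group_weights   -- dict mapping group label -> weight w_g
    feature_weights -- list of per-variable weights xi_j (hypothetical here)
    lam, alpha      -- overall strength and L1/group mixing parameter
    """
    # Accumulate the squared coefficients of each group.
    squared = {}
    for b, g in zip(beta, groups):
        squared[g] = squared.get(g, 0.0) + b * b

    # Weighted sum of group-wise Euclidean norms (group-level sparsity).
    group_term = sum(group_weights[g] * math.sqrt(s)
                     for g, s in squared.items())

    # Weighted L1 norm (within-group sparsity, adaptive through xi_j).
    l1_term = sum(xi * abs(b) for xi, b in zip(feature_weights, beta))

    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


# Toy example: five variables in two groups (labels 0 and 1).
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [0, 0, 1, 1, 1]
# Group weights set to sqrt(group size), a common but assumed choice.
group_weights = {0: math.sqrt(2), 1: math.sqrt(3)}
# Per-variable weights: a larger value penalizes that variable more heavily.
feature_weights = [1.0, 1.0, 2.0, 2.0, 0.5]

penalty = adaptive_sgl_penalty(beta, groups, group_weights, feature_weights)
```

Under this assumed form, a variable with a small weight xi_j is penalized less and is therefore more likely to stay in the model, which is one way a penalty can select variables adaptively within each group.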
Finally, the results on four high-dimensional cancer data sets demonstrate that the proposed adaptive sparse learning machine can effectively perform classification and grouped gene selection.

(2) Group-penalized multinomial regression faces similar challenges in multi-class classification of high-dimensional data, e.g., adaptive variable selection within each group and biological interpretability. To address them, we propose a sparse multinomial regression connected with network analysis. By combining biological resources and gene expression profiles, we use GeneRank to construct biologically meaningful weights, introduce them into the group lasso penalty, and thereby propose a new adaptive sparse learning machine. Finally, experimental results on yeast diauxic shift data demonstrate that the proposed model achieves better classification and gene selection performance than other models.
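To illustrate how GeneRank combines a gene network with expression data, here is a minimal sketch of the GeneRank fixed-point iteration r_j = (1 - d) * ex_j + d * sum_i w_ij * r_i / deg_i. The function name, the default damping value d, the normalization of the expression vector to sum to one, and the toy network are all assumptions made for illustration. The abstract does not describe how the thesis converts GeneRank scores into penalty weights, so that step is not shown.

```python
def generank(adj, expr, d=0.5, tol=1e-10, max_iter=1000):
    """Iterate the GeneRank update until the scores stop changing.

    adj  -- symmetric 0/1 adjacency matrix (list of lists) of the gene network
    expr -- per-gene expression evidence (absolute values are used),
            with at least one nonzero entry
    d    -- damping factor in [0, 1): 0 ranks by expression only,
            values near 1 rank mostly by network connectivity
    """
    n = len(expr)
    total = sum(abs(e) for e in expr)
    ex = [abs(e) / total for e in expr]   # assumed normalization: sums to 1
    deg = [sum(row) for row in adj]       # degree of each gene

    r = ex[:]                             # start from the expression vector
    for _ in range(max_iter):
        new_r = []
        for j in range(n):
            # Score flowing into gene j from its neighbours, each neighbour
            # splitting its own score evenly among its links.
            inflow = sum(adj[i][j] * r[i] / deg[i]
                         for i in range(n) if deg[i] > 0)
            new_r.append((1.0 - d) * ex[j] + d * inflow)
        change = max(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if change < tol:
            break
    return r


# Toy network: a path 0 - 1 - 2 with equal expression evidence for all genes.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores = generank(adj, [1.0, 1.0, 1.0])
```

In this toy example the middle gene receives the highest score because it is connected to both neighbours, even though all three genes have the same expression evidence.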
Keywords/Search Tags: High-dimensional data mining, group lasso, weighted gene co-expression network, conditional mutual information, GeneRank, adaptive sparse learning machine