
Research On Feature Gene Recognition Method Based On Sparse Matrix Decomposition

Posted on: 2016-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2208330464463601
Subject: Library science
Abstract/Summary:
With advances in information technology and the growing use of databases, the volume of available information is exploding, and it is hard to find useful information in such vast amounts of data. The rapid development of database technology and machine learning has made data mining a new stage in the history of data processing techniques. In recent years, numerous genomic studies have been conducted, and the amount of biological experimental data has grown explosively. Earlier data analysis methods fall far short of what these studies require. Sparse matrix decomposition, as a new generation of data mining technique, handles large-scale gene expression data well. Furthermore, it can identify the feature genes that carry critical information in gene expression data, thus giving the life sciences an effective means of better understanding life.

In this paper, the author comprehensively reviews the work of domestic and foreign scholars on sparse matrix factorization theory and characteristic gene recognition algorithms, and finds that some aspects remain under-researched. Therefore, building on the results of previous studies and an in-depth study of data mining algorithms, the author selects feature extraction as the main research direction, then extends the research on sparse matrix factorization and improves sparse matrix factorization algorithms. The author proposes two algorithms for identifying characteristic genes: the class-information-based penalized matrix decomposition algorithm and the p-norm robust feature extraction algorithm.

The class-information-based penalized matrix decomposition algorithm first obtains the total scatter matrix according to the differing numbers of samples in the gene expression data; the total scatter matrix is then decomposed and used to rebuild a new data matrix. Next, penalized matrix decomposition is applied to the new data matrix to obtain sparse eigensamples. Finally, characteristic genes are identified from the non-zero entries of the sparse eigensamples. The p-norm robust feature extraction algorithm uses the Schatten p-norm as a regularization function to obtain a low-rank matrix and uses the Lp norm as an error function to improve robustness against outliers. The algorithm can therefore identify characteristic genes effectively.

To verify the performance of the two proposed algorithms, experiments were carried out on simulated data sets and gene expression data sets, and the results were compared with those of existing methods. Experimental results demonstrate that the proposed algorithms are effective and feasible.

One of the innovations of this paper is that sample class information is introduced through the total scatter matrix and combined with penalized matrix decomposition (PMD) to put forward a new supervised feature extraction algorithm, the class-information-based penalized matrix decomposition algorithm (CIPMD), for identifying characteristic genes. CIPMD is successfully applied to gene expression data analysis. The second innovation is that, based on the Schatten p-norm and the Lp norm, the p-norm robust feature extraction algorithm (PRFE) is proposed to identify characteristic genes.
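The abstract does not give the formulations of CIPMD or PRFE, so the following Python sketch only illustrates the generic building blocks it names. It is not the author's implementation. The function names, the fixed threshold `lam`, and the genes-by-samples layout of `X` are all assumptions made for illustration. `sparse_rank1` is a simplified, Lagrangian-style rank-one penalized decomposition in which a soft threshold makes the gene-side factor sparse; the total scatter matrix step of CIPMD is not reproduced. `schatten_p_power` and `lp_power` compute the usual p-th powers of the Schatten p-norm and the entrywise Lp norm.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of a toward zero by lam (the L1 proximal operator)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_rank1(X, lam, n_iter=100, tol=1e-8):
    """Rank-one factor pair (u, v) of X with a sparse, unit-norm u.

    Assumption: X has shape (genes, samples), so the non-zero entries
    of u point at candidate characteristic genes. The fixed threshold
    lam is a simplification; it is not tuned to meet an L1 bound.
    """
    # Start from the leading right singular vector of X.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        # Update u: project, soft-threshold, renormalize.
        u_new = soft_threshold(X @ v, lam)
        norm_u = np.linalg.norm(u_new)
        if norm_u == 0.0:
            raise ValueError("lam is too large: every entry was thresholded to zero")
        u_new = u_new / norm_u
        # Update v without a sparsity penalty.
        v = X.T @ u_new
        v = v / np.linalg.norm(v)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u, v


def schatten_p_power(X, p):
    """Sum of the singular values of X, each raised to the power p (p > 0)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s ** p))


def lp_power(E, p):
    """Sum of the absolute entries of E, each raised to the power p (p > 0)."""
    return float(np.sum(np.abs(E) ** p))
```

For a genes-by-samples array `X`, `np.flatnonzero(sparse_rank1(X, lam)[0])` returns the row indices with non-zero loadings, mirroring the abstract's step of reading characteristic genes off the non-zero entries of a sparse eigensample. With `p = 1`, `schatten_p_power` reduces to the nuclear norm, a common convex surrogate for rank.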
Keywords/Search Tags:Sparse matrix factorization, Characteristic gene, Gene expression data, Class information, Penalized matrix decomposition, P-norm, Robust feature extraction, Low rank