Feature Extraction Method For Gene Expression Profiles Mining

Posted on: 2016-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: T L Yao
GTID: 2284330461992495
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of molecular biology techniques and DNA microarray technology, the expression levels of thousands of genes in a biological sample can be measured quantitatively, and the resulting gene expression data can reveal implicit, previously unknown biological knowledge. In recent years, researchers have applied statistical and pattern recognition techniques to microarray gene expression data in order to identify pathogenic tumor genes, which supports correct diagnosis and classification of tumor types. However, tumor gene expression data are high-dimensional and have small sample sizes. Before analysis, traditional processing methods therefore generally project the high-dimensional data into a low-dimensional subspace, which not only preserves the accuracy of classification and recognition but also improves the performance and computational efficiency of the learning method. Combining knowledge from bioinformatics and pattern recognition, this thesis extracts dominant feature subsets from high-dimensional, small-sample tumor data and analyzes the effectiveness of the corresponding experimental results. The main contributions are summarized as follows:

1. A feature selection algorithm based on the property of submodularity is proposed. First, to account for the correlation among genes in tumor gene expression data, the individual gene attributes are converted into an adjacency graph that carries structural information. Second, a submodular feature selection objective function is constructed on the resulting adjacency matrix, and a greedy algorithm is used to extract the feature subset. Finally, KNN and SVM classifiers are used to classify the testing samples on the selected feature subset, and the experimental results illustrate the effectiveness of the method.

2. To address the high dimensionality and small sample size of tumor gene expression data, a feature selection method based on locality preserving projections (LPP) is applied. The method first uses principal component analysis (PCA) to remove noise and reduce the dimension of the original data, preserving 99% of the principal components of the processed data to characterize the original data. It then uses LPP to reduce the dimension further while preserving local information. Finally, KNN and SVM classifiers are used to classify the tumor data. To demonstrate the effectiveness of the method, experiments are conducted on three real data sets and the experimental results are analyzed.
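The graph-based greedy selection in contribution 1 can be illustrated with a minimal sketch. The abstract does not state which similarity measure or which submodular objective the thesis uses, so everything below is an assumption made for illustration: genes are linked by absolute Pearson correlation, and the objective is the facility-location function f(S) = sum_i max_{j in S} W[i, j], a standard monotone submodular function. The names `gene_similarity`, `facility_location`, and `greedy_select` are hypothetical.

```python
import numpy as np

def gene_similarity(X):
    """Adjacency matrix over genes: |Pearson correlation| between columns of X.

    X has shape (n_samples, n_genes). This similarity is an assumption,
    not necessarily the graph construction used in the thesis.
    """
    W = np.abs(np.corrcoef(X, rowvar=False))
    return np.nan_to_num(W)  # constant genes yield NaN correlations; treat as 0

def facility_location(W, selected):
    """f(S) = sum_i max_{j in S} W[i, j] (assumed objective, monotone submodular)."""
    if len(selected) == 0:
        return 0.0
    return float(W[:, selected].max(axis=1).sum())

def greedy_select(W, k):
    """Pick k genes, each time adding the gene with the largest marginal gain."""
    n = W.shape[0]
    selected = []
    cover = np.zeros(n)  # cover[i] = max similarity of gene i to the selected set
    for _ in range(min(k, n)):
        # gains[j] = f(S + {j}) - f(S), computed for all candidates j at once
        gains = np.maximum(W, cover[:, None]).sum(axis=0) - cover.sum()
        if selected:
            gains[selected] = -np.inf  # never pick the same gene twice
        best = int(np.argmax(gains))
        selected.append(best)
        cover = np.maximum(cover, W[:, best])
    return selected
```

Under this assumed objective, each greedy step favors a gene that represents genes not yet well covered, so highly correlated (redundant) genes tend not to be selected together. The selected column indices would then be used to slice the expression matrix before training a KNN or SVM classifier.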
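The PCA, LPP, and KNN pipeline in contribution 2 can be sketched in the same spirit. This is an illustration under stated assumptions, not the thesis implementation: the 99% figure is read here as keeping enough principal components to explain 99% of the variance, the neighborhood graph uses k nearest neighbors with heat-kernel weights, and the kernel width defaults to the mean squared distance. The names `pca_fit`, `lpp_fit`, and `knn_predict` and all parameter defaults are hypothetical.

```python
import numpy as np

def pca_fit(X, var_keep=0.99):
    """Return the training mean and the PCA basis explaining var_keep of variance."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, var_keep)) + 1
    return mean, Vt[:k].T

def lpp_fit(Z, n_neighbors=5, n_components=2, t=None):
    """Locality preserving projection of PCA scores Z (n_samples, n_dims)."""
    n = Z.shape[0]
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # squared distances
    if t is None:
        t = d2.mean()  # heat-kernel width heuristic (assumption)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]  # skip self at column 0
    rows = np.repeat(np.arange(n), idx.shape[1])
    W = np.zeros((n, n))
    W[rows, idx.ravel()] = np.exp(-d2[rows, idx.ravel()] / t)
    W = np.maximum(W, W.T)  # symmetrize the kNN graph
    D = np.diag(W.sum(axis=1))
    A = Z.T @ (D - W) @ Z  # Z^T L Z with graph Laplacian L = D - W
    B = Z.T @ D @ Z
    B += 1e-8 * np.trace(B) / B.shape[0] * np.eye(B.shape[0])  # tiny ridge
    # Solve A a = lambda B a by whitening with the Cholesky factor of B
    Ci = np.linalg.inv(np.linalg.cholesky(B))
    M = Ci @ A @ Ci.T
    _, vecs = np.linalg.eigh((M + M.T) / 2)  # eigenvalues in ascending order
    return Ci.T @ vecs[:, :n_components]  # directions with smallest eigenvalues

def knn_predict(train, labels, test, k=3):
    """Majority vote among the k nearest training points (integer labels >= 0)."""
    d2 = ((test[:, None, :] - train[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(labels[row]).argmax() for row in nn])
```

A typical use would fit `pca_fit` and `lpp_fit` on the training samples only, then project test samples with `(X_test - mean) @ P @ A` before calling `knn_predict`. Running PCA first keeps the matrix `B` well conditioned, which matters when there are far fewer samples than genes.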
Keywords/Search Tags: Gene expression profiling, Submodularity, Feature extraction, Locality preserving projections