Comparisons Of Different Methods For Gene Data Analysis In Microarray

Posted on: 2009-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Dan
Full Text: PDF
GTID: 2120360245956452
Subject: Forest genomics and bioinformatics
Abstract/Summary:
This thesis compares methods for analyzing microarray gene expression data in three areas: eight statistical methods for identifying differential gene expression, five methods for correcting p-values, and the effects of nine similarity and distance measures on cluster analysis.

Analysis of simulated data showed that the eight methods for identifying differential gene expression performed best on uniformly distributed microarray data. They also performed well on normally distributed data, but poorly on data following the χ² distribution or the exponential distribution. Of the eight methods, SAM and the Wilcoxon rank sum test performed well in most cases. On real cDNA microarray data from Populus, SAM, Samroc and the regression modeling approach gave very similar results.

The Bonferroni method, the Holm method and the Benjamini & Hochberg False Discovery Rate method were too stringent for correcting p-values when identifying genes with statistically significant changes in expression. The permutation and bootstrap methods could reduce the false discovery rate among the identified genes.

The effects of eight distance measures on hierarchical clustering and PAM clustering of microarray data were analyzed. For hierarchical clustering, cubic distance and standardized Euclidean distance were more sensitive than Manhattan distance and Euclidean distance, and so gave more accurate clustering results. Among the non-metric measures, the cosine correlation coefficient performed best. For PAM clustering, the metric distance measures gave nearly identical results. With the exception of the uncentered Pearson correlation coefficient, the non-metric measures performed better than the metric distance measures.

Based on these conclusions, we recommend the following.
For detecting differential expression, SAM, Samroc, the Wilcoxon rank sum test and the regression modeling approach are preferable. For p-value correction, the permutation and bootstrap methods are preferable. For the distance measure in cluster analysis, the standardized Euclidean distance, the cosine correlation coefficient and the Pearson correlation coefficient are preferable.
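
To make the p-value correction comparison concrete, the following is a minimal sketch of the three closed-form corrections named in the abstract (Bonferroni, Holm, and Benjamini & Hochberg), written in plain Python. It is an illustration of the textbook definitions, not the thesis's own implementation; the function names and the example p-values are assumptions made for this sketch, and the permutation and bootstrap methods are not shown.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is
    multiplied by (m - k); a running maximum keeps the result monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini & Hochberg step-up: the p-value with 1-based rank r is
    multiplied by m / r; a running minimum from the largest p-value down
    keeps the result monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        value = min(1.0, pvals[i] * m / (rank + 1))
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only).
example_pvals = [0.01, 0.04, 0.03, 0.005]

if __name__ == "__main__":
    print("Bonferroni:", bonferroni(example_pvals))
    print("Holm:      ", holm(example_pvals))
    print("BH (FDR):  ", benjamini_hochberg(example_pvals))
```

In this sketch a gene would be called significant when its adjusted p-value falls below the chosen threshold. Bonferroni and Holm control the family-wise error rate, while Benjamini & Hochberg controls the false discovery rate, so for any given gene its adjusted value is never larger than the other two.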
Keywords/Search Tags: microarray, Populus, differential expression, corrected p-value, distance measures, hierarchical clustering, PAM clustering