
Identification Of Population-Level Differentially Expressed Genes In One-Phenotype Data

Posted on: 2022-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: J J Xie
Full Text: PDF
GTID: 2480306554977289
Subject: Bioinformatics
Abstract/Summary:
Background and motivation: For some tissues, such as the heart and brain, normal controls are difficult to obtain. In this study, a dataset containing only samples of a particular disease type, with no normal controls, is defined as one-phenotype data. Such data cannot be analyzed with the common methods for identifying differentially expressed genes (DEGs), such as significance analysis of microarrays (SAM), limma (Linear Models for Microarray Analysis) and edgeR. The RankComp algorithm, which was developed mainly to identify individual-level DEGs, can be applied to identify population-level DEGs in one-phenotype data but cannot determine the dysregulation directions of those DEGs. Here, we optimized the RankComp algorithm to improve the detection of population-level DEGs in one-phenotype data.

Materials and methods: First, we downloaded datasets of heart, brain, breast and colorectal tissues from public databases. For each tissue, normal controls from different datasets were integrated separately for the microarray and RNA-seq platforms. Based on the relative expression orders (REOs) of gene expression levels within the samples, the gene pairs that showed the same REO pattern on both platforms were defined as the across-platform normal background. RankComp was then used to identify individual-level DEGs in the one-phenotype data against this normal background. RankComp directly estimates the probability of a DEG in one-phenotype data as the mean DEG frequency across all samples. In this study, we optimized the RankComp algorithm and named the result PhenoComp. PhenoComp uses the mean and median of the frequencies of upregulated (downregulated) genes among all DEGs to estimate the probabilities of upregulated (downregulated) genes among all DEGs, and combines these estimates with the binomial test to identify upregulated (downregulated) genes at the population level. Next, we compared the detection performance of PhenoComp with that of RankComp in both simulated and real one-phenotype data. The DEGs detected by SAM, edgeR and limma from case-control samples served as the 'gold standard' for evaluating PhenoComp in real one-phenotype data. Finally, we evaluated the performance of PhenoComp on data with weak differential expression signals.

Results: The frequency of DEGs among all genes, and the frequencies of upregulated and downregulated genes among all DEGs, varied widely across datasets, which suggests that the corresponding probabilities should be estimated separately for each dataset. In simulated data, RankComp could not identify population-level DEGs when there were as few as five disease samples, whereas PhenoComp could. Moreover, when more disease samples were included, the DEGs identified by RankComp in each simulated experiment were all among the DEGs identified by PhenoComp in the same dataset. Next, the performance of PhenoComp was evaluated in real one-phenotype data. With the DEGs detected by SAM, edgeR and limma from case-control samples as the 'gold standard', the DEGs detected by PhenoComp using only one-phenotype data were comparable to the gold standard, independent of the measurement platform. To compare PhenoComp and RankComp in real one-phenotype data, we defined the dysregulation direction of the population-level DEGs identified by RankComp. The DEGs identified by RankComp with a definite dysregulation direction were all among those identified by PhenoComp. Finally, we evaluated PhenoComp on data with weak differential expression signals. SAM and limma could not identify population-level DEGs in such data, whereas PhenoComp identified a certain number of population-level DEGs, and these DEGs were enriched in pathways related to the analyzed diseases.

Conclusion: In this study, we optimized the RankComp algorithm to identify population-level DEGs in one-phenotype data and named the optimized algorithm PhenoComp. PhenoComp has better detection performance than RankComp, especially for data with a small sample size. In summary, PhenoComp is an efficient algorithm for analyzing specific types of data, including one-phenotype data and data with weak differential expression signals, independent of the measurement platform used.
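
The two building blocks named in the Materials and methods, the across-platform normal background and the population-level binomial test, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's PhenoComp implementation: the function names, the 0.99 stability threshold, the significance level `alpha`, and the way the null probability `p_up` is supplied are all hypothetical, and the abstract does not specify how the mean and median frequencies are combined into that probability. No multiple-testing correction is applied here.

```python
from itertools import combinations
from math import comb


def stable_reo_pairs(normal_samples, min_fraction=0.99):
    """Return ordered gene pairs (hi, lo) whose relative expression order
    hi > lo holds in at least `min_fraction` of the normal samples.

    `normal_samples` is a list of dicts mapping gene name -> expression.
    The 0.99 default is an assumed threshold, not taken from the abstract.
    """
    genes = sorted(normal_samples[0])
    n = len(normal_samples)
    stable = set()
    for a, b in combinations(genes, 2):
        higher = sum(1 for s in normal_samples if s[a] > s[b])
        lower = sum(1 for s in normal_samples if s[a] < s[b])
        if higher >= min_fraction * n:
            stable.add((a, b))
        elif lower >= min_fraction * n:
            stable.add((b, a))
    return stable


def cross_platform_background(array_normals, rnaseq_normals, min_fraction=0.99):
    """Keep only the pairs with the same stable REO on both platforms."""
    return (stable_reo_pairs(array_normals, min_fraction)
            & stable_reo_pairs(rnaseq_normals, min_fraction))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided binomial test p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def population_level_calls(up_counts, n_samples, p_up, alpha=0.05):
    """Call a gene upregulated at the population level when the number of
    disease samples in which it was called upregulated (`up_counts[gene]`)
    is unexpectedly high under the null probability `p_up`.

    `p_up` is passed in as a number; how it is estimated from the data is
    left open here (assumption).
    """
    return sorted(
        gene for gene, k in up_counts.items()
        if binomial_upper_tail(k, n_samples, p_up) < alpha
    )
```

For example, with five disease samples and an assumed null probability of 0.1, a gene called upregulated in all five samples has a one-sided p-value of 0.1^5 = 0.00001 and is reported, while a gene called in only one sample (p-value about 0.41) is not. The same test would be run separately for downregulated genes with its own null probability.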
Keywords/Search Tags:One phenotype data, Population-level, Differentially expressed genes, Weak differential expression signal data