
Microarray Data Perturbation Studies On The Effects Of False Discovery Rate Methods On Screening Differentially Expressed Genes

Posted on: 2015-02-16  Degree: Master  Type: Thesis
Country: China  Candidate: C Y Liu
GTID: 2354330518482667  Subject: Biomedical engineering
Abstract/Summary:
Cancer, also known as malignant tumor, has a strong impact on human health today and is one of the major diseases threatening human survival. Cancers are characterized by metastasis, infiltration and unlimited growth, and traditional treatment methods cannot cure them. It has been confirmed that the occurrence and development of a tumor is a complex, multi-stage process in which many genes take part. Human exploration of cancer has continued alongside advances in modern science and technology; in particular, since the beginning of the post-genome era, the gene chip (microarray) has brought new hope for cancer treatment. In this thesis, we complete two tasks by analyzing gene expression data.

First, the False Discovery Rate (FDR) is now widely used for multiple testing when selecting differentially expressed genes from gene chip data, and the adaptive linear step-up (ALSU) procedure is usually used to compute and control it. When applying the FDR, estimating the number of genes that are not differentially expressed (m0, the number of true null hypotheses) is a key step in selecting differentially expressed genes. To address the shortcomings of traditional m0 estimation methods, such as the ABH method, the S-λ method and the TST method, we propose several new estimation methods, named the Small Scale Iteration Method, the Small Scope Fitting Method and the Large Scope Fitting Method. We simulated gene chip data by computer and compared the traditional and new methods using the mean, standard deviation, range, quartiles, root mean square error and coefficient of variation as evaluation indicators. The results show that the new methods are more accurate and more stable than the traditional methods.

Second, we used the FDR method to analyze cancer gene expression data and screen cancer-related genes. In this process, we found that when the same method was applied to two gene expression data sets for the same cancer obtained in different laboratories, the repetition rate of the selected differentially expressed genes was low. We therefore investigated whether this phenomenon is caused by instability of the FDR method or by errors in the data. Using the prostate cancer data set GSE6919 from the GEO database as the research object, we generated simulated data with different levels of error by computer, selected differentially expressed genes with the FDR method, and calculated the repetition rate. We examined how the errors influence the repetition rate and obtained a quantitative relationship between the two. There are three results. First, when the error bound is below 30%, it has a linear relationship with the repetition rate. Second, at a significance level of 0.05, each 1% increase in the error bound reduces the overall repetition rate of differentially expressed genes by 2.5%, the overall PATHWAY repetition rate by 0.76% and the PATHWAY gene repetition rate by 0.23%. Third, if the error bound is below 10%, the overall repetition rate of differentially expressed genes is not less than 75%.

Our results therefore indicate that small errors do not change the repetition rate much. The FDR method is stable and is a good statistical method for selecting differentially expressed genes, so errors in the gene expression data are the main cause of the low repetition rate. Gene chip data errors derive from biological error and experimental error. As science and technology develop and instruments and experimental methods improve, experimental error will be better controlled, while biological differences in gene expression will continue to affect expression levels and become the primary source of error. If the overall repetition rate of differentially expressed genes remains low, reducing biological error will become the top priority.
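
As a rough illustration of where an m0 estimate enters the FDR procedure, the sketch below runs the Benjamini-Hochberg step-up rule with m replaced by an estimate of m0, which is the general form of an adaptive linear step-up procedure. The estimators shown are textbook versions assumed for illustration: a Storey-type λ estimator (assumed to be what the S-λ method refers to, here with a "+1" correction) and the two-stage TST procedure of Benjamini, Krieger and Yekutieli. They are not the thesis's new methods, and all function names are hypothetical.

```python
# Sketch of an adaptive linear step-up (ALSU) procedure and two textbook m0
# estimators. Function names and defaults are illustrative assumptions.

def bh_rejections(pvals, q, m_eff=None):
    """Step-up rule: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * q / m_eff (m_eff defaults to m, i.e. plain BH)."""
    m = len(pvals)
    m_eff = m if m_eff is None else m_eff
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m_eff:
            k_max = rank
    return sorted(order[:k_max])


def m0_storey(pvals, lam=0.5):
    """Storey-type estimate: p-values above lam are assumed to come mostly
    from true nulls, which are uniform on [0, 1]."""
    above = sum(1 for p in pvals if p > lam)
    return min(len(pvals), (above + 1) / (1 - lam))


def alsu(pvals, q, m0_hat):
    """Adaptive step-up: the BH rule with m replaced by the m0 estimate."""
    return bh_rejections(pvals, q, m_eff=max(m0_hat, 1))


def tst(pvals, q):
    """Two-stage step-up: a first BH pass at q/(1+q) gives r1 rejections,
    then m0 is estimated as m - r1 and BH is rerun at the same level."""
    m = len(pvals)
    q1 = q / (1 + q)
    r1 = len(bh_rejections(pvals, q1))
    if r1 == 0 or r1 == m:
        return bh_rejections(pvals, q1)  # nothing or everything rejected
    return bh_rejections(pvals, q1, m_eff=m - r1)


p = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.03, 0.041, 0.8, 0.9]
print(bh_rejections(p, 0.05))            # -> [0, 1, 2, 3, 4, 5, 6]
print(alsu(p, 0.05, m0_storey(p, 0.5)))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
print(tst(p, 0.05))                      # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

In this toy example a smaller m0 estimate loosens the thresholds, so the adaptive procedures reject one more hypothesis than plain BH. Other estimators, such as ABH or the proposed fitting methods, would plug into `alsu` the same way by supplying a different `m0_hat`.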
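
The six evaluation indicators used to compare m0 estimators can be computed from repeated simulation runs roughly as follows. This is a minimal sketch assuming each simulated data set yields one m0 estimate and that the true m0 is known from the simulation design; the function name, the dictionary layout and the quartile convention (the default of `statistics.quantiles`) are assumptions.

```python
import math
import statistics


def evaluate_m0_estimates(estimates, true_m0):
    """Summarise m0 estimates from repeated simulated data sets against the
    known true m0 used to generate them."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)                    # sample standard deviation
    q1, q2, q3 = statistics.quantiles(estimates, n=4)  # quartiles ('exclusive' method)
    rmse = math.sqrt(sum((e - true_m0) ** 2 for e in estimates) / len(estimates))
    return {
        "mean": mean,
        "sd": sd,
        "range": max(estimates) - min(estimates),
        "quartiles": (q1, q2, q3),
        "rmse": rmse,      # combines bias and spread around the true m0
        "cv": sd / mean,   # spread relative to the mean
    }


# Hypothetical estimates of a true m0 = 100 from five simulation runs
print(evaluate_m0_estimates([90, 95, 100, 105, 110], true_m0=100))
```

Under this reading, a more accurate estimator shows a smaller RMSE, and a more stable one shows a smaller standard deviation, range and coefficient of variation.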
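
The perturbation experiment on GSE6919 can be outlined as below. The abstract does not specify the error model or the exact definition of the repetition rate, so this sketch assumes multiplicative errors drawn uniformly within ±bound and defines the repetition rate as the share of genes selected from the original data that are selected again from the perturbed data. `select_genes` is a hypothetical placeholder for the FDR-based selection step, and all function names are illustrative.

```python
import random


def perturb(expression, bound, rng):
    """Multiply every value by (1 + e), e ~ Uniform(-bound, bound).
    The multiplicative uniform error model is an assumption."""
    return [[x * (1 + rng.uniform(-bound, bound)) for x in row] for row in expression]


def repetition_rate(reference_genes, new_genes):
    """Share of the reference genes that are selected again (assumed definition)."""
    ref = set(reference_genes)
    if not ref:
        return 0.0
    return len(ref & set(new_genes)) / len(ref)


def repetition_curve(expression, select_genes, bounds, n_rep=20, seed=0):
    """Mean repetition rate for each error bound over n_rep perturbed copies.
    select_genes(expression) -> list of gene ids is a hypothetical stand-in
    for the FDR-based selection of differentially expressed genes."""
    rng = random.Random(seed)
    reference = select_genes(expression)
    curve = {}
    for bound in bounds:
        rates = [
            repetition_rate(reference, select_genes(perturb(expression, bound, rng)))
            for _ in range(n_rep)
        ]
        curve[bound] = sum(rates) / n_rep
    return curve


# Toy usage: "select" genes (rows) whose first value exceeds 1.5
def select_first_above(expression, threshold=1.5):
    return [i for i, row in enumerate(expression) if row[0] > threshold]


toy = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9]]
print(repetition_curve(toy, select_first_above, bounds=[0.0, 0.1]))
```

Fitting a straight line to the resulting curve over a range of bounds is one way to obtain a per-percent slope like the ones reported in the abstract.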
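
The first and third results can be checked against each other with simple arithmetic. Assume that the linear relationship for error bounds below 30% starts from a 100% repetition rate at zero error, since unperturbed data reproduce the same gene list, and read the 2.5% reduction as 2.5 percentage points. With b the error bound in percent and R(b) the overall repetition rate of differentially expressed genes in percent, the reported slope gives:

```latex
\begin{aligned}
R(b)  &\approx 100 - 2.5\,b, \qquad 0 \le b < 30,\\
R(10) &\approx 100 - 2.5 \times 10 = 75.
\end{aligned}
```

Because R decreases as b grows, every bound below 10% gives a rate above 75%, consistent with the third result. The 100% intercept is an assumption; the abstract does not report the fitted intercept.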
Keywords/Search Tags:Post genome era, Gene chip, False Discovery Rate, Small Scale Iterative Method, Small Scale Fitting Method, Large Scale Fitting Method, Root mean square error, Coefficient of Variation