
Study On Statistical Methods For Differential Gene Expression Detection Based On Sample Subsets

Posted on: 2011-12-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z H Ji
GTID: 1118360332457351 | Subject: Computer Science and Technology
Abstract/Summary:
Normally, the genes of a cell are expressed according to specific temporal and spatial patterns. However, environmental conditions or other factors such as gene mutation can cause genes to be expressed at abnormal levels, which in turn changes the phenotype. A change in a gene's expression level between conditions (for example, between cancer and normal tissue) is called differential gene expression (DGE). Microarray is a cutting-edge technology used mainly to analyze the biological significance of gene expression profiles. With microarrays, DGE can be detected for thousands of genes simultaneously, quickly, and accurately.

DGE detection methods study the gene expression profile at the single-gene level using replicate experiments, and use statistical hypothesis tests to identify potential over-expression in cancer samples. The detected genes help identify cancer-related genes and gene clusters. DGE detection has many applications, such as studying the molecular mechanisms of drugs, discovering new drug targets, high-throughput drug screening, and evaluating drug activity and toxicity. It is therefore of great significance for revealing the mechanisms of cancer and developing anticancer drugs.

The core algorithms of DGE detection in microarray data are normally statistical: potential DGE genes are screened by hypothesis testing. Traditional DGE detection assumes that the entire cancer group is over-expressed compared with the normal group. However, in 2005 Tomlins et al. pointed out in Science that DGE may exist only in a subgroup of the cancer samples rather than in the entire cancer group. In recent years, considerable effort has been devoted to detecting DGE in over-expressed cancer subgroups, and various statistical methods have been proposed based on the assumption of Tomlins et al. This dissertation focuses on DGE detection under the assumption of an over-expressed cancer subgroup.
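The contrast between the whole-group and the subgroup assumptions can be illustrated with the pooled two-sample t-statistic. The minimal Python sketch below is an illustration under assumed values, not an experiment from this dissertation. It models one gene with 20 normal and 20 cancer samples. In scenario A every cancer sample is shifted up by 2. In scenario B only 5 cancer samples are shifted, by 8, so the difference in group means is the same. The sample sizes, shifts, seed, and the helper name `pooled_t` are all hypothetical.

```python
import math
import random
import statistics


def pooled_t(normal, cancer):
    """Pooled (equal-variance) two-sample t-statistic for cancer minus normal."""
    n, m = len(normal), len(cancer)
    sp2 = ((n - 1) * statistics.variance(normal)
           + (m - 1) * statistics.variance(cancer)) / (n + m - 2)
    diff = statistics.mean(cancer) - statistics.mean(normal)
    return diff / math.sqrt(sp2 * (1 / n + 1 / m))


rng = random.Random(0)  # hypothetical seed
n = 20
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # same noise reused in both scenarios

# Scenario A: the whole cancer group is over-expressed by 2.
whole = [x + 2.0 for x in noise]
# Scenario B: only 5 of the 20 cancer samples are over-expressed, by 8 (mean shift is still 2).
subgroup = [x + (8.0 if i < 5 else 0.0) for i, x in enumerate(noise)]

t_whole = pooled_t(normal, whole)
t_subgroup = pooled_t(normal, subgroup)
print(f"t, whole group over-expressed: {t_whole:.2f}")
print(f"t, subgroup over-expressed:    {t_subgroup:.2f}")
```

The mean difference is identical in both scenarios, but the over-expressed subgroup inflates the cancer-group variance, so `t_subgroup` comes out well below `t_whole`. This is why a test built on whole-group means can miss DGE confined to a subgroup.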
The main content of this thesis includes:
1) A comparative study of six popular DGE detection methods: the T-statistic, PPST, COPA, OS, ORT, and MOST. The T-statistic is a traditional method that assumes the entire cancer group is over-expressed compared with the normal group. It uses the means of the normal and cancer groups and their pooled standard deviation. PPST compares the expression levels of genes between the case group (A) and the control group (B), and flags as DGE the genes whose difference exceeds a certain percentile. COPA is based on the median and the median absolute deviation (MAD). Building on COPA, OS introduces the interquartile range to measure data dispersion. ORT is similar to OS in that both use quartiles to set the threshold. The difference is that OS computes the quartiles from both the cancer group and the healthy group, whereas ORT uses only the healthy group. MOST considers all possible cutoff values of gene expression and defines the detection statistic as the maximum of the resulting statistics. The six methods were tested and analyzed through a simulation study and real-data experiments.
2) Two statistics, tri-mean and tri-MAD, were proposed for DGE detection. When DGE exists in microarray data, the mean of the expression values is pulled toward the differentially expressed values, whereas the median is more robust and resists such interference better. The tri-mean combines the upper quartile, the lower quartile, and the median. It therefore describes the sample more comprehensively and more stably, without ignoring data points far from the group median.
3) A new DGE detection method, TriORT, was proposed for over-expressed cancer subgroups. TriORT is based on ORT and defines DGE using the tri-mean and tri-MAD. In addition, the median and a few other statistics are used to represent the characteristics of microarray data more fully and more robustly.
The threshold for outlying expression values was determined from the quartiles using an intuitive rule. Experimental results indicated that the proposed method was more effective for over-expressed cancer subgroups and had better sensitivity and specificity.
4) A novel DGE detection method, TriMOST, was proposed for over-expressed cancer subgroups. TriMOST is based on MOST and introduces the tri-mean and tri-MAD into the definition of the DGE value of the over-expressed cancer subgroup relative to the normal group. When the actual number of DGE genes is unknown, mean and MAD values are used to search more thoroughly over all possible thresholds for screening DGE genes. Experimental results indicated that the proposed method performed very promisingly in both the simulation study and the real-data experiments.
5) The proposed methods were compared with the six existing methods. A simulation study was first carried out to test all the methods on simulated data. All the methods were then applied to the real dataset provided by West, and the cancer genes detected in the real data were verified against NCBI. The eight methods were compared and analyzed based on their experimental results. DGE detection provides further knowledge of cancer-related gene groups and may offer new approaches to the treatment of breast cancer.
In summary, we analyzed six methods for DGE detection and proposed two novel DGE detection methods based on the assumption of over-expressed cancer subgroups. Simulation studies and real-data experiments showed that the proposed methods have good sensitivity and specificity, as well as better detection performance than the other six methods.
Based on the experimental results, we conclude that, for over-expressed cancer subgroups in microarray data, detection methods based on the tri-mean and tri-MAD describe the data more comprehensively and stably, which leads to better detection performance.
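For concreteness, several of the per-gene statistics discussed in this work can be sketched for a single gene as follows. This is a minimal Python sketch. It is written from the descriptions in this abstract and from the forms of COPA, OS, and ORT commonly given in the literature, and it is not the dissertation's code. The sketch mirrors the contrast drawn above: OS takes its quartile threshold from all samples, while ORT takes it from the normal group only. The 1.4826 MAD scaling, the 90th percentile for COPA, and the `inclusive` quartile convention are assumptions. The abstract does not define tri-MAD, so `tri_mad` is a hypothetical stand-in (the MAD around the trimean). TriORT, TriMOST, PPST, and MOST themselves are not reproduced here.

```python
from statistics import mean, median, quantiles

MAD_SCALE = 1.4826  # assumption: scales the MAD to match the SD under normality


def mad(values, center):
    """Scaled median absolute deviation of `values` around `center`."""
    return MAD_SCALE * median([abs(v - center) for v in values])


def copa(normal, cancer, pct=90):
    """COPA: standardize by the pooled median and MAD, then take a high cancer percentile."""
    pooled = normal + cancer
    med = median(pooled)
    s = mad(pooled, med)
    z = [(x - med) / s for x in cancer]
    return quantiles(z, n=100, method="inclusive")[pct - 1]


def outlier_sum(normal, cancer):
    """OS: same standardization; outlier threshold q75 + IQR computed from ALL samples."""
    pooled = normal + cancer
    med = median(pooled)
    s = mad(pooled, med)
    z_all = [(x - med) / s for x in pooled]
    q1, _, q3 = quantiles(z_all, n=4, method="inclusive")
    cut = q3 + (q3 - q1)
    return sum(z for z in z_all[len(normal):] if z > cut)  # cancer samples only


def ort(normal, cancer):
    """ORT: outlier threshold q75 + IQR computed from the NORMAL group only."""
    q1, _, q3 = quantiles(normal, n=4, method="inclusive")
    cut = q3 + (q3 - q1)
    med_n, med_c = median(normal), median(cancer)
    # Pool each group's absolute deviations from its own median.
    s = MAD_SCALE * median([abs(x - med_n) for x in normal] + [abs(x - med_c) for x in cancer])
    return sum((x - med_n) / s for x in cancer if x > cut)


def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


def tri_mad(values):
    """HYPOTHETICAL tri-MAD: median absolute deviation around the trimean."""
    t = trimean(values)
    return median([abs(v - t) for v in values])


normal = [0.0, 1.0, 2.0, 3.0, 4.0]
cancer = [0.0, 1.0, 2.0, 3.0, 10.0]  # one strongly over-expressed cancer sample
print("COPA", copa(normal, cancer), "OS", outlier_sum(normal, cancer), "ORT", ort(normal, cancer))

baseline = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
with_subgroup = baseline + [40.0, 45.0]  # two over-expressed samples
for label, data in [("baseline", baseline), ("with subgroup", with_subgroup)]:
    print(label, mean(data), median(data), trimean(data), tri_mad(data))
```

With the toy gene, OS and ORT both return 8/1.4826 (about 5.40), because the value 10 is the only cancer value above either threshold. In the second example, adding two over-expressed samples moves the mean from 5.0 to about 11.8, while the median and the trimean move only from 5.0 to 6.0. This is the robustness argument made for the tri-mean above.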
Keywords/Search Tags: Microarray data, differential gene expression detection, statistical methodology, tri-mean, breast tumor