Differential gene expression (DGE) can trigger abnormal changes in cells and tissues and consequently lead to disease. Microarray technology measures gene expression levels and enables computer-aided statistical analysis. DGE detection plays an important role in research on disease mechanisms and in drug design.

This dissertation focuses on differential gene expression in a subset of the cancer group, surveys the related detection methods, and proposes new ones. The main contributions are:

1. We review the research progress in DGE detection, summarize the detection methods for cancer group subsets, and compare seven percentile-based methods (COPA, OS, ORT, PPST, F, OF, and ORF) through a simulation study and experiments on real datasets. The results indicate that the percentile-based methods are effective when the cancer subset with DGE is large, but are not very sensitive when only a small subset of the cancer group contains DGE genes.

2. Few of the currently available change-point methods deal explicitly with estimating the number and location of change points, and these methods may be somewhat vulnerable to deviations from the model assumptions they usually employ. We therefore propose a non-parametric statistical method for DGE detection, named NPCPS (Non-Parametric Change Point Statistics). NPCPS uses a modified Kolmogorov statistic to detect a single change point in a data sequence. It compares the data distributions of the normal and cancer groups to detect whether a change point exists in the cancer group and to estimate its position.
In addition, as a non-parametric inferential method, NPCPS makes no assumptions about the probability distributions of the variables being assessed, so the microarray data do not need to be normalized before the test statistic is calculated, as parametric methods usually require. For comparison, we tested several percentile-based methods and LRS. BRIDGE was not included because it was originally designed for the two-sample problem and is computationally heavy for larger sample sizes. NPCPS works comfortably with large-scale datasets, and both simulation and experimental results show that it is effective for DGE detection.

3. To improve detection sensitivity for DGE in a small cancer group subset, we propose a novel method based on NPCPS, Weighted Change Point Statistics (WCPS). In WCPS, a weight factor is applied to the NPCPS statistic so that the statistic value is preserved for most of the samples while it is amplified for the last few samples. With this weight function, the WCPS statistic becomes much more sensitive near the right boundary. The simulation study and experimental results indicate that WCPS makes fewer errors than NPCPS; that when the cancer subgroup with DGE is small, WCPS detects the existence of DGE more accurately; and that its estimated change-point position is generally closer to the true position.

4. The proposed methods and the percentile-based methods were applied to real microarray datasets from breast cancer and colon cancer tissues. Breast cancer is one of the major malignant diseases affecting women's health in modern society. The number of breast cancer patients in China has already reached one million.
Since 5%-10% of breast cancer cases involve a family history, it is reasonable to assume that breast cancer has a potential genetic component. Colon cancer is also a tumor disease with a high incidence rate and is strongly related to dietary structure, with higher rates in Europe, North America, and Australia and lower rates in Asia, Africa, and South America. It is worth noting, however, that a family history of colon cancer can result in a four times higher incidence rate. Given the genetic factors of breast and colon cancer, analyzing microarray data by detecting DGE has clinical and research significance. Nine methods (WCPS, NPCPS, LRS, COPA, OS, ORT, PPST, T-statistic, and MOST) were applied to the two types of dataset. The detection results were compared and analyzed, the data characteristics of microarray gene expression profiles were studied, and a basic clustering analysis of the cancer samples was performed based on the DGE detected by WCPS. The experimental results indicate that the proposed NPCPS and WCPS methods outperform the other seven methods, and that WCPS detects more DGE genes than NPCPS. The clustering analysis with WCPS indicates that correlations between most of the genes in the colon cancer microarray data are very small, while very large correlations are found among a small number of genes. Given the similar DGE patterns of these genes, it is reasonable to assume that these genes, or the proteins they express, interact biologically in cancer development.

In summary, based on an analysis of percentile-based DGE detection methods and with the aim of improving detection accuracy for smaller cancer subsets, two novel change-point-based DGE detection methods were proposed.
Simulation studies and experiments on real public datasets show that change-point-based methods can effectively detect DGE in a subset of cancer samples and are even more competitive than the percentile-based methods when the subset is small; in addition to DGE detection, change-point-based methods also enable clustering analysis of the cancer samples. Applying change-point theory to DGE detection therefore has theoretical and practical significance from both statistical and biological perspectives, and could play an important role in areas such as cancer diagnosis and research, tumor classification, personalized therapy, and cancer drug development.
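
The abstract does not give the NPCPS or WCPS formulas, so the following is only a minimal sketch of the general idea it describes: a Kolmogorov-type scan for a single change point in a data sequence, optionally multiplied by a weight that grows toward the right end. Everything in it is an assumption made for illustration, including the function names (`ks_distance`, `change_point_scan`, `right_end_weight`), the scaling factor, the weight shape, and the toy data; it is not the dissertation's actual statistic.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov distance: sup over t of |ECDF_a(t) - ECDF_b(t)|."""
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def change_point_scan(z, weight=None):
    """Scan every split k (first k values vs. the remaining n - k values).

    Returns (statistic, k_hat), where k_hat is the split that maximizes the
    scaled Kolmogorov distance. `weight`, if given, is a function of the
    relative position k / n (an assumed form of weighting).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    ks = np.arange(1, n)
    stats = np.empty(n - 1)
    for i, k in enumerate(ks):
        scale = np.sqrt(k * (n - k) / n)  # usual two-sample KS scaling
        stats[i] = scale * ks_distance(z[:k], z[k:])
    if weight is not None:
        stats = stats * weight(ks / n)
    best = int(np.argmax(stats))
    return float(stats[best]), int(ks[best])


def right_end_weight(t, strength=2.0, power=8):
    """Assumed weight: close to 1 for most positions, larger near t = 1."""
    return 1.0 + strength * t ** power


# Toy sequence (made up): 40 baseline values followed by 6 elevated values.
toy = np.array([0.0, 1.0] * 20 + [5.0] * 6)

plain_stat, plain_k = change_point_scan(toy)
weighted_stat, weighted_k = change_point_scan(toy, weight=right_end_weight)
print("unweighted:", plain_stat, plain_k)
print("weighted:  ", weighted_stat, weighted_k)
```

In this sketch the unweighted scale factor shrinks as the right-hand segment gets short, which is one way to see why a small subset at the end of the sequence is hard to detect and why a weight that increases toward the right boundary could help. In practice a significance threshold for either statistic would still have to be calibrated separately.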