Research On Statistical Methods And Application For Detecting The Group Difference Between Networks In Systems Epidemiology

Posted on: 2018-01-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J D Ji
GTID: 1314330512485060
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Identifying biological and environmental risk factors for human diseases has always been one of the central tasks of epidemiology. However, traditional epidemiology has been pejoratively labeled "black box" epidemiology and has drawn increasing criticism, partly because too much attention has been paid to identifying single risk factors rather than the networks or pathways related to a disease, which makes it difficult to explore disease mechanisms in depth. It is highly desirable to unlock the black box underlying observed associations and to illuminate the biological interaction mechanisms of the disease-related components hidden inside it. Recent advances in high-throughput omics platforms allow omics data to be acquired at unprecedented speed and volume, and integrating these data with traditional epidemiology promotes the development of systems epidemiology.

The concept of "systems epidemiology" can be summarized as follows: integrate modern high-throughput omics technology with traditional population-based epidemiological studies; detect biomarkers from the genome, epigenome, transcriptome, proteome, metabolome, and phenome; then construct the interaction network consisting of exposure factors, multi-omics biomarkers, and disease outcomes, based on the rich information in bioinformatics network databases and systems biology tools. Furthermore, by comparing the group difference of networks between states (patient vs. healthy), we can infer the pathogenic network or specific pathogenic pathways of a risk factor and its effect on the occurrence, development, and prognosis of the disease. This can provide scientific evidence to explain disease pathogenesis, verify functional findings from the laboratory, design drug targets, and prevent and treat diseases. It offers the potential to provide new insight into the underlying disease mechanisms in breadth and depth
at human population level. Under the framework of systems epidemiology, the focus has shifted from identifying single factors to exploring the specific networks or pathways that contribute to disease. Statistical comparison of group differences in biological networks or pathways can provide new insight into the underlying disease mechanism and has extensive biomedical and clinical applications. For instance, a better understanding of how molecular interconnectedness affects disease progression may improve the identification of disease-related biomolecules and pathways, which may in turn offer more effective targets for drug development in a cost-effective and timely manner.

The core of systems epidemiology is to infer the pathogenic network or pathways of a risk factor and its effect during the occurrence, development, and prognosis of the disease. Methods that omit the network structure cannot serve this purpose. However, traditional regression analysis, the chi-square test, and the t test are currently the main methods for comparing groups. They assume that the node variables are nearly independent and ignore the interaction structure of the network. Hence, these methods fail to capture the difference in network information between two groups and cannot identify the pathogenic network or pathways. Recently, some high-quality population-level omics studies have inferred topological structure through functional tests and bioinformatics methods. Nevertheless, these methods only analyze the characteristics of the network topology qualitatively and can hardly infer the effect quantitatively, owing to a shortage of efficient statistical methods.

It is worth noting that two research strategies are usually needed to infer network differences, depending on the hypotheses, aims, and conditions of the study. 1) The molecular-epidemiology-based hypothesis-driven strategy: researchers draw on a deep understanding of the physiology, biochemistry, and pathogenesis, combine the cell/animal
experiment results, take advantage of bioinformatics databases, and then obtain a hypothesized, plausible network or pathway. The biomarkers in the network can be measured using molecular epidemiology, and the effect of the pathogenic network or pathways can be inferred by differential network analysis. 2) The data-driven strategy: researchers without any hypothesis about the network structure integrate various omics data to learn and construct a system network (exposure factor-omics biomarkers-disease) to study health and disease at the human population level. Statistical inference methods are then employed to identify the network difference and its effect on disease. This offers the potential to provide new insight into underlying mechanisms and statistical guidance for designing drug targets for the therapy of complex diseases.

Both strategies involve two types of biological network: undirected and directed. In undirected networks, we focus on detecting group differences in the nodes and their connections (associations); in directed networks, we additionally account for the regulatory relationships between nodes. In systems epidemiology, biological networks not only have the general properties of complex networks (self-organization and self-similarity, attractors, small-world and scale-free structure, etc.). More importantly, for most pathogenic networks of complex diseases, the continuous quantitative changes of nodes and edges contain all the information about network differences. That is, the nodes and edges vary along a quantitative gradient rather than in a simple "one or null" (1 or 0) mode. Even when some genes (such as certain oncogenes) are hardly expressed, this can essentially be regarded as the extreme of continuous expression. Therefore, the network difference in systems epidemiology covers the double difference of nodes and edges. A difference at the node level or the edge level alone is far from the whole network difference. Furthermore, for a directed network, the difference is not merely the simple summation of the
node difference and edge difference. The direction information of the network topology should also be fully embedded, since the arrow of an edge reflects the regulation weight of an upstream node on a downstream node. However, most existing network comparison algorithms (or statistics) fail to cover all of the network difference information mentioned above, and are therefore not suited to differential network analysis in systems epidemiology at the population level. Their main disadvantages can be summarized as follows: 1) Traditional methods (the chi-square test, the t test, etc.) neglect the information embedded in the network topology. 2) The network comparison algorithms and software of systems biology focus mainly on the edges rather than the nodes. The main methods for discriminating network topology include alignment, similarity, clustering, and pathway search. 3) Most bioinformatics methods standardize the nodes, which inevitably loses the variation of the nodes. Often only the connectivity between nodes is examined, including comparisons of centrality that aim to identify key nodes or pathways and comparisons of network structure based on motif frequency. A dissimilarity measure is used to build the comparative statistics and for gene expression analysis. 4) Effective methods that take into account the weights with which upstream nodes regulate downstream nodes are lacking for comparing directed networks.

Under the framework of systems epidemiology, this dissertation takes a dual perspective of biostatistics and systems biology, captures the node changes and edge changes in the networks simultaneously, follows the strategy of "Structural dissection → Statistical integration", and constructs statistics integrating the nodes, edges, and direction to detect the difference between networks/pathways. Figure 1 presents the research framework. Five types of network difference are shown in Figure 1(1). Types b, c, d, and e can be regarded as special cases of type a. Suppose the networks in group A and group B have
the same number of nodes ($M$) and edges ($K$); the values of removed nodes and edges are then set to 0. Assume that the sample sizes of group A and group B are $n_A$ and $n_B$, respectively, where $n_A + n_B = N$. Let $G^A(V^A, E^A)$ denote the network of group A, where $V^A = (x_1^A, x_2^A, \ldots, x_M^A)$ is the node set and $E^A = (I_{ij}^A \beta_{ij}^A)_{M \times M}$ is the edge set (the matrix form is shown in Figure 1). For an undirected network, [indicator definition lost in extraction]; for a directed network, $I_{ij}^A = 1$ and $I_{ji}^A = 0$ if there is a directed edge from $x_i^A$ to $x_j^A$ ($x_i^A \to x_j^A$), $\forall i \neq j$, $x_i^A, x_j^A \in V^A$; $\beta_{ij}^A$ denotes the connection strength between $x_i^A$ and $x_j^A$. The superscript B marks the corresponding quantities for group B.

In this dissertation, the strategy for constructing statistics to detect the network difference is "Structural dissection → Statistical integration". 1) Structural dissection: dissect the topological structures of network A, $G^A(V^A, E^A)$, and network B, $G^B(V^B, E^B)$, into three parts: node information ($V^A$, $V^B$), edge information ($E^A$, $E^B$), and direction information ($I^A$, $I^B$). Then calculate the node difference $D^V = (V^A - V^B)$, the edge difference $D^E = (E^A - E^B)$, and the direction difference $D^I = (I^A - I^B)$ between the two groups. 2) Statistical integration: integrate the node difference $D^V$, edge difference $D^E$, and direction difference $D^I$ into the statistic $\mathrm{Diff} = (D^V \cup D^E \cup D^I)$ by a statistical approach.

Under this strategy (Figure 1), the main contents of this dissertation are: 1) We proposed the strategy of "Structural dissection →
Statistical integration". 2) Integrating the continuous quantitative changes of nodes and edges, we developed a statistical inference model for detecting pathogenic pathway effects (Chapter 2) and proposed a score-based statistical test for group differences in undirected biological networks (Chapter 3). 3) We presented a statistical test for group differences between directed biological networks, which captures the changes in both nodes and edges while simultaneously accounting for the network structure (Chapter 4). 4) Based on a non-parametric joint density estimation method, we developed a statistical model for differential interaction network analysis and classification (Chapter 5).

1. Statistical inference for identification and effect estimation of disease-related pathways (Chapter 2)

In traditional "black box" epidemiological studies, case-control and cohort designs are usually adopted to explore the association between one specific exposure and disease. The strategy is to calculate the OR (e.g., OR = 6.5 for smoking and lung cancer) by comparing the proportion of exposure between cases and controls, or to calculate the RR (e.g., RR = 5.5 for smoking and lung cancer) by comparing disease incidence between the exposed and unexposed groups. Traditional statistical methods such as the chi-square test, logistic regression, and Cox regression can only provide the strength of association between risk factors and disease; they fail to explain the pathogenic pathways of the risk factors. When the pathogenic pathway or network is ambiguous, it is hard to predict and evaluate intervention effects and hard to obtain reproducible results. "Black box" epidemiology has therefore been criticized for some years. Epidemiological researchers have tried to find ways to open the black box and clarify the pathogenic networks/pathways. Although the concept of a causality network was proposed several years ago, epidemiologists have found it difficult to achieve this aspiration, due to the lack of
effective statistical inference methods for identifying pathogenic pathways and estimating their effects. In this chapter, under the framework of systems epidemiology, we propose a statistical inference model for detecting the pathway effect contributing to disease, following the rules of path analysis. For example, in a case-control study, if among cases the product of the path coefficients of a specific chain ($X_1^D \to X_2^D \to \cdots \to X_{K+1}^D$) is not equal to zero ($\prod_{k=1}^{K} \beta_k^D \neq 0$), then $X_1^D$ has an effect on $X_{K+1}^D$ through the chain, with effect size $\beta^D = \prod_{k=1}^{K} \beta_k^D$. In controls, the effect of the pathway ($X_1^C \to X_2^C \to \cdots \to X_{K+1}^C$) is $\beta^C = \prod_{k=1}^{K} \beta_k^C$. The statistic $\beta^D - \beta^C$ can be used to detect the pathway effect contributing to disease. Statistical simulations were conducted to evaluate the type I error and power, and a real data set was analyzed to validate the practicability.

Results:

(1) The statistics for detecting pathogenic pathway effect

We developed two typical Pathway Effect Measures (PEM) for detecting the pathway effect contributing to disease. 1) Non-parametric bootstrap test. The statistic (PEM-D) is defined as $\mathrm{PEM\text{-}D} = \prod_{k=1}^{K} \beta_k^D - \prod_{k=1}^{K} \beta_k^C$, where $\beta_k^D$ and $\beta_k^C$ represent the standardized regression coefficients between the $k$th and $(k+1)$th nodes in the pathway for cases and controls, respectively, and $K$ is the pathway length. To test whether a pathway has an effect on the disease of interest, we employed the percentile bootstrap confidence interval and the bias-corrected bootstrap confidence interval to perform the hypothesis test. 2) Asymptotic normal distribution statistic (PEM-UD), defined as $\mathrm{PEM\text{-}UD} = (\beta^D - \beta^C) / \sqrt{\mathrm{var}(\beta^D) + \mathrm{var}(\beta^C)}$, where $\mathrm{var}(\beta^D)$ and $\mathrm{var}(\beta^C)$ denote the variances of $\beta^D$ and $\beta^C$, respectively, which can be calculated by four different methods: a) the exact estimator; b) the unbiased estimator; c) the multivariate delta estimator; d) the bootstrap estimator.

(2) Simulation results

Statistical simulations were conducted to evaluate the type I error of the proposed statistics under the null hypothesis $H_0: \Delta = \beta^D - \beta^C = 0$ with different sample sizes ($n$), and to evaluate the power under $H_1: \beta^D - \beta^C \neq 0$ with different sample sizes ($n$), pathway lengths ($K$), pathway effects ($\Delta$
$= \beta^D - \beta^C$), and correlation patterns. The simulation study showed that: 1) the type I error rates of PEM-D and PEM-UD are close to the given nominal level ($\alpha = 0.05$) under the null hypothesis with large sample sizes (Table 2.2). 2) The powers of the six proposed methods are shown in Figure 2.2, Figure 2.3, and Figure 2.4. The power of the proposed statistics increased monotonically with sample size, $\Delta$, and $K$, and decreased as the correlation pattern strengthened. Even when $\Delta$ decreased, the power still increased with pathway length under a fixed correlation pattern. Overall, the bootstrap-based tests (percentile bootstrap, bias-corrected bootstrap, and variance estimated via bootstrap) performed better than the others, with the bias-corrected bootstrap confidence interval method having the highest power.

(3) Application results

The proposed PEM-D and PEM-UD were applied to acute myeloid leukemia (AML) data consisting of Th17 cells, Treg cells, and their related cytokine transforming growth factor-beta (TGF-β) in the bone marrow microenvironment of 98 AML patients and 35 controls. A significant effect of the pathway Treg→TGF-β→Th17 on AML was detected by five of the six proposed methods. Indeed, not only does a functional antagonism exist between Th17 and Treg cells, but there is also a dichotomy in their generation, and Treg, TGF-β, and Th17 have all been confirmed to be associated with AML. Our results further demonstrate that the pathway Treg→TGF-β→Th17 potentially plays a role in the pathogenesis of AML.

Conclusion: The proposed PEM-D and PEM-UD are valid and powerful for identifying specific pathways contributing to disease. The bootstrap-based tests have the higher power.

Innovation: We proposed two typical PEMs based on the difference of the path coefficient products to detect the pathway effect within a network. This provides a novel method for studying the pathogenic effect of a specific pathway in systems epidemiology.

2. Hypothesis test for group differences between undirected
networks (Chapter 3)

The essential task in systems epidemiology is to infer the pathogenic network of a risk factor and its effect on the occurrence, development, and prognosis of the disease by comparing networks between two groups (cases vs. controls, exposed vs. non-exposed, or intervention vs. non-intervention). However, statistical methods for detecting group differences between networks are still in great demand. For most pathogenic networks of complex diseases, the continuous quantitative changes of nodes and edges contain all the information about network differences. Although in most situations the individual vertex-wise or edge-wise differences may be weak, their aggregated differences can be quite strong. Both changes at the node level and changes in the edges can lead to a difference in the whole network, so considering only the connections and the topological difference between two networks undoubtedly loses statistical power. When the direction between nodes in the network is unclear, then according to Figure 1 the statistic for detecting group differences between undirected networks can be developed based on traditional statistical theory (e.g., score, likelihood ratio, and Wald tests). In this chapter, we propose a new score-based Network Difference Measure (NetDifM) as a powerful test statistic for detecting group differences between undirected networks, which simultaneously captures the differences of vertices and edges (general form: $\mathrm{Diff} = (D^V \cup D^E)$).

Results:

(1) Score-based statistic

The networks in the two groups (cases and controls) are denoted by $G^D$ and $G^C$, respectively. Suppose both $G^D$ and $G^C$ have the same number of nodes ($M$) and edges ($K$); the null hypothesis is $H_0: G^D = G^C$. Under $H_0$, the networks in the two groups are identical not only in average vertex levels but also in connection strength. For individual $l$ ($l = 1, 2, \ldots, N$), the trait value is denoted as $Y_l$ and the $i$th node as $x_{li}$. The score test vector of nodes is $D^V = (D_1^V, D_2^V, \ldots, D_M^V)^T$, where $D_i^V$ measures the contribution of node $x_i$
to the disease. Analogously, the score test vector of edges is $D^E = (D_1^E, D_2^E, \ldots, D_K^E)^T$, where $D_k^E$ measures the contribution of the connection strength between $x_i$ and $x_j$ to the disease. The overall network difference measure can then be defined as $\mathrm{NetDifM} = D^T \hat{\Sigma}^{-1} D$, where $D = ((D^V)^T, (D^E)^T)^T$ and $\hat{\Sigma}$ is the estimated covariance matrix of $D$ [block-wise expressions lost in extraction]. For a large sample size, NetDifM has a central $\chi^2_{M+K}$ distribution under the null hypothesis. When the sample size is small, a permutation procedure can be employed for hypothesis testing.

(2) Simulation results

Statistical simulations were conducted to evaluate the type I error of the proposed statistic under the null hypothesis $H_0: G^D = G^C$, with different sample sizes ($n$) and network scales ($M = 10, 20, 40$; $K = 21, 45, 54$). Three simulation scenarios were conducted under $H_1: G^D \neq G^C$ to assess the statistical power of the proposed method. The simulated $M$-dimensional variables (nodes) were generated from a multivariate normal distribution. Scenario 1: only the nodes (average levels) differed between the two networks. Scenario 2: only the edges (connection strengths) differed between the two networks. Scenario 3: both nodes and edges differed between the two networks. To assess the performance of the proposed statistic under deviation from the normal distribution, two further settings were designed: (i) apply an exponential transformation to a randomly chosen subset of the $M$ nodes; (ii) apply the exponential transformation to all nodes. For each setting, we evaluated the type I error rate and the statistical power under the same three scenarios described above. The simulation results show that the type I error rates of NetDifM, VEWDM, and Yates' D are close to the given nominal level ($\alpha = 0.05$) under the null hypothesis. Both NetDifM and VEWDM are much more powerful than Yates' D, and NetDifM performs best across the various scenarios (Figure 3.4-Figure 3.8). Specifically, the Yates' D method loses power when only the nodes differ between the two networks, but the power of the proposed NetDifM is
still high. This indicates that NetDifM can indeed capture the perturbation of nodes and edges in the network simultaneously.

(3) Application results

Analyses of two real data sets further highlighted the practical advantages of NetDifM. For the GWAS data on leprosy (706 cases and 514 controls), a candidate gene interaction network was identified. The gene expression data are from ovarian cancer patients, who were divided into a C1 subtype with 83 patients and a C2-C6 subtype with 168 patients. Two candidate subnetworks, the PI3K-AKT signaling pathway and the Notch signaling pathway, were considered and identified, suggesting that the proposed method is capable of identifying differential gene expression and gene-gene co-expression patterns, which helps us to further understand the underlying disease mechanism.

Conclusion: The proposed NetDifM, which accounts for node changes and edge changes simultaneously, is valid and powerful for detecting biological network differences. It provides a feasible tool for undirected network comparison in systems epidemiology studies.

Innovation: We proposed a powerful score-based statistic (NetDifM) to detect group differences in undirected networks. The covariance structure between node changes and edge changes is embedded in the statistic. NetDifM has a chi-square distribution under the null hypothesis and thus avoids a high computational burden.

3. Hypothesis test for group differences between directed networks (Chapter 4)

The statistic NetDifM for detecting group differences between undirected networks in Chapter 3 focuses mainly on node changes and edge changes, while the direction information in the network is not taken into account. However, in systems epidemiology, the directed edges (arrows) in the network can provide more valuable information on the underlying pathogenic mechanism of the exposure (or intervention). Generally, both changes in the nodes and changes in the edges can lead to the whole network
difference. Even with the same magnitude of edges, two networks should be declared different if edges with reversed direction exist. Therefore, the network difference is far from the simple summation of changes in the nodes and changes in the edges, and the network topology cannot be ignored, since it at least provides the relative positions of the nodes. The key to a hypothesis test for group differences between directed networks is how to integrate the whole directed network information into one score ($\mathrm{Diff} = (D^V \cup D^E \cup D^I)$) that retains the node ($D^V$), edge ($D^E$), and direction ($D^I$) information. In this chapter, we develop a new statistical test for detecting group differences between directed biological networks. It is independent of the network attributes and can, in principle, capture the changes in nodes and edges while simultaneously accounting for the topology, by putting more weight on the differences of nodes located at relatively more important positions in the network (an upstream node receives a greater weight than a downstream node).

Results:

(1) Statistic for detecting group differences of directed networks

The networks in the two groups (cases and controls) are denoted by $G^D$ and $G^C$, with sample sizes $n_1$ and $n_2$, respectively; the null hypothesis $H_0: G^D = G^C$ states that no difference exists between $G^D$ and $G^C$. Let $V(G^D)$ and $E(G^D)$ denote the sets of all nodes and directed edges in $G^D$. An element of $E(G^D)$ indicates a directed edge $X_i^D \to X_j^D$ ($i \neq j$; $i, j \in V(G^D)$), and $\beta_{ij}^D$ represents the effect of $X_i^D$ on $X_j^D$ if the edge exists (e.g., the regulation strength of $X_i^D$ on $X_j^D$). Let $v_i^D$ denote the number of children nodes of $X_i^D$ and $w_i^D$ the relative weight of $X_i^D$, defined as $w_i^D = v_i^D / \sum_{j \in V(G^D)} v_j^D$. That is, the relative weight of a node is the proportion of the number of its children nodes among the number of children of all network nodes, where the number of children nodes of each node variable is calculated by exhaustively visiting the nodes connected to it in the downstream direction. Let $V = V(G^D) \cup V(G^C)$, $E =$
$E(G^D) \cup E(G^C)$; we propose the weighted nodes and edges statistic (WNES) [formula lost in extraction], where $w_k^D$, $\bar{X}_k^D$, and $\hat{\beta}_{ij}^D$ denote the relative weight, the sample mean, and the estimate of $\beta_{ij}^D$ in $G^D$, and $w_k^C$, $\bar{X}_k^C$, and $\hat{\beta}_{ij}^C$ are the corresponding quantities in $G^C$. Note that the network structure (including the direction of edges) in $G^D$ may differ from that in $G^C$. $K$ and $M$ are the numbers of nodes in $V$ and edges in $E$; if a node $X_k$ (or an edge) exists in $G^D$ but not in $G^C$, we treat $\bar{X}_k^C$ (or $\hat{\beta}_{ij}^C$) and its variance as equal to zero, and vice versa. The weight in WNES, of the form $a + (w_k^D + w_k^C)/2$, can be replaced by $\log_b(b + (w_k^D + w_k^C)/2)$, where smaller $a$ and $b$ represent a larger contribution of topological differences. A permutation procedure was employed for hypothesis testing.

(2) Simulation results

Statistical simulations were conducted to evaluate the type I error of the proposed statistic under the null hypothesis $H_0: G^D = G^C$, with different sample sizes ($n$), network scales ($M = 10, 35$; $K = 15, 79$), and network structures. Simulations were conducted under $H_1: G^D \neq G^C$ to assess the statistical power of the proposed statistic with three weighting schemes ($1 + (w_k^D + w_k^C)/2$, $\log_2(2 + (w_k^D + w_k^C)/2)$, and no direction weighting) in five scenarios: (I) only node changes; (II) only edge changes; (III) edge changes as in (II) plus changes of an upstream node; (IV) edge changes as in (II) plus changes of a downstream node; (V) only edge direction changes. In addition, to evaluate the performance of the statistics, simulations were conducted under different network structures, network scales, and weighting methods. We compared the proposed statistic with a statistic considering only node changes (NS) and a statistic considering only edge changes (ES). The simulation results showed that the type I error rates of WNES are close to the given nominal level ($\alpha = 0.05$) under the null hypothesis (Table 4.1), indicating that the proposed statistic is stable. The powers of WNES are shown in Figure 4.3-Figure 4.6. Figure 4.3A shows the power when only the nodes change. As expected, ES has no power because it can only capture edge changes. WNES has a slightly higher power
than that of NS. When only the edges change, NS has no power, ES shows the highest power as expected, and the power of WNES is lower than that of ES. When both the edges and nodes change, WNES shows the highest power. WNES still has relatively high power when only the edge direction changes. This indicates that WNES is powerful in capturing changes in the nodes, the edges, and the edge direction. The two weighting schemes ($a + (w_k^D + w_k^C)/2$ and $\log_b(b + (w_k^D + w_k^C)/2)$) are reasonable and effective for integrating the information of the network topology.

(3) Application results

WNES was employed to analyze leprosy GWAS data, acute myeloid leukemia data, and lung cancer gene expression data. WNES identified the network difference of leprosy-related genes. An immunity-related network containing Foxp3, IL-10, Th17, and TGF-β was found to differ significantly between AML patients and controls. WNES also detected a network perturbation of 35 genes of the canonical Wnt signaling pathway associated with lung cancer (Table 4.2).

Conclusion: The proposed new statistic WNES can capture changes in nodes, edges, and direction simultaneously. It is a valid and powerful method for detecting group differences between directed networks in systems epidemiology.

Innovation: We provide a flexible weighting approach that integrates the information of node, edge, and direction (upstream nodes regulate downstream nodes) by putting more weight on the differences of nodes located at relatively more important positions in the network. The proposed statistic WNES provides a novel method for detecting group differences between directed networks in systems epidemiology.

4. Screening strategy for disease-related interaction network and assessment for its predictive performance (Chapter 5)

The identification of pairwise biomarker interactions may help us to illuminate the underlying genetic mechanisms of complex diseases (e.g., cancer), to predict drug off-target effects, to develop multi-target anti-cancer
therapy, and to discover clinical biomarkers for disease classification. However, most existing methods are based on marginal or partial correlation, which can only capture linear relationships among biomarkers and can therefore be restrictive in real applications, where nonlinear relationships between biomarkers often exist. Another critical but inadequately addressed issue is how to adjust for confounding factors in differential network analysis. Furthermore, how to use the identified network biomarkers for classification still poses a great challenge in discriminant analysis, especially in high-dimensional settings. To address these challenges in differential network analysis and classification with high-dimensional data, we propose a Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential patterns of network activation between condition-specific groups. Taking a case-control design as an example, the binary response variable is $Y$, where $Y = 1$ denotes the case group (class 1) and $Y = 0$ denotes the control group (class 0); $f_{ij}$ and $g_{ij}$ denote the class-conditional joint densities of biomarkers $x_i$ and $x_j$ for the case group and control group, respectively, i.e., $((x_i, x_j) \mid Y = 1) \sim f_{ij}$ and $((x_i, x_j) \mid Y = 0) \sim g_{ij}$. The conditional joint density $f_{ij}(x_i, x_j)$ indicates the strength of association between $x_i$ and $x_j$ in the case group, and $\ln(f_{ij}(x_i, x_j) / g_{ij}(x_i, x_j))$ can be used to indicate the difference in association strength of the pair $(x_i, x_j)$ between the two groups. The approach is nonparametric and can identify nonlinear relationships among variables. Moreover, it does not require any conditions on the distribution of the data, which makes it more robust.

Results:

(1) Statistical model

Assume that we have observed gene-level activities for $p$ genes measured over $n$ individuals. For individual $l$ ($l = 1, 2, \ldots, n$), the binary response variable is denoted as $Y_l$ and the expression level of the $i$th gene as $x_{li}$. The JDINAC approach based on the
logistic regression model can be constructed as $\operatorname{logit}(P(Y = 1)) = \alpha_0 + \sum_{s=1}^{S} \alpha_s Z_s + \sum_{i=1}^{p} \sum_{j > i} \beta_{ij} \ln \frac{f_{ij}(x_i, x_j)}{g_{ij}(x_i, x_j)}$, where $Z_s$ ($s = 1, \ldots, S$) denote the covariates (e.g., age and gender), and $f_{ij}$ and $g_{ij}$ denote the class-conditional joint densities of $x_i$ and $x_j$ for class 1 and class 0, respectively, i.e., $((x_i, x_j) \mid Y = 1) \sim f_{ij}$ and $((x_i, x_j) \mid Y = 0) \sim g_{ij}$. The conditional joint density $f_{ij}(x_i, x_j)$ indicates the strength of association between $x_i$ and $x_j$ in class 1. Since the number of pairs $(x_i, x_j)$ can be larger than the sample size, an L1 penalty is adopted in this high-dimensional setting. Parameters $\beta_{ij} \neq 0$ indicate differential dependency patterns between the condition-specific groups. L1-regularized estimate for $\beta$: [formula lost in extraction]; the operator $\mathrm{vec}(X)$ stacks the columns of the matrix $X$ into a vector. JDINAC can be implemented as follows.

Step 1. Given $n$ observations $D = \{(Y_l, X_l), l = 1, \ldots, n\}$, randomly split the data into two parts: $D = (D_1, D_2)$.
Step 2. On part $D_1$, estimate the joint kernel density functions $f_{ij}(x_i, x_j)$ and $g_{ij}(x_i, x_j)$, $i, j = 1, \ldots, p$, $j > i$.
Step 3. On part $D_2$, fit the L1-penalized logistic regression model, using cross-validation to choose the best penalty parameter.
Step 4. Repeat Step 1-Step 3 $T$ times to obtain $\hat{P}_{l,t}$ and $\hat{\beta}_{ij,t}$, $t = 1, 2, \ldots, T$; for individual $l$, use the average prediction $\hat{P}_l$ as the final prediction, and assign the $l$th individual to class 1 if $\hat{P}_l > 0.5$ and to class 0 otherwise.
Step 5. Calculate the differential dependency weight of each pair $(x_i, x_j)$ between the two groups, $w_{ij} = \sum_{t=1}^{T} I(\hat{\beta}_{ij,t} \neq 0)$, where $I(\cdot)$ is the indicator function.

(2) Simulation results

Four simulation scenarios were designed in this section. In scenarios 1 and 2, the difference in association strength between pairs of genes in a network is caused by different correlations. In scenario 3, the differential pairs have the same correlation structure in the condition-specific groups but different joint densities. In scenario 4, the differential strength of association between pairs of genes in a network is caused by nonlinear dependence. The true discovery rate (TDR), true positive rate (TPR), and true negative rate (TNR) are used to evaluate the performance of JDINAC and the other 3
methods (DiffCorr, DEDN, and cPLR) in terms of differential network estimation. The ROC curve and classification error are used to assess the classification performance of JDINAC, RF, NB, and Lasso-based methods. The simulation results showed that: 1) JDINAC has high reliability (Figure 4.3) and has almost the highest TPR, TNR, and TDR, especially in scenarios 3 and 4 of the differential network analysis (Table 4.1). The TDR of JDINAC is 93.7%, 95.6%, 88.3%, and 99.9% under the four simulation scenarios, respectively. The TDRs of the other three methods are: DiffCorr (81.3%, 85%, 7.5%, 3.8%), DEDN (33.5%, 16.5%, 2.1%, 5%), and cPLR (19.8%, 25.6%, 53.6%, 0.7%). This indicates that JDINAC can indeed capture the perturbation of nonlinear dependence in the network. 2) The ROC curves and classification errors show that JDINAC performs best among the 5 methods in classification (Figure 5.4 and Table 5.2). JDINAC is much more accurate than the other methods.

(3) Application result

In the real data application, we apply JDINAC to a Breast Invasive Carcinoma gene expression dataset from TCGA, which includes 114 patients who have both tumor and matched normal samples. We focus on the 373 genes listed in the cancer pathway of KEGG as our final candidate genes. To evaluate classification performance, we randomly choose 50 of the 114 individuals in each group as our test data set. We found there are experimental supports for the top ranked pairs in the differential network est...
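
As an illustration of the pathway effect measure summarized under Chapter 2, the following Python sketch computes PEM-D as the difference between the products of standardized path coefficients in cases and in controls, together with a percentile bootstrap confidence interval. It is a minimal sketch under stated assumptions, not the dissertation's implementation: each path coefficient is taken to be the standardized simple-regression coefficient between adjacent nodes (which equals their Pearson correlation), resampling is done separately within each group, and the function names (path_effect, pem_d, pem_d_percentile_ci), the number of bootstrap replicates, and the simulated data are all hypothetical.

```python
import numpy as np


def path_effect(data):
    """Product of standardized path coefficients along X1 -> ... -> X(K+1).

    data: (n, K+1) array whose columns are ordered along the pathway.
    Assumption: each coefficient is the Pearson correlation of adjacent
    columns, i.e. the standardized slope of a simple regression.
    """
    n_edges = data.shape[1] - 1
    coefs = [np.corrcoef(data[:, k], data[:, k + 1])[0, 1] for k in range(n_edges)]
    return float(np.prod(coefs))


def pem_d(cases, controls):
    """PEM-D: pathway effect in cases minus pathway effect in controls."""
    return path_effect(cases) - path_effect(controls)


def pem_d_percentile_ci(cases, controls, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for PEM-D (resampling within each group)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        boot_cases = cases[rng.integers(0, len(cases), len(cases))]
        boot_controls = controls[rng.integers(0, len(controls), len(controls))]
        stats[b] = pem_d(boot_cases, boot_controls)
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


if __name__ == "__main__":
    # Hypothetical simulated data: a three-node chain that is active in
    # cases and absent (independent nodes) in controls.
    rng = np.random.default_rng(42)
    x1 = rng.normal(size=98)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=98)
    x3 = 0.8 * x2 + rng.normal(scale=0.6, size=98)
    cases = np.column_stack([x1, x2, x3])
    controls = rng.normal(size=(35, 3))
    print("PEM-D:", pem_d(cases, controls))
    print("95% percentile CI:", pem_d_percentile_ci(cases, controls))
```

Under these assumptions, a pathway effect would be declared at level alpha when the interval excludes zero. The bias-corrected bootstrap interval, which the abstract reports as the most powerful variant, is not implemented in this sketch.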
Keywords/Search Tags: Systems epidemiology, Network comparison, Statistical inference, High-dimensional data