
Study On Diagnosis Of Active Pulmonary Tuberculosis Based On Microarray And Dataset Sample Analysis

Posted on: 2019-03-04 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z R Bian
GTID: 1364330545955115 | Subject: Internal Medicine
Abstract/Summary:
Objective: To uncover the mechanisms underlying active pulmonary tuberculosis (APTB), we sought to extract differential modules by analyzing previously published microarray data from a public database containing APTB and control samples. The pathological mechanisms underlying APTB were assessed in five steps. First, a differential expression network (DEN) was built from a protein-protein interaction (PPI) network retrieved from the STRING database. Second, ego genes were extracted according to their degree in the DEN. Third, candidate modules were collected by expanding each ego gene with the EgoNet algorithm. Fourth, the differential expression of modules between APTB and controls was evaluated by a random permutation test. Finally, the biological significance of the differential modules was assessed by pathway enrichment analysis based on the Reactome database. Our results may contribute to understanding the pathogenesis of APTB and provide potential biosignatures for effective APTB therapies.

Methods:
1. Microarray availability and pre-treatment
Raw data for APTB were obtained from the ArrayExpress database (accession number: E-GEOD-56153). The microarray profile, provided by Ottenhoff et al., included 18 patients with active TB, 18 healthy controls, 15 APTB patients after 8 weeks of treatment, and 20 recovered patients after 28 weeks of treatment. To focus on the molecular mechanisms underlying APTB, we analyzed only the 18 APTB patients and the 18 healthy controls. Raw data were pre-processed with MicroArray Suite (MAS) Version 5.0 software (Affymetrix). After the probes were mapped to gene symbols, 17,638 genes were obtained.

2. Construction of the DEN and calculation of edge weights
First, all global human PPIs, covering 787,896 interactions among 16,730 genes, were retrieved from the STRING database. Next, the genes in the microarray profile were mapped onto the global PPI network to filter out irrelevant interactions, leaving 50,355 interactions among 8157 genes, which formed the background PPI network. The Pearson correlation coefficient (PCC), a measure of how strongly two genes are co-expressed, was then computed for each interaction in the background PPI network, and only edges whose absolute PCC exceeded a predefined threshold of 0.8 were kept to construct the DEN. A weight was then assigned to each edge in the DEN, calculated with a one-sided t-test based on the P values of differential expression between the TB and control samples.

3. Finding differential modules
The ego algorithm detects modules that are ego-connected and have maximal classification accuracy. The framework comprises four basic steps: (1) extraction of high z-scoring ego genes; (2) collection of functional modules; (3) optimization; and (4) significance filtering.
3.1 Identification of ego genes
Before module detection, we first identified a set of initial ego genes. Genes in the DEN were ranked by degree, and a z-score was then computed for each gene with a formula in which $N_k(i)$ denotes the set of neighbors of gene $i$ in the network and $A_k$ is the degree-normalized weighted adjacency matrix. The z-scores were sorted in descending order, and the top 5% of genes were designated ego genes.
3.2 Collection of functional modules
Taking each ego gene as a seed, we used classification accuracy to decide how far its module should grow, repeating the expansion until classification power no longer increased; this procedure is known as snowball sampling. In detail, a given ego gene $n \in N$ was first defined as a module $X$ in the DEN. Genes $m$ from the neighbor set of $n$ were then successively added to $X$, yielding a new module $X' = X \cup \{m\}$. The change in classification accuracy between the two modules was calculated as $\Delta F(X', X) = F(X') - F(X)$; $\Delta F(X', X) > 0$ means that adding gene $m$ increased the classification power of module $X$. The search stopped once adding a neighbor no longer increased classification power.
3.3 Optimization
After the candidate modules were collected, they were optimized while maintaining their classification accuracy: modules with fewer than 5 genes or a classification power below 0.9 were removed.
3.4 Evaluation of statistical significance
Empirical P values for module significance were calculated from classification accuracy with a random permutation test: the data were randomly shuffled and the algorithm was re-run, 1000 times for each module, and the P value of a module was obtained by comparing its observed classification accuracy with the accuracy scores from the permutations. To control false positives arising from multiple testing, the raw P values were converted to false discovery rates (FDRs) with the Benjamini-Hochberg method. Only modules with FDR < 0.05 were considered differential modules.

4. Module annotation with functional categories
To assess pathway-level patterns in the differential modules, we used Reactome data together with the background PPIs to detect annotations enriched in the differential modules. All 1675 human pathways were obtained from the Reactome database, and the genes of each pathway were intersected with those of the background PPI network. After removing pathways with fewer than 5 or more than 100 genes, 1137 background pathways remained for further analysis. The genes of each differential module were then aligned to each background pathway to identify the pathways enriched by that module. Raw enrichment P values were computed with Fisher's exact test and corrected to FDRs with the Benjamini-Hochberg method; pathways with FDR < 0.05 were considered enriched by a given differential module. Because a module may enrich multiple pathways, the pathways enriched by each module were ranked by FDR in ascending order, and the pathway with the lowest FDR was selected as the module's significant pathway.

Results:
1. Construction of the DEN
Intersecting the 17,638 genes of the microarray profile with the global PPI network yielded a background PPI network of 50,355 interactions among 8157 genes. To make the network more reliable, only interactions with an absolute PCC above 0.8 were used to construct the DEN, which covered 940 genes and 5647 interactions.

2. Identification of ego genes
A total of 47 ego genes were identified in the DEN, all with z-scores above 100. Six ego genes had z-scores above 300: RPL35 (z-score = 370.081), RPS20 (z-score = 357.377), RPL20 (z-score = 333.121), RPS19 (z-score = 332.626), RPL27 (z-score = 328.252), and RPS13 (z-score = 309.069). Interestingly, the 47 ego genes fell mainly into two families: RPL (large ribosomal subunit proteins) and RPS (small ribosomal subunit proteins). These ribosomal protein genes have been suggested to be associated with drug resistance in APTB.

3. Module collection
Each ego gene yielded a corresponding candidate module, giving 47 candidate modules grown by increasing classification accuracy. The mean module size was 5 genes. After modules with fewer than 5 genes or a classification power below 0.9 were eliminated, 7 ego modules remained: Module 4, Module 7, Module 9, Module 19, Module 25, Module 38, and Module 43. Their properties are listed in Table 2. Notably, all 7 ego modules shared the same, highest classification power, suggesting that they can accurately distinguish APTB from healthy control samples. Module 7 was the largest, covering RPL19 (ego gene), RPL29, RPL32, RPL37, RPL14, RPL7A, UBC, TRIM21, and RIPK2.

4. Evaluation of statistical significance for ego modules
A random permutation test, run 1000 times for each ego module, was used to assess the significance of the ego modules between APTB patients and healthy controls. The FDRs of all 7 ego modules were 0, indicating that all of them were differential modules.

5. Module annotation with functional categories
Based on the Reactome database, the genes in Module 4, Module 25, Module 38, and Module 43 were enriched in the same pathway, formation of a pool of free 40S subunits. The significant pathway for Module 7 and Module 9 was eukaryotic translation termination, and that for Module 19 was nonsense-mediated decay enhanced by the exon junction complex (EJC).

Conclusion: We extracted 7 differential modules enriched in 3 differential pathways: formation of a pool of free 40S subunits, eukaryotic translation termination, and nonsense-mediated decay enhanced by the exon junction complex (EJC). These modules, their ego genes, and the associated pathways may be candidate signatures for the diagnosis and treatment of APTB and may offer insight into its mechanisms. However, substantial further work using animal models or patient tissue is needed to validate these findings.
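The DEN construction in Methods step 2 (map microarray genes onto the PPI network, then keep edges whose absolute PCC exceeds 0.8) can be sketched in Python as below. This is an illustration under assumptions, not the author's code: expression values are assumed to sit in a dict from gene symbol to per-sample values, PPI edges are assumed to be gene-symbol pairs, the function names are hypothetical, and the t-test edge weighting is not reproduced.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as no co-expression
    return sxy / math.sqrt(sxx * syy)

def build_den(expression, ppi_edges, threshold=0.8):
    """Hypothetical helper: keep PPI edges between profiled genes with |PCC| > threshold."""
    den = []
    for g1, g2 in ppi_edges:
        if g1 not in expression or g2 not in expression:
            continue  # gene absent from the microarray, so the edge is not in the background network
        r = pearson(expression[g1], expression[g2])
        if abs(r) > threshold:
            den.append((g1, g2, r))
    return den
```

Both strongly positive and strongly negative correlations pass, because the filter is on the absolute PCC.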
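The ego-gene cut-off in step 3.1 (sort z-scores in descending order, keep the top 5%) is a simple ranking step. The sketch below assumes the z-scores have already been computed, since the EgoNet z-score formula is not reproduced in this abstract; `select_ego_genes` is a hypothetical name. As a consistency check, 5% of the 940 DEN genes is round(940 × 0.05) = 47, matching the 47 ego genes reported in Results.

```python
def select_ego_genes(z_scores, fraction=0.05):
    """Return the top `fraction` of genes ranked by z-score, highest first.

    z_scores: dict mapping gene symbol -> precomputed z-score (assumed input).
    """
    ranked = sorted(z_scores, key=lambda g: z_scores[g], reverse=True)
    k = max(1, round(len(ranked) * fraction))  # at least one ego gene
    return ranked[:k]
```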
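The snowball expansion in step 3.2 and the size/accuracy filter in step 3.3 can be sketched as a greedy forward search that adds the neighbor with the largest positive $\Delta F(X', X) = F(X') - F(X)$ and stops when no neighbor improves $F$. Assumptions: following the abstract, only neighbors of the ego gene are candidates; the classification-accuracy function `accuracy` is a hypothetical callable supplied by the caller; and the best-gain ordering is one reasonable reading of "successively combined".

```python
def grow_module(ego, neighbors, accuracy):
    """Greedy snowball growth of a module seeded at an ego gene.

    neighbors: dict gene -> set of neighboring genes in the DEN
    accuracy:  hypothetical callable F(module) -> classification accuracy
    """
    module = frozenset([ego])
    current = accuracy(module)
    candidates = set(neighbors.get(ego, ()))
    while candidates:
        # Evaluate Delta F(X', X) = F(X') - F(X) for each remaining neighbor m
        best_gene, best_gain = None, 0.0
        for m in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = accuracy(module | {m}) - current
            if gain > best_gain:
                best_gene, best_gain = m, gain
        if best_gene is None:
            break  # no neighbor increases classification power: stop
        module = module | {best_gene}
        current = accuracy(module)
        candidates.discard(best_gene)
    return module, current

def keep_module(module, acc, min_size=5, min_accuracy=0.9):
    """Step 3.3 filter: discard small modules and weak classifiers."""
    return len(module) >= min_size and acc >= min_accuracy
```

In a toy run where genes B to E raise accuracy and X, Y lower it, a module seeded at A absorbs B to E and then stops.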
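The permutation test in step 3.4 can be illustrated as follows. The abstract does not say exactly what is shuffled, so this sketch assumes the sample labels are permuted and uses a hypothetical single-threshold classifier on a per-sample module score as the accuracy $F$. The empirical P value is the share of 1000 permutations whose accuracy reaches the observed one (no +1 correction, which allows the P values of 0 reported in Results; some analyses prefer (hits + 1)/(B + 1)).

```python
import random

def threshold_accuracy(scores, labels):
    """Best accuracy of a one-threshold classifier on per-sample module scores.

    Hypothetical stand-in for module classification accuracy; labels: 1 = APTB, 0 = control.
    """
    n = len(scores)
    best = 0.0
    for t in sorted(set(scores)) + [float("inf")]:
        correct = sum((s >= t) == (y == 1) for s, y in zip(scores, labels))
        # Allow either direction: cases above or below the threshold
        best = max(best, correct / n, (n - correct) / n)
    return best

def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Empirical P value: share of label permutations scoring >= the observed accuracy."""
    rng = random.Random(seed)
    observed = threshold_accuracy(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if threshold_accuracy(scores, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

With 18 controls and 18 APTB samples (the group sizes used here) and a perfectly separating score, almost no permutation reaches accuracy 1.0, so the P value is near 0.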
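Both the module filter (step 3.4) and the pathway filter (step 4) convert raw P values to FDRs with the Benjamini-Hochberg method. A minimal sketch of the standard step-up adjustment: sort P values, scale each by m/rank, then enforce monotonicity from the largest P value down and cap at 1.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (FDRs), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, raw P values [0.01, 0.04, 0.03, 0.20] become FDRs [0.04, 0.0533, 0.0533, 0.20], and raw P values of 0 stay at 0.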
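The pathway annotation in Methods step 4 (restrict Reactome pathways to background genes, keep those with 5 to 100 genes, then test each module for over-representation) can be sketched as below. The abstract only says "Fisher's test"; this sketch assumes the one-sided (over-representation) form, which equals the hypergeometric upper tail. Function names and the dict/set inputs are hypothetical; the resulting P values would then go through Benjamini-Hochberg correction, with the lowest-FDR pathway taken as the module's significant pathway.

```python
from math import comb

def filter_pathways(pathways, background, min_size=5, max_size=100):
    """Intersect each pathway with the background genes and keep sizes in [min_size, max_size]."""
    kept = {}
    for name, genes in pathways.items():
        overlap = set(genes) & background
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept

def enrichment_p(module, pathway, background):
    """One-sided Fisher's exact (hypergeometric upper-tail) P value for over-representation."""
    module = set(module) & background
    N = len(background)          # background genes
    K = len(pathway)             # pathway genes (assumed to lie within the background)
    n = len(module)              # module genes
    k = len(module & pathway)    # observed overlap
    # P(overlap >= k); math.comb returns 0 when the lower index exceeds the upper
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```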
Keywords/Search Tags:active pulmonary tuberculosis, ego genes, differential expression network, differential modules, Reactome