| With the wide application of microarray and next-generation sequencing technologies, the volume of cancer-related transcriptome data is growing rapidly. Analyzing and integrating this high-throughput information, and translating it from bench to bedside, remains a primary goal of translational medical informatics. Numerous laboratory studies have investigated cancer-related gene expression patterns and identified candidate molecular markers. However, the marker lists reported by different laboratories are inconsistent. This inconsistency stems not only from differences in experimental platforms and analytical methods but, more importantly, from the intra-tumor and inter-population heterogeneity of cancer.

Here we present solutions at three levels (gene, pathway, and population) to address poor inter-dataset reproducibility and thereby improve the robustness and validity of diagnostic markers for cancer. We carried out case studies on two types of cancer transcriptome data, clear cell renal cell carcinoma (ccRCC) and prostate cancer (PCa), to evaluate whether our approach is sound. We also identified a sensitive and specific diagnostic marker for ccRCC.

First, we evaluated the consistency between independent datasets at the gene level. We established a unified platform for cancer transcriptome meta-analysis and applied it to 5 independent ccRCC microRNA expression datasets. We compared 5 outlier detection methods and used a novel algorithm, MOST, to account for the heterogeneous activation patterns of oncogenes.

Second, we constructed a predictive model, POMA, to integrate mRNA and microRNA expression profiles. POMA combines a reconstructed cancer-specific mRNA-microRNA interaction network with a scoring algorithm that evaluates the regulatory activity of microRNAs. POMA reduces false-positive findings among differentially expressed microRNAs (DE-miRNAs) and further improves the consistency between datasets.
In the end, we obtained a list of 11 microRNAs that was highly consistent across datasets and also agreed closely with results from RNA-Seq data. We performed bi-clustering and ROC curve analysis to evaluate the diagnostic performance of these microRNAs, and most of them showed good classification performance. Among single microRNAs, miR-122-5p had the highest AUC (0.957), with a sensitivity of 85% and a specificity of 95%. Pairwise combinations of microRNAs outperformed single markers: the combination of miR-122-5p and miR-126-3p gave the highest AUC (0.978), with a sensitivity of 90% and a specificity of 100%. This combination is therefore a promising candidate diagnostic marker for ccRCC.

We also investigated the consistency of the molecular mechanisms of cancer at the pathway level. We performed enrichment analysis using a gene function database (Gene Ontology) and pathway databases (KEGG, MetaCore), and showed that expression profiles from different laboratories are more consistent at the pathway level than at the gene level. Most of the enriched pathways are related to cytoskeletal remodeling, DNA damage, and the cell cycle, and literature mining supported the roles of these functional pathways in ccRCC carcinogenesis. We also identified important regulatory pathways in ccRCC, including several without previous annotation, such as TGF and WNT signaling, cytoskeletal remodeling, and Brca1 as a transcription regulator.

Finally, we studied the consistency of cancer pathways across populations, hypothesizing that the comparability of molecular markers depends on the genetic similarity of individuals. By analyzing 10 prostate cancer gene expression datasets of different origins, we showed that individuals with similar genetic and environmental backgrounds tend to have more consistent pathways.
Therefore, highly reproducible biomarkers are more likely to be found within genetically homogeneous subgroups.

In summary, this work analyzes and addresses the inconsistency of cancer biomarkers at three levels, providing a theoretical basis for a deeper understanding of the molecular mechanisms of cancer. It also offers a new strategy for discovering reliable and specific cancer biomarkers and translating them into clinical use. |
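The abstract does not state how evidence was pooled across the 5 ccRCC microRNA datasets in the gene-level meta-analysis. One standard option, assumed here purely for illustration and not taken from the thesis, is Fisher's method: for one microRNA with independent per-dataset p-values p_1..p_k, the statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the global null, and for even degrees of freedom the upper tail has a closed form. The function name below is hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method (illustrative sketch).

    X = -2 * sum(ln p_i) ~ chi-square with 2k df under the global null.
    For even df the upper tail is exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, tail = 1.0, 0.0
    for j in range(k):
        if j > 0:
            term *= half_x / j  # builds (X/2)^j / j! incrementally
        tail += term
    return min(1.0, math.exp(-half_x) * tail)


# Hypothetical per-dataset p-values for one microRNA across 5 datasets.
print(fisher_combined_p([0.04, 0.10, 0.03, 0.20, 0.08]))
```

Fisher's method assumes the datasets are independent and is driven strongly by the single smallest p-value, so with heterogeneous cohorts it is best read as a screening summary rather than a measure of consistency.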
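The gene-level analysis above uses MOST to capture oncogenes that are activated in only a subset of tumors. The sketch below is a simplified illustration in that spirit, assuming MOST refers to a maximum-ordered-subset t-statistic: each gene is robustly standardized on the pooled samples, tumor values are sorted in decreasing order, and the score is the largest standardized partial sum over the top k tumors. The pooled median/MAD standardization, the Monte Carlo estimate of the null moments, the simulation size and all function names are assumptions, not the thesis's implementation.

```python
import random
import statistics


def robust_standardize(values):
    """Center on the median; scale by the MAD (x1.4826 to match a normal SD)."""
    med = statistics.median(values)
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the gene has no spread")
    return [(v - med) / mad for v in values]


def null_moments(m, n_sim=2000, seed=0):
    """Monte Carlo mean/SD of the sum of the top-k of m N(0,1) draws, k = 1..m."""
    rng = random.Random(seed)
    partial_sums = [[] for _ in range(m)]
    for _ in range(n_sim):
        draws = sorted((rng.gauss(0.0, 1.0) for _ in range(m)), reverse=True)
        running = 0.0
        for k, x in enumerate(draws):
            running += x
            partial_sums[k].append(running)
    means = [statistics.fmean(s) for s in partial_sums]
    sds = [statistics.stdev(s) for s in partial_sums]
    return means, sds


def most_score(normal, tumor, n_sim=2000, seed=0):
    """Hypothetical MOST-style score: max over k of the standardized top-k tumor sum."""
    z = robust_standardize(list(normal) + list(tumor))
    z_tumor = sorted(z[len(normal):], reverse=True)
    means, sds = null_moments(len(z_tumor), n_sim, seed)
    best, running = float("-inf"), 0.0
    for k, x in enumerate(z_tumor):
        running += x
        best = max(best, (running - means[k]) / sds[k])
    return best
```

A gene elevated in, say, 5 of 20 tumors receives a large score from its top-5 partial sum even though its mean shift over all tumors is modest, which is the heterogeneous activation pattern that a whole-group mean comparison dilutes.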
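The AUC, sensitivity and specificity figures reported above can be computed from per-sample expression scores in a few lines. This sketch estimates AUC as the Mann-Whitney probability that a ccRCC sample outscores a normal sample, and chooses the cutoff that maximizes Youden's J; the abstract does not say which cutoff rule was used, so Youden's J is an assumption. The `combine` helper is a hypothetical two-marker rule (a signed sum of per-marker z-scores) chosen only for illustration, since the abstract does not specify how miR-122-5p and miR-126-3p were combined; logistic regression is a common alternative.

```python
import statistics


def auc(scores, labels):
    """Mann-Whitney AUC: P(case score > control score); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A sample is called a case when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best


def combine(marker_a, marker_b, sign_a=1, sign_b=1):
    """Hypothetical two-marker score: signed sum of per-marker z-scores.

    Use sign -1 for a marker that is lower in cases than in controls.
    """
    def z(values):
        mu, sd = statistics.fmean(values), statistics.stdev(values)
        return [(v - mu) / sd for v in values]

    return [sign_a * a + sign_b * b for a, b in zip(z(marker_a), z(marker_b))]
```

Scanning every pair of candidate microRNAs with `auc(combine(a, b, sign_a, sign_b), labels)` reproduces the idea of the pairwise search; with many pairs and few samples, the best AUC should be confirmed on an independent dataset to avoid optimistic selection.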
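For the pathway-level analysis, over-representation of differentially expressed genes in a Gene Ontology term or a KEGG/MetaCore pathway is commonly scored with a one-sided hypergeometric test followed by a multiple-testing correction. The abstract does not name the exact test or correction, so the hypergeometric test and Benjamini-Hochberg adjustment below are assumptions; the gene counts would come from one's own DE list and annotation files, and the function names are hypothetical.

```python
from math import comb


def enrichment_p(k, n_hits, pathway_size, universe):
    """One-sided hypergeometric P(X >= k).

    X = number of pathway genes among n_hits DE genes drawn from a universe
    of annotated genes that contains pathway_size pathway members.
    """
    upper = min(pathway_size, n_hits)
    tail = sum(comb(pathway_size, i) * comb(universe - pathway_size, n_hits - i)
               for i in range(k, upper + 1))
    return tail / comb(universe, n_hits)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Comparing the sets of pathways that pass an FDR threshold in each dataset, rather than the gene lists themselves, is what allows the pathway-level consistency described above to be measured.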
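The population-level finding, that datasets from genetically similar populations share more consistent pathways, can be quantified by comparing pathway overlap within and between population groups. This sketch uses the Jaccard index of enriched-pathway sets as the similarity measure, which is an assumption (the abstract does not name one); the dataset IDs, population labels and pathway sets in the example are hypothetical toy data, not results from the thesis.

```python
from itertools import combinations
import statistics


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def within_between(pathways, groups):
    """Mean pairwise Jaccard similarity for same-group vs cross-group dataset pairs."""
    within, between = [], []
    for d1, d2 in combinations(sorted(pathways), 2):
        sim = jaccard(pathways[d1], pathways[d2])
        (within if groups[d1] == groups[d2] else between).append(sim)
    return statistics.fmean(within), statistics.fmean(between)


# Hypothetical toy data: enriched pathways per dataset and each dataset's population.
EXAMPLE_PATHWAYS = {
    "popA_1": {"cell cycle", "DNA damage", "WNT signaling"},
    "popA_2": {"cell cycle", "DNA damage", "TGF signaling"},
    "popB_1": {"cell cycle", "androgen receptor signaling", "PI3K signaling"},
    "popB_2": {"androgen receptor signaling", "PI3K signaling", "MAPK signaling"},
}
EXAMPLE_GROUPS = {"popA_1": "A", "popA_2": "A", "popB_1": "B", "popB_2": "B"}

print(within_between(EXAMPLE_PATHWAYS, EXAMPLE_GROUPS))
```

A within-group mean that exceeds the between-group mean is consistent with the hypothesis; randomly permuting the population labels and recomputing the gap would give a null distribution for judging whether the difference is larger than chance.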