The Construction Of Breast Cancer Integrative Data Analysis Platform And The Identification Of Molecular Markers

Posted on: 2018-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Wu
GTID: 2348330518965283
Subject: Bioinformatics
Abstract/Summary:
Breast cancer is a malignant tumor that arises from breast tissue and is highly heterogeneous. Its global incidence has risen continuously since the late 1970s, and it is now the most frequently diagnosed cancer among women, with an estimated 1,676,600 cases and 521,900 deaths worldwide in 2012. Breast cancer alone accounts for 25% of all cancer cases and 15% of all cancer deaths among females. It is a complex disease shaped by a variety of environmental and genetic factors, and it has become a major threat to women's health and lives. The biological factors associated with breast cancer risk include hormones, genetic factors, age, and viral effects. Epidemiological and intervention studies generally indicate that hormonal factors play the most important role in the tumorigenesis and progression of breast cancer; the most obvious evidence is that its incidence is approximately 125 times higher in women than in men. Genetic factors are also important in the development of the disease. The best-known genes closely related to breast cancer, BRCA1 and BRCA2, were found by linkage analysis of symptomatic patients. Other prominent breast cancer-related genes, including p53, PTEN and CHEK2, play important roles in the pathogenesis of specific subgroups of patients.

As tumor research has deepened, researchers have come to regard breast cancer as a genomic disease, and molecular-level studies of tumor biological behavior have provided an important basis for understanding it. Molecular subtyping, which is based on differences in the expression of tumor molecules, has given researchers a new understanding of breast cancer heterogeneity. The targeted diagnosis and treatment programs derived from it provide an important basis for effective individualized treatment and accurate prognosis of breast cancer patients. In recent years, molecular biology and other new technologies have advanced greatly, and researchers have made many attempts to explore the nature of breast cancer at the molecular or genomic level using a variety of methods. Gene expression profiling, for example, makes it possible to obtain the molecular characteristics of breast cancer patients and assign them to specific subtypes, which allows more accurate patient care and prognosis assessment. New molecular biology techniques, such as whole-genome expression analysis of tissue samples, offer a new perspective on the nature of the disease, including its pathogenesis, classification and treatment strategies. Although the understanding of breast cancer has improved greatly, many questions about tumorigenesis and progression remain unresolved because of the high heterogeneity of the disease. It is still critical to identify genes that play important roles in tumor progression, especially for tumors with poor prognosis such as basal-like breast cancer and tumors in very young women.

In this study, we set out to build a gene-centered multi-omics integrative platform with a range of genetic analysis functions, providing a feasible and convenient tool for investigating gene function and identifying potential diagnostic or prognostic biomarkers. To make the platform comprehensive and multi-functional, we first surveyed breast cancer-related data and then assessed and collected five types of data: gene expression profiles, copy number variation, microRNA-target interactions, KEGG pathways, and mammary tissue-specific gene functional networks. Gene expression profiles, copy number variation data, and clinical information came from the NCBI Gene Expression Omnibus, The Cancer Genome Atlas, and the EMBL European Bioinformatics Institute. We also integrated microRNA-target interactions from miRTarBase, human pathways from the KEGG database, and tissue-specific gene functional networks of mammary gland and mammary epithelium from GIANT. All samples underwent strict quality control and uniform data processing, and samples that failed quality control were removed. The final collection holds gene expression profiles from 9,005 tumor tissue samples and 376 normal tissue samples and copy number variation information from 3,035 tumor samples, together with the other multi-omics data.

To provide an analysis platform that facilitates the identification of genes with potential roles in breast cancer, we built a web server and constructed the Breast Cancer Integrative Platform (BCIP, http://omics.bmi.ac.cn/bcancer/). Compared with other breast cancer databases and analysis tools, BCIP has two distinguishing characteristics: (i) BCIP incorporates multiple analysis types, including transcriptome analysis, copy number variation analysis, microRNA-target interaction analysis, KEGG pathway analysis, and mammary tissue-specific gene functional network analysis. Together, these tools sketch an overview of a gene in breast cancer. (ii) BCIP lets users perform analyses in specific breast cancer subgroups defined by single or combined clinical features of interest. We provide a total of 15 histopathological features and various clinical information, such as therapy response and prognosis. BCIP presents graphical and statistical results for each type of analysis.

Furthermore, we analyzed gene expression profiles of biopsy specimens from breast cancer patients who received neoadjuvant chemotherapy after biopsy, in order to identify genes closely associated with the efficacy of neoadjuvant chemotherapy with the T/FAC (Taxotere, 5-fluorouracil, doxorubicin and cyclophosphamide) or T/FEC (Taxotere, 5-fluorouracil, epirubicin and cyclophosphamide) regimen. Response to neoadjuvant chemotherapy was categorized as pathologic complete response (pCR) or residual invasive cancer (RD). We then examined the relationship between differentially expressed genes and therapeutic efficacy. In each of the 4 datasets, differential expression analysis identified genes expressed significantly higher or lower (adjusted P-value < 0.05) in the pCR group than in the RD group. Across all 4 datasets, 34 genes were consistently more highly expressed and 42 genes consistently less highly expressed in the pCR group than in the RD group. Unsupervised clustering based on these 76 intersection genes shows that the pCR specimens tend to form one cluster and the RD specimens the other (Kappa test, P-value < 0.05). The 76 differentially expressed genes are associated with the efficacy of neoadjuvant chemotherapy and are likely to be novel predictive biomarkers of that efficacy.
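
The intersection step and the agreement check described above can be illustrated with a minimal Python sketch. The abstract does not state which software or exact procedure was used, so everything here is an assumption for illustration only: the `DEResult` layout, the function names `consistent_genes` and `cohen_kappa`, and the toy values are hypothetical, and the sketch computes only the kappa statistic, not its P-value.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: gene symbol -> (log2 fold change pCR vs RD, adjusted P-value).
DEResult = Dict[str, Tuple[float, float]]


def consistent_genes(
    datasets: List[DEResult], alpha: float = 0.05
) -> Tuple[Set[str], Set[str]]:
    """Return genes significantly higher / lower in pCR in every dataset."""
    up_sets = [
        {g for g, (lfc, padj) in d.items() if padj < alpha and lfc > 0}
        for d in datasets
    ]
    down_sets = [
        {g for g, (lfc, padj) in d.items() if padj < alpha and lfc < 0}
        for d in datasets
    ]
    # A gene is kept only if it passes the filter in all datasets.
    return set.intersection(*up_sets), set.intersection(*down_sets)


def cohen_kappa(a: List[str], b: List[str]) -> float:
    """Cohen's kappa between two equal-length label lists."""
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:  # both lists use a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data, invented for illustration (two datasets instead of four).
toy = [
    {"G1": (1.2, 0.01), "G2": (-0.8, 0.03), "G3": (0.5, 0.20)},
    {"G1": (0.9, 0.04), "G2": (-1.1, 0.01), "G3": (0.7, 0.01)},
]
up, down = consistent_genes(toy)

# Known response versus cluster membership already mapped to response labels.
response = ["pCR", "pCR", "RD", "RD", "RD", "pCR"]
cluster = ["pCR", "pCR", "RD", "RD", "pCR", "pCR"]
kappa = cohen_kappa(response, cluster)
```

In this toy input, `G3` is dropped because it is not significant in the first dataset, which mirrors the requirement that a gene change in the same direction in every dataset before it enters the intersection set.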
Keywords/Search Tags: breast cancer, multi-omics integrative analysis platform, biomarkers, neoadjuvant chemotherapy efficacy