Breast cancer is the most prevalent malignant tumor and seriously threatens the health of women around the world. In 2012, there were approximately 1.67 million new breast cancer cases worldwide, accounting for 25.1% of all new cancer cases, and approximately 0.52 million deaths, accounting for 14.7% of all deaths caused by malignant tumors. In China, the incidence has been increasing at an average rate of 3-5% over the past decades. It is generally considered that breast cancer arises from the interaction between genetic and environmental factors. However, individuals with different genetic backgrounds have different risks of breast cancer under the same environmental exposure, suggesting that genetic differences may affect individual susceptibility to breast cancer. Therefore, screening for high-risk populations with genetic susceptibility and applying targeted preventive measures is one strategy to reduce the incidence and mortality of breast cancer. Recent studies have indicated that dysregulated expression of long noncoding RNAs (lncRNAs) is implicated in tumorigenesis, suggesting that specific lncRNA expression may serve as a biomarker for tumor diagnosis and as a potential drug target. Notably, genetic variants located in lncRNAs may disturb their expression or affect their binding to target sequences or proteins, and thereby influence tumorigenesis and progression. Therefore, in the present study, we performed RNA sequencing (RNA-seq) on several paired breast cancer tumor and adjacent normal tissues to systematically study the association between regulatory variants in dysregulated lncRNAs and susceptibility to breast cancer.

The present study comprised two parts. In the discovery stage, we performed RNA-seq using the Illumina HiSeq 1500 on five paired breast cancer and adjacent non-cancerous tissues. The raw RNA-seq data were saved in FASTQ format, and low-quality reads were removed prior to analysis. The
remaining qualified reads were mapped to the human reference genome (GENCODE version 19), and non-coding RNA transcripts longer than 200 nt were selected, using TopHat version 2.0.9. Cufflinks (v2.2.1) was used to assemble the aligned reads into transcripts and estimate expression abundances. Differential expression analysis of lncRNAs was performed to compare paired tumor and adjacent samples using the Bioconductor package DESeq2 in the R statistical programming environment. In the association study stage, a total of 1,486 breast cancer patients and 1,519 cancer-free controls were included. We used RegulomeDB to predict regulatory variants in differentially expressed lncRNAs. Common genetic variants (minor allele frequency, MAF > 0.05) with a RegulomeDB score of 1-3 were retained; when multiple variants were in strong linkage disequilibrium (LD) (r² > 0.8) in the Chinese population, the variant with the lowest P value was kept. Subsequent genotyping was performed using the iPLEX Sequenom MassARRAY platform. The associations of genotypes with breast cancer risk were estimated by computing odds ratios (ORs) and 95% confidence intervals (CIs) from logistic regression analyses.

We identified 11 differentially expressed lncRNAs that met the following criteria: located on autosomes, false discovery rate (FDR) ≤ 0.05 and log2 fold change > 2. In the validation stage, 27 regulatory variants that met the bioinformatic prediction criteria were retained. Four variants were excluded because primer design failed, and three others were removed because of genotyping failure (call rates < 95%); finally, twenty variants were selected to investigate the relationship with breast cancer risk. The T allele of rs11471161, located in lncRNA AC104135.3, was significantly associated with a decreased risk of breast cancer (additive model: OR = 0.86, 95% CI = 0.78-0.96, P = 7×10⁻³), while the A allele of rs3751232, located in lncRNA RP11-1060J15.4,
was significantly associated with an increased risk of breast cancer (additive model: OR = 1.26, 95% CI = 1.10-1.45, P = 1×10⁻³). After adjusting for age, age at menarche and menopausal status, rs11471161 and rs3751232 remained associated with breast cancer risk (rs11471161: OR = 0.84, 95% CI = 0.74-0.94, P = 4×10⁻³; rs3751232: OR = 1.20, 95% CI = 1.02-1.40, P = 2.7×10⁻²). We further performed stratified analyses of these two variants (rs11471161 and rs3751232) in subgroups based on age, age at menarche, age at first live birth, menopausal status (premenopausal and natural menopausal) and receptor status (estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status). The association with rs11471161 was significant among women with a younger age at menarche, women with a younger age at first childbirth, premenopausal women and women with advanced-stage breast cancer (P = 0.012, 0.023, 0.042 and 0.026, respectively). In addition, rs11471161 showed a protective effect in the ER-positive, PR-positive and HER2-negative subgroups (P = 0.006, 0.018 and 0.008, respectively), while rs3751232 was significantly associated with an increased risk of breast cancer in the ER-positive and PR-positive subgroups (P = 0.004 and P = 0.036, respectively). However, no heterogeneity was observed in any of the subgroups. Further co-expression analysis indicated that AC104135.3 expression was correlated with that of ERBB2 (r = 0.99, P_FDR = 0.0149), a known breast cancer-related gene that promotes the development and progression of breast cancer when overexpressed.

In conclusion, our study is the first to systematically investigate the association between genetic variants in dysregulated lncRNAs and susceptibility to breast cancer using transcriptome sequencing, and it identified two novel genetic variants, rs11471161 and rs3751232, in lncRNAs AC104135.3 and RP11-1060J15.4, respectively, that influence susceptibility to breast cancer in the Chinese population. Moreover, our newly identified susceptibility biomarkers might be used as
a genetic reference for the screening, diagnosis and treatment of breast cancer.
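
As an illustration of how odds ratios and 95% confidence intervals are commonly derived from a logistic regression fit, the following minimal Python sketch converts a per-allele coefficient and its standard error into an OR, a Wald interval and a two-sided Wald P value. It is not the analysis code used in this study: the function names, the coefficient and the standard error are hypothetical, and the Wald approximation is an assumption, since the abstract does not state which interval or test was used.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error.

    Assumes a Wald interval on the log-odds scale: exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of log(OR) from the bounds of a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p_value(beta, se):
    """Two-sided P value for the Wald statistic z = beta / se."""
    z = abs(beta / se)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for a standard normal.
    return math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical per-allele coefficient and standard error (additive model).
    beta, se = -0.15, 0.05
    or_, lower, upper = odds_ratio_with_ci(beta, se)
    print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}, "
          f"P = {wald_p_value(beta, se):.1e}")
```

Under the additive model, the coefficient is the change in log-odds per copy of the effect allele, so an OR below 1 corresponds to a negative coefficient and an OR above 1 to a positive one.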