Genome-wide Generalized Compound Heterozygosity Analysis Identifies Novel Susceptibility Loci For Lung Cancer

Posted on: 2022-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Ma
Full Text: PDF
GTID: 1484306743997269
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: Lung cancer is one of the most common malignancies worldwide. According to the Global Cancer Report released by the International Agency for Research on Cancer (IARC), 2.2 million new cases of lung cancer were diagnosed worldwide in 2020, accounting for about 11.4% of all new malignant tumors. Lung cancer caused 1.8 million deaths, about 18.0% of all cancer deaths. According to the latest statistics released by the China Cancer Registry Center, in 2015 lung cancer ranked first among malignancies in both incidence and mortality in Chinese men, and second in incidence and first in mortality in Chinese women. Lung cancer therefore remains a major public health problem in China.

Lung cancer is a complex disease that develops through a multi-stage process and is influenced by many risk factors. Previous studies have found that, besides tobacco exposure and other environmental factors, genetic factors also play a role that should not be ignored. Following the completion of the Human Genome Project (HGP) and the International HapMap Project (HapMap), the genome-wide association study (GWAS) has, since 2005, been widely used in molecular epidemiology research on many complex diseases and traits, and has produced a series of findings. In recent years, GWASs performed in various ethnic populations have identified dozens of lung cancer susceptibility variants and several chromosomal susceptibility loci. These findings help clarify the potential biological mechanisms underlying the occurrence and development of lung cancer. However, the variants found by lung cancer GWASs together explain less than 5% of lung cancer heritability, which is estimated at almost 18% in total. This gap is known as "missing heritability", and it suggests that a large number of unknown susceptibility variants remain to be studied.

Previous GWASs generally focused on single genetic variants and paid little attention to variant pairs in compound heterozygosity, which may be one potential explanation for the missing heritability. In conventional genetics, compound heterozygosity (CH) means that two different mutant alleles at a particular locus occur one on each homologous chromosome. Generalized CH (GCH) refers to the case in which the two genetic variants are not necessarily coding, rare and from the same chromosome, and may be involved in a wide range of human traits. In 2013, Lin et al. found that a compound heterozygosity composed of two genetic variants in MMP16 and ROBO1 was associated with the development of aggressive prostate cancer. That result suggests that a compound heterozygosity model may promote the discovery of new lung cancer susceptibility variants and help explain the missing heritability of lung cancer. However, because a GWAS involves a very large number of genetic variants, previous studies lacked efficient methods for detecting compound heterozygosity across the whole genome. They were usually restricted to specific genes or chromosomal regions, and paid less attention to potential compound heterozygosity genome-wide. In 2016, Zhong et al. published an analytical method named the "Generalized Collapsed Double Heterozygosity (GCDH) test" and used it to perform a genome-wide search for relaxed forms of compound heterozygosity in actinic keratosis (AK). Their study showed that compound heterozygosity in novel and pigmentation-related loci confers genetic risk of AK. This method may therefore help account for the missing heritability of complex diseases. In addition, given the clear differences in genetic background between Chinese and Caucasian populations, research in different populations will also help to find the truly pathogenic genetic variation.

To further complete the genomic landscape of lung cancer, this research performed a GCDH analysis in Chinese and European populations, based on a multicenter case-control design with a large sample size. The results deepen the understanding of genetic susceptibility to lung cancer, and provide theoretical value and practical guidance for precise prevention of lung cancer, screening of high-risk populations, and individualized treatment.

Part I. Genome-wide generalized compound heterozygosity analysis identifies novel susceptibility loci for lung cancer in the Chinese population

Methods: We performed a case-control study including a total of 12,434 Chinese lung cancer cases and 13,328 controls. All cases and controls were collected from three GWASs: (1) 2331 cases and 3077 controls from the Nanjing Medical University lung cancer genome-wide association study (NJMU GWAS); (2) 10,248 cases and 9298 controls from the GSA genome-wide association study in the Chinese population; (3) 953 cases and 953 controls from the Oncoarray database of Chinese lung cancer. All subjects and variants passed quality control. Briefly, samples were excluded for a call rate < 95%, sex discrepancy, contamination, familial relationships, or extreme heterozygosity. Variants were excluded if they were duplicates, were not on autosomes, had a call rate < 95%, were rare, or deviated from Hardy-Weinberg equilibrium. SHAPEIT and IMPUTE2 were used to impute genotypes with the 1000 Genomes Project as reference. Generalized Collapsed Double Heterozygosity (GCDH) analysis was then performed in the Chinese samples. The GCDH test detects association between CH genotypes and binary traits by applying a chi-squared statistic to pseudo-genotypes collapsed from a pair of single variants, and is implemented with a sliding window. The R package CollapsABEL was used to iteratively test the association between the compound heterozygosity pseudo-genotype and lung cancer risk. Variants were annotated with public databases: ENCODE, FANTOM5 and Roadmap for non-coding variants, and SIFT, PolyPhen-2 and CADD for variants in coding regions. If a compound heterozygosity pair was annotated as affecting the promoter, enhancer, histone modifications, or expression of particular genes, those genes were considered potential susceptibility genes.

Results: The present study identified eight risk loci with eleven compound heterozygosity pairs for lung cancer in Chinese. The novel lung cancer associated locus is 19p13.3 (rs149871862/rs55999131, OR = 0.77, 95% CI = 0.71-0.83, P = 7.91 × 10^-11); WDR18 is the potential susceptibility gene of this variant pair. The other ten compound heterozygosity variant pairs are located at known lung cancer related loci:

In 6p21.33, rs9265194/rs2854003 (OR = 1.10, 95% CI = 1.08-1.13, P = 1.06 × 10^-13); LD with the known variant rs9469031 was r^2 = 0.13 and r^2 = 0.04, respectively. TCF19 was the potential susceptibility gene.
In 3q28, rs9825172/rs13097746 (OR = 0.88, 95% CI = 0.85-0.91, P = 2.52 × 10^-17); LD with the known variants rs10937405 and rs4600802 was r^2 = 0.86 and r^2 = 0.21, respectively. TP63 was the potential susceptibility gene.
In 5p15.33, rs71575564/rs40183 (OR = 0.88, 95% CI = 0.85-0.91, P = 7.39 × 10^-13); LD with the known variants rs4975616 and rs465498 was r^2 = 0.90 and r^2 = 0.52, respectively. CLPTM1L was the potential susceptibility gene.
In 6p21.32, four pairs were found:
rs200860466/rs143127183 (OR = 1.13, 95% CI = 1.09-1.16, P = 1.30 × 10^-14); LD with the known variant rs2395185 was r^2 = 0.87 and r^2 = 0.01, respectively. HLA-DRB5 was the potential susceptibility gene.
rs9272938/rs9274490 (OR = 1.12, 95% CI = 1.09-1.16, P = 8.32 × 10^-7); LD with the known variant rs2395185 was r^2 = 0.49 and r^2 = 0.02, respectively. Sixteen MHC genes, such as HLA-DPA1, HLA-DOA and HLA-DQB1, were the potential susceptibility genes.
rs28746878/rs9275156 (OR = 1.10, 95% CI = 1.07-1.13, P = 7.12 × 10^-11); LD with the known variant rs2395185 was r^2 = 0.22 and r^2 = 0.22, respectively. Seventeen MHC genes, such as HLA-DMA, TAP2 and HLA-DRB1, were the potential susceptibility genes.
rs13213605/rs17205872 (OR = 0.91, 95% CI = 0.89-0.93, P = 4.43 × 10^-13); LD with the known variant rs2395185 was r^2 = 0.14 and r^2 = 0.00, respectively. Sixteen MHC genes, such as BRD2, HLA-DOA and HLA-DRA, were the potential susceptibility genes.
In 8p12, rs7820838/rs60099073 (OR = 1.09, 95% CI = 1.06-1.13, P = 2.52 × 10^-10); LD with the known variant rs4236709 was r^2 = 1.00 and r^2 = 0.01, respectively. NRG1 was the potential susceptibility gene.
In 11q23.3, rs548285997/rs1056562 (OR = 1.12, 95% CI = 1.08-1.15, P = 1.29 × 10^-11); LD with the known variant rs1056562 was r^2 = 0.01 and r^2 = 1.00, respectively. FXYD2 was the potential susceptibility gene.
In 17q24.2, rs62084755/rs9896198 (OR = 0.90, 95% CI = 0.87-0.93, P = 2.16 × 10^-10); LD with the known variant rs62084755 was r^2 = 0.59 and r^2 = 0.05, respectively. WIPI1 and KPNA2 were the potential susceptibility genes.

Conclusion: Using genotyping data with a large sample size, this study identified a new genetic susceptibility locus for lung cancer in the Chinese population by GCDH analysis, and also confirmed previously reported lung cancer susceptibility loci. It further improves the understanding of the genetic background of lung cancer and may facilitate individualized prevention.

Part II. Genome-wide generalized compound heterozygosity analysis identifies novel susceptibility loci for lung cancer in Caucasians, and trans-ancestry comparison

Methods: We performed a case-control study including a total of 20,871 Caucasian lung cancer cases and 15,971 controls. All cases and controls were collected from two GWASs: (1) 18,444 cases and 14,027 controls from the Oncoarray lung cancer genome-wide association study (Oncoarray GWAS); (2) 2427 cases and 1944 controls from the DCEG genome-wide association study in Caucasians. All subjects and variants passed quality control. For the Oncoarray GWAS, imputation had already been performed in the article by McKay et al. For the DCEG GWAS, SHAPEIT and IMPUTE2 were used to impute genotypes with the 1000 Genomes Project as reference. Compound heterozygosity analysis was performed in Caucasians. The R package CollapsABEL was used to iteratively test the association between the compound heterozygosity pseudo-genotype and lung cancer risk. Variants were annotated with public databases: ENCODE, FANTOM5 and Roadmap for non-coding variants, and SIFT, PolyPhen-2 and CADD for variants in coding regions. If a compound heterozygosity pair was annotated as affecting the promoter, enhancer, histone modifications, or expression of particular genes, those genes were considered potential susceptibility genes. Finally, to compare the genetic similarities and differences of compound heterozygosity pairs between Chinese and Caucasians, the trans-ancestry associations and frequency of each pair were analyzed.

Results: The present study identified five risk loci with seven compound heterozygosity pairs for lung cancer in Europeans. The novel lung cancer associated locus in Caucasians is 6p21.32 (rs9276757/rs9380326, OR = 1.11, 95% CI = 1.08-1.14, P = 1.76 × 10^-11); HLA-DRB5, HLA-DQB2 and eight other MHC genes are the potential susceptibility genes of this pair. The other six compound heterozygosity variant pairs are located at known lung cancer related loci:

In 5p15.33, rs56345976/rs2736108 (OR = 1.10, 95% CI = 1.08-1.13, P = 1.06 × 10^-13); LD with the known variant rs465498 was r^2 = 0.09 and r^2 = 0.18, respectively. For this SNP pair, NDUFS6 is the potential target gene for rs56345976 and TERT is the potential target gene for rs2736108.
In 15q25.1, rs2656058/rs5019044 (OR = 0.87, 95% CI = 0.85-0.89, P = 6.46 × 10^-33); LD with the known variants rs6495306 and rs6495306 was r^2 = 0.50 and r^2 = 0.56, respectively. TBC1D2B and IREB2 were the potential susceptibility genes.
In 5p15.33, rs37006/rs4975677 (OR = 0.91, 95% CI = 0.89-0.93, P = 1.47 × 10^-15); LD with the known variants rs31489 and rs402710 was r^2 = 0.87 and r^2 = 0.01, respectively. CTD-2012J19.3, RP11-43F13.1 and CTD-2245E15.3 were the potential susceptibility genes.
In 6p21.33, rs3132556/rs3130976 (OR = 1.09, 95% CI = 1.06-1.12, P = 2.29 × 10^-11); LD with the known variants rs3094604 and rs1264308 was r^2 = 0.43 and r^2 = 0.29, respectively. FLOT1, CYP21A1P, HLA-B and 33 other MHC genes were the potential susceptibility genes.
In 6p22.1, rs2517681/rs2844796 (OR = 1.07, 95% CI = 1.05-1.10, P = 2.69 × 10^-11); LD with the known variant rs4324798 was r^2 = 0.06 and r^2 = 0.23, respectively. HLA-F, PPP1R11, TRIM26, TRIM39 and TRIM39-RPP21 were the potential susceptibility genes.
In 6p22.1, rs1136903/rs1960063 (OR = 1.08, 95% CI = 1.05-1.10, P = 5.89 × 10^-11); LD with the known variant rs1264308 was r^2 = 0.57 and r^2 = 0.12, respectively. HLA-A was the potential susceptibility gene.

We then compared the GCDH results between Chinese and Caucasian lung cancer. Six compound heterozygosity pairs showed no heterogeneity:
on 5p15.33, rs56345976/rs2736108 (heterogeneity P = 3.27 × 10^-1), rs71575564/rs40183 (heterogeneity P = 7.43 × 10^-2) and rs37006/rs4975677 (heterogeneity P = 4.71 × 10^-1);
on 6p22.1, rs1136903/rs1960063 (heterogeneity P = 3.17 × 10^-1) and rs2517681/rs2844796 (heterogeneity P = 8.77 × 10^-1);
on 6p21.32, rs9272938/rs9274490 (heterogeneity P = 8.75 × 10^-2).
The other twelve compound heterozygosity pairs showed significant heterogeneity:
on 3q28, rs9825172/rs13097746 (heterogeneity P = 1.52 × 10^-3);
on 6p21.32, rs200860466/rs143127183 (heterogeneity P = 2.12 × 10^-9), rs28746878/rs9275156 (heterogeneity P = 1.26 × 10^-2), rs9276757/rs9380326 (heterogeneity P = 1.14 × 10^-4) and rs13213605/rs17205872 (heterogeneity P = 1.11 × 10^-16);
on 6p21.33, rs9265194/rs2854003 (heterogeneity P = 9.68 × 10^-6) and rs3132556/rs3130976 (heterogeneity P = 3.05 × 10^-5);
on 8p12, rs7820838/rs60099073 (heterogeneity P = 1.13 × 10^-2);
on 17q24.2, rs62084755/rs9896198 (heterogeneity P = 1.28 × 10^-3);
on 15q25.1, rs2656058/rs5019044 (heterogeneity P = 1.52 × 10^-3);
on 11q23.3, rs548285997/rs1056562 (heterogeneity P = 6.09 × 10^-3);
on 19p13.3, rs149871862/rs55999131 (heterogeneity P = 2.08 × 10^-9).

Conclusion: Using genotyping data with a large sample size, this study identified a new genetic susceptibility locus for lung cancer in the European population by compound heterozygosity analysis, and also confirmed previously known lung cancer susceptibility loci. The comparison between Chinese and Europeans revealed both shared and population-specific compound heterozygosity SNP pairs. Overall, the novel lung cancer related compound heterozygosity variant pairs improve the understanding of the genetic background of lung cancer and may facilitate its individualized prevention.
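
The GCDH test described in the Methods (a chi-squared statistic applied to pseudo-genotypes collapsed from a pair of variants, scanned with a sliding window) can be illustrated with a minimal Python sketch. This is not the CollapsABEL implementation. The collapsing rule used here (an individual is a carrier if at least one minor allele is present at both variants), the function names, and the toy genotypes are all assumptions made for illustration. Missing genotypes and covariates are not handled.

```python
import math


def collapse_pair(g1, g2):
    """Collapse two minor-allele counts (0, 1 or 2) into a binary pseudo-genotype.

    Assumed rule (illustrative only): 1 if at least one minor allele is
    carried at BOTH variants, otherwise 0.
    """
    return 1 if g1 >= 1 and g2 >= 1 else 0


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction) and its
    p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin means the table carries no information.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def gcdh_test(snp1, snp2, is_case):
    """Test one variant pair: collapse, tabulate against case status, test."""
    table = [[0, 0], [0, 0]]  # rows: case, control; columns: carrier, non-carrier
    for g1, g2, case in zip(snp1, snp2, is_case):
        row = 0 if case else 1
        col = 0 if collapse_pair(g1, g2) else 1
        table[row][col] += 1
    (a, b), (c, d) = table
    return chi2_2x2(a, b, c, d)


def window_pairs(n_variants, window):
    """Yield index pairs (i, j) with i < j <= i + window, as a sliding window."""
    for i in range(n_variants):
        for j in range(i + 1, min(i + window, n_variants - 1) + 1):
            yield i, j


# Toy data (hypothetical): four cases followed by four controls.
snp1 = [1, 2, 1, 0, 0, 0, 1, 0]
snp2 = [1, 1, 2, 1, 1, 0, 0, 0]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]
stat, p_value = gcdh_test(snp1, snp2, is_case)  # stat is 4.8 for this toy table
```

In a genome-wide scan, window_pairs would enumerate candidate pairs along each chromosome and gcdh_test would be applied to every pair, which is why restricting the search to a window keeps the number of tests manageable.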
Keywords/Search Tags: Lung cancer, Compound heterozygosity, Susceptibility loci, GWAS