Many factors affect complex human diseases, including genetic factors, environmental factors, and their interactions. Identifying potential risk factors is important for understanding the underlying biological mechanisms and for developing public health prevention strategies. Although extensive genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex diseases, these are common variants that explain only a small proportion of trait heritability. Rare variants and genetic interaction effects are considered two potential sources of the "missing heritability". Indeed, rare variant association analyses and gene-environment interaction analyses have identified novel genetic risk factors, underscoring their importance in genetic association analysis.

A central challenge in genetic association analysis is to maximize statistical power while controlling false positives. To locate pathogenic sites, the information available in the data must be used as fully as possible. Compared with the standard prospective likelihood method, the retrospective likelihood method can substantially improve statistical efficiency by exploiting the genetic information among genotypes and the relationship between genes and environmental factors. Power can be improved further by assuming, within the retrospective likelihood, that genes are independent of environmental factors; however, if this independence assumption is violated, effect estimates can be seriously biased and the false positive rate substantially inflated.

One objective of this paper is to investigate the effects of maternal and child common genetic variants and environmental risk factors on obstetrical and early-life phenotypes. The case-control mother-child pair design is a common experimental design for
this type of study because the data are easy to collect. In such studies, the retrospective likelihood can fully exploit available information such as Mendelian inheritance, random mating, and the conditional independence between maternal environmental risk factors and the child's genotype given the maternal genotype, thereby improving statistical inference. In this paper, an empirical Bayes method is used to combine two retrospective likelihood methods (one assuming gene-environment independence and the other leaving the gene-environment relationship unspecified), yielding two empirical Bayes (EB) estimators. The new estimators are highly data-adaptive, and their advantages include: (1) for parameter estimation, their mean squared errors are usually smaller than those of existing methods; (2) for hypothesis testing, they control the type I error rate better and have higher statistical power than existing methods; (3) their performance, in terms of both the estimates of the parameters of interest and the type I error rate, is insensitive to misspecification of the prevalence. The asymptotic normality of the two EB estimators is established, which can be used to construct confidence intervals and association tests for genetic effects and gene-environment interactions. Simulations and real data analyses demonstrate the desired performance of the new methods.

Another objective of this paper is to conduct rare variant association analysis based on case-control data. Most existing methods are based on the prospective likelihood model, so they are robust but may lack power. In this paper, retrospective likelihood tests based on the gene-environment independence assumption are derived, and a class of weighted tests is developed by appropriately weighting these retrospective likelihood-based tests and the existing prospective likelihood-based tests. The weighted tests have two advantages: (1) they control the type I error rate
regardless of whether the gene-environment independence assumption holds; (2) compared with existing prospective methods, their power is higher overall. Extensive simulation studies and real data analyses show that the proposed methods control type I error rates well regardless of whether the gene-environment independence assumption holds, and are generally more powerful than existing prospective likelihood-based methods.
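
The data-adaptive behavior of the EB estimators described above can be illustrated with a minimal scalar sketch. It assumes a shrinkage form commonly used in the gene-environment interaction literature, in which the weight placed on the robust estimate grows with the squared discrepancy between the two estimates. The abstract does not give the exact form of the two EB estimators, so the function name `eb_combine`, its arguments, and the weighting formula are illustrative assumptions and not the paper's definitions.

```python
def eb_combine(beta_free: float, beta_indep: float, var_free: float) -> float:
    """Combine two estimates of the same parameter by empirical-Bayes-style shrinkage.

    beta_free  : estimate from the likelihood that leaves the gene-environment
                 relationship unspecified (robust, less efficient)
    beta_indep : estimate from the likelihood assuming gene-environment
                 independence (efficient, biased if the assumption fails)
    var_free   : estimated variance of beta_free (must be positive)

    All names and the weighting rule are hypothetical, for illustration only.
    """
    if var_free <= 0:
        raise ValueError("var_free must be positive")
    # Squared discrepancy: a rough estimate of the squared bias that the
    # independence assumption introduces when it is violated.
    psi_sq = (beta_free - beta_indep) ** 2
    # Weight on the robust estimate: near 0 when the two estimates agree,
    # near 1 when they disagree strongly relative to sampling noise.
    weight_free = psi_sq / (psi_sq + var_free)
    return weight_free * beta_free + (1.0 - weight_free) * beta_indep


if __name__ == "__main__":
    # Estimates agree closely: the result stays near the efficient estimate.
    print(eb_combine(beta_free=0.52, beta_indep=0.50, var_free=0.04))
    # Estimates disagree strongly: the result moves toward the robust estimate.
    print(eb_combine(beta_free=1.50, beta_indep=0.50, var_free=0.04))
```

Under this assumed form, the combined estimate behaves like the independence-based estimate when the data give no sign that the assumption is violated, and falls back toward the robust estimate otherwise, which is the trade-off between efficiency and robustness that the abstract attributes to the EB estimators.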