Statistical Analysis Of Medical Genetic Research Data And SAS Implementation

Posted on: 2011-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
GTID: 2154360308474976
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Mathematical statistics plays an irreplaceable role in the development of medical genetics. As medicine has advanced and the experimental techniques of genetics have been continually updated, many methods of genetic statistical analysis have matured and are increasingly widely applied, while new analysis methods continue to emerge. How to apply the mature, widely used methods alongside the newer and more complicated ones, and how to compute them quickly, are problems that genetic researchers face today. This study examines statistical methods for medical genetic data in detail, especially multiple comparison correction of genetic results, association studies between multiple loci and disease, and linkage analysis. Personal views are put forward on the basis of repeated calculations. All of the statistical analyses are carried out in SAS, a widely recognized and authoritative statistical software package, by calling its procedures and by programming the calculations.

For the statistical analysis of medical genetic data, this study focuses on the following work:

First part: measuring gene frequency and genotype frequency, and validating Hardy-Weinberg equilibrium. Hardy-Weinberg equilibrium plays a very important role in the study of genetics. Before genetic analysis, it is advisable to test whether the data conform to Hardy-Weinberg equilibrium. This chapter describes the basic theory of Hardy-Weinberg equilibrium and uses software to calculate gene frequency and genotype frequency, to test Hardy-Weinberg equilibrium, and to correct the probability by Monte Carlo simulation.

Second part: case-control methods for finding disease-related loci. The case-control study is the most basic and important method in genetic epidemiological research and an important tool for testing hypotheses about the causes of disease. In genetic epidemiology, genes associated with complex diseases can be found by using case-control studies.
The χ² test and the Armitage trend test can be used. When the χ² test is used to test the association between the disease and a locus, the groups should satisfy Hardy-Weinberg equilibrium. Research shows that when this condition is not satisfied, the type I error rate increases, so the Armitage trend test, which is based on genotype data, ought to be applied instead.

Third part: correction of genetic analysis results. With the rapid development of biotechnology, rapid detection of a large number of loci has become a routine laboratory method, and in the analysis of case-control genetic epidemiological data each locus needs to be tested statistically. If there are too many loci, multiple comparisons can greatly inflate the false-positive rate and lead to wrong conclusions, so correction for multiple comparisons is necessary. This chapter uses three modified smoothing methods, the Bonferroni correction, and the Sidak method.

Fourth part: association analysis of family data. Using family members as controls is the best way to match cases and controls on ancestral origin. Because family members share the same genetic background, using them as controls can solve the problem of population stratification. Different types of family members call for different analyses. This chapter applies the TDT, S-TDT, and SDT tests to family case-control data.

Fifth part: linkage disequilibrium and haplotype analysis. Linkage disequilibrium analysis and haplotype analysis are efficient methods for locating disease-associated genes and have played a large role in detecting genes for complex diseases. Unlike association analysis of family data, they do not require family information to be collected, so their conditions of application are broader. This chapter studies in detail the methods for detecting linkage disequilibrium and for analyzing the association between haplotypes and disease.

Sixth part: calculation of the inbreeding coefficient and the relationship coefficient. Consanguineous marriage is non-random mating.
Such marriage has a serious impact on the genetic equilibrium of a population, changing the proportions of homozygotes and heterozygotes. The Hardy-Weinberg law applies only to randomly mating populations, not to such populations. This chapter calculates the inbreeding coefficient and the relationship coefficient for inbred populations.

Seventh part: linkage analysis. During meiosis and the formation of sex cells, the frequency of exchange between homologous chromosomes is called the recombination rate. The recombination rate is related to the distance between two loci on the same chromosome: the greater the distance, the greater the chance of exchange. A recombination rate of 0.50 indicates that the two loci segregate independently, being either on different chromosomes or far apart on the same chromosome. A lower recombination rate indicates that the two loci are close together and that their alleles are not passed to the next generation independently, which is called genetic linkage. This chapter introduces Bayesian methods and Monte Carlo simulation to estimate the recombination rate.

This thesis uses many procedures from the SAS/Genetics and SAS/STAT modules of SAS 9.1.3 and SAS 9.2, together with programming, for the statistical calculation of medical genetics data. The work follows these general principles: statistical models combined with case studies, theoretical research combined with software implementation, and mathematical methods combined with genetic experiments. It introduces a variety of genetic statistical analysis methods and models, in particular the correction of genetic results, the association of multiple loci with disease, and linkage analysis. These methods are explained in detail, and some new views are put forward. The thesis emphasizes the application and implementation of statistical analysis techniques, providing not only a statistical methodology for medical genetics but also a new platform for computing on genetic data.
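As an illustration of the multiple comparison correction described in the third part, the Bonferroni and Sidak adjustments can be sketched in a few lines. This is a minimal sketch in Python rather than the SAS code used in the thesis. The function names and the example p-values are hypothetical, and the sketch assumes one raw p-value per locus with all loci tested at the same level.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def sidak(p_values):
    """Sidak adjustment: 1 - (1 - p)^m, which assumes the m tests are independent."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p-values for three loci (illustrative only, not thesis data).
raw = [0.01, 0.04, 0.20]
print("Bonferroni:", bonferroni(raw))
print("Sidak:     ", sidak(raw))
```

For the same input the Sidak-adjusted p-value is never larger than the Bonferroni-adjusted one, so Sidak is slightly less conservative when its independence assumption holds.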
Keywords/Search Tags: medical genetics, statistical analysis, SAS, multiple comparison, linkage disequilibrium