
The Development And Application Of Statistical Analysis For Genetic Epidemiology (SAGE)

Posted on: 2008-06-10; Degree: Master; Type: Thesis
Country: China; Candidate: L Y Chen
GTID: 2178360218461539; Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background and objectives: Genetic epidemiology is an emerging interdisciplinary field that has developed rapidly in recent years. It studies the genetic and environmental factors that shape the distribution of disease in different populations, and it proposes sound preventive measures. Its foundations are population genetics and epidemiology. It applies epidemiological methods of data collection and processing, together with the experimental methods of molecular genetics, and uses the principles and methods of biometrics to investigate the separate effects of genetic and environmental factors on disease as well as their joint effects. Since the Human Genome Project completed the sequencing of human genomic DNA, more and more polymorphic markers have been discovered, and the search for disease genes has accelerated. Research on polygenic diseases has become the focus of attention now and will remain so for a considerable time to come. For single-gene disorders that follow Mendelian inheritance, an effective research framework is already in place, and nearly a thousand disease-causing genes have been mapped and cloned.
Polygenic diseases, however, have complex phenotypes. Although these complex traits show a degree of familial aggregation, they do not fully follow Mendelian inheritance, so mapping susceptibility genes and carrying out genetic analysis still pose many problems; this has made such diseases both a difficulty and a hot topic in medical genetics and gene research in recent years. In studies of heritable human diseases, linkage analysis, association analysis, and linkage disequilibrium analysis of pedigree and population data have become important methods of gene mapping. Because genetic data are voluminous, tedious to analyze, and complex in structure, general-purpose statistical methods and software often cannot make full use of the information they contain, so specialized genetic statistics software is needed. Many genetic epidemiology programs exist, but their general-purpose analytic capability is limited. For example, parametric linkage analysis can be performed with FASTLINK, LINKAGE, VITESSE, and others, and nonparametric linkage analysis with GENEHUNTER, MERLIN, MELINK, and others. Genetic epidemiology research in China is still at a developing stage, and most studies rely on foreign software such as LINKAGE and GENEHUNTER. The domestically developed package PPAP runs under DOS, but it has few users. With its huge population and rich demographic data, China is an excellent resource for studying human genetic information.
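For readers new to parametric linkage analysis, the Python sketch below shows the two-point LOD score that programs such as those named above compute, in the simplest textbook case only: it assumes phase-known, fully informative meioses, so recombinants can be counted directly. The function names and the data are hypothetical, and this is not how FASTLINK, LINKAGE, or SAGE evaluate likelihoods on general pedigrees.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)],
    i.e. the likelihood at recombination fraction theta relative to
    free recombination (theta = 0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants: int, nonrecombinants: int,
            steps: int = 500) -> tuple[float, float]:
    """Grid search for the theta in (0, 0.5] that maximizes the LOD score."""
    # Grid points i / (2 * steps) for i = 1..steps end exactly at 0.5.
    grid = [i / (2 * steps) for i in range(1, steps + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))
    return best, lod_score(recombinants, nonrecombinants, best)


# Hypothetical data: 2 recombinants among 20 scored meioses.
theta_hat, lod = max_lod(2, 18)
print(f"theta = {theta_hat:.3f}, max LOD = {lod:.2f}")  # theta = 0.100, max LOD = 3.20
```

A maximum LOD of 3 or more (odds of 1000:1 in favour of linkage) is the conventional threshold for declaring linkage, so this hypothetical example would just clear it.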
At present, however, statistics and genetics are not well integrated in China, so geneticists face many problems in collecting and analyzing data: what data to collect, how large the sample should be, which genetic statistical methods to use, and so on. As a result the data are not fully exploited and a great deal of information is wasted, which is truly regrettable. Because the phenotypes of polygenic diseases do not correspond one-to-one with genotypes, several analytic methods must be applied to the same data, and this increasingly exposes the limitations of software designed for only one kind of genetic analysis. In addition, foreign software is generally available only in English, so geneticists must spend considerable time and resources learning it. Powerful, comprehensive genetic statistics software is therefore urgently needed. The genetic epidemiology package SAGE (Statistical Analysis for Genetic Epidemiology) meets exactly this need. It is a powerful, comprehensive package that can perform many kinds of genetic statistical analysis, and it is developed by the Human Genetic Analysis Resource (HGAR) in the United States.
HGAR was established in the Department of Epidemiology and Biostatistics at Case Western Reserve University (CWRU) in Cleveland, USA, with funding from the National Center for Research Resources of the National Institutes of Health (NIH), U.S. Public Health Service. The software was developed by the noted statistical geneticist R. C. Elston and his team. Since development began in 1987 it has been continuously updated, from the initial version 1.0 to the current version 5.3.0, and its functionality has steadily expanded, giving it an increasingly important place in genetic epidemiology analysis.

Methods: Five example files supplied with SAGE were used as the original pedigree data files and imported into each functional module for analysis. SAGE comprises a user-defined function module and a set of functional modules, which the thesis describes in eighteen chapters:

Chapter 1: Overview of SAGE.
Presents SAGE's basic functions, the input and output files of each module, the operating environment, and the software's characteristics. Users should pay attention to the system requirements when installing the software.

Chapter 2: Creating and editing SAGE data files.
Introduces the three ways of creating a data file, as well as importing, exporting, and renaming items. The key points are creating and importing data files.

Chapter 3: The user-defined function module.
Introduces how to create a genome data file and new variables. The key content is creating new variables.

Chapter 4: Pedigree information and statistics (PEDINFO).
Provides many useful descriptive statistics on pedigree data. The chapter introduces PEDINFO's functions, principles, operation, and the interpretation of its results; the key content is interpreting the results.

The remaining fourteen chapters each describe a module in four respects: its function, its principle, the operating procedure, and its main output.

Chapter 5: Mendelian inconsistency checking (MARKERINFO).
Used to detect genotypes in pedigree data that are inconsistent with Mendelian inheritance, helping the user check the data for errors. A prerequisite is an understanding of Mendel's laws of inheritance.

Chapter 6: Relationship reclassification (RELTEST).
Uses genome-scan marker data to reclassify the originally reported relationships between relative pairs, based mainly on the principle of alleles shared identical by descent (IBD). The key points are understanding IBD and identity by state (IBS), and interpreting the results.

Chapter 7: Allele frequency estimation (FREQ).
Estimates allele frequencies from individuals in pedigrees of known structure and produces a marker locus description file, which can be used by GENIBD, MLOD, and other SAGE programs. The most important options of this program control the output of the locus description file and whether the inbreeding coefficient is output.

Chapter 8: Allelic association analysis (ASSOC).
Used to estimate the association between a trait and covariates in pedigree data; covariates can be derived by transforming marker phenotypes. The program also estimates familial residual correlations or heritability. Attention should be paid to the choice between the two data transformation options.

Chapter 9: Familial correlation analysis (FCOR).
Used to estimate all the familial correlations among multiple variables in pedigrees, together with their asymptotic standard errors. The key point is interpreting the familial correlation results.

Chapter 10: Commingling and complex segregation analysis (SEGREG).
Used, building on the familial correlations, to test and specify segregation analysis models. The trait may be continuous, binary, or binary with age of onset, and the program produces a penetrance file that can be used in model-based linkage analysis. The key point is choosing model assumptions suited to each type of trait.

Chapter 11: Identity-by-descent allele sharing (GENIBD).
This module combines several algorithms to calculate the single-locus and multipoint distributions of alleles shared IBD for the different relative pairs in a pedigree data file. The key point is that different types of data call for different models.

Chapter 12: Age-of-onset analysis (AGEON).
Suitable for data on the distribution of age at onset in both affected and unaffected individuals; covariates can be used to adjust the mean, the variance, or the distribution. Attention should be paid to how the data are combined.

Chapter 13: Haplotype analysis (DECIPHER).
Mainly used for maximum-likelihood estimation of the frequencies of haplotypes of autosomal or X-linked markers in a population. A prerequisite is an understanding of haplotypes.

Chapter 14: Single-marker model-based LOD score linkage analysis (LODLINK).
Used to calculate model-based two-point LOD scores between a main trait and each of a number of marker loci; the main trait may be any marker or other trait transmitted in a Mendelian fashion. The key points are specifying the main trait and naming the penetrance file produced by the SEGREG program.

Chapter 15: Multipoint model-based LOD score analysis (MLOD).
Used to carry out model-based multipoint linkage analysis in small or large pedigrees. The key points are producing the genome data file and specifying the main trait.

Chapter 16: Sib-pair linkage analysis (SIBPAL).
Can use single- or multiple-marker IBD information and, based on multipoint IBD estimates, can analyze both binary and continuous traits, including locus-by-locus interaction effects and covariate effects. The key point is making the appropriate assumptions for each type of trait.

Chapter 17: Affected sib-pair LOD score linkage analysis (LODPAL).
Performs linkage analysis based on LOD scores for affected sib pairs and currently implements a general conditional logistic regression model. Attention should be paid to the model's assumptions.

Chapter 18: The transmission/disequilibrium test (TDT).
The TDT program is based on the transmission disequilibrium model and is used to test for linkage between a marker locus and a disease locus on the premise that linkage disequilibrium between them is known to exist; the disease trait is binary.
A prerequisite is a grasp of the principles of the TDT.

Conclusion: This thesis enables geneticists to make full use of their genetic data in statistical analysis, saving time and resources. Learning this software can also guide geneticists in collecting genetic data and using it as fully as possible, thereby speeding the development of genetic epidemiology.
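To illustrate the statistic behind the eighteenth chapter, the Python sketch below computes the classical McNemar-type TDT for one allele of a biallelic marker. It assumes the transmission counts from heterozygous parents to affected children have already been tallied; the function name and counts are hypothetical, and the sketch does not reproduce SAGE's TDT program or its options.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """McNemar-type TDT statistic for one allele of a biallelic marker.

    transmitted:   times heterozygous parents passed the allele of interest
                   to an affected child (b)
    untransmitted: times they passed the other allele instead (c)
    Returns (chi-square with 1 df, upper-tail p-value).
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Only heterozygous parents are informative, which is why homozygous parents contribute to neither count; a significant result indicates linkage in the presence of association.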
Keywords/Search Tags: SAGE, Genetic epidemiology, function module