Data Parallel Computing And Meta-analytic Models Applied To The Association Studies Of Milk Production Traits In Holstein Cattle

Posted on: 2019-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: S S Yin
GTID: 2393330596488517
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
First, this study used computer simulation to explore the relationship between data parallel computing and the meta-analytic model. Meta-analysis collects the research results related to a topic, converts each study result into a unified index (the effect size), and then combines the effects from the different studies in a statistical analysis. Commonly used meta-analysis models include the fixed-effect and the random-effects models. Data parallel computing decomposes a large data set into a number of smaller subsets, analyzes the subsets simultaneously on several computer CPUs (or cores), and finally obtains estimates of the overall parameters by weighting the subset estimates.

This study shows that data parallel computing based on linear regression analysis can be regarded as a fixed-effect meta-analysis model implemented by parallel computing. The weights are the variance-covariance matrices of the model parameters. If the covariances between model parameters are ignored, however, the gene effect estimated by the meta-analysis is biased, and the bias depends on the size of the covariance. When the allele frequencies are comparable, the parallel calculation for single genes (markers) is equivalent to a fixed-effect meta-analysis model, and the estimated gene effects are consistent.

A computer simulation was used to study how data parallel computing and meta-analysis affect the variance and squared deviation of estimated gene effects under different heritabilities, population sizes, and data partitions. The simulation experiments led to the following conclusions:
1. The gene effects estimated by parallel analysis and by fixed-effect meta-analysis are highly correlated (r>0.97). The difference between the two is mainly due to Monte Carlo error.
2. The variance and squared deviation of the estimated gene effect differed significantly (p<0.05) or highly significantly (p<0.01) among heritabilities and population sizes: the larger the population and the higher the heritability, the smaller the variance and squared deviation of the estimated gene effect, and the higher the precision and accuracy.
3. When gene effects are estimated from extremely small samples (N=50), the variance is small (precise) but the average deviation is often very large, so the accuracy of the estimate is poor. When the population size is greater than 100, the variance and squared deviation of the estimated gene effect tend to be stable, and both are close to zero. For low-heritability traits, however, the population size needed to estimate gene effects effectively may be much larger than this number.

As a practical application, four methods (data parallel computing, the fixed-effect and random-effects meta-analysis models, and a Mega analysis model) were used to estimate the genetic effects of 48 candidate genes on three milk production traits in Holstein cows (milk production, milk protein content, and milk fat content). The Mega analysis pools the original data from each independent study and analyzes the whole in a single computation. The results indicate:
1. The candidate gene effects estimated by the four methods are highly correlated. The gene effects estimated by data parallel computing and by Mega analysis are exactly equal, indicating that the two are equivalent as statistical models. The gene effects from the meta-analysis are approximate because the covariance of the model parameters is ignored.
2. The variance of the gene effect estimated by random-effects meta-analysis is significantly greater than the variance estimated by the other three methods. The reason is that the former includes not only the random sampling variance but also the heterogeneity variance between the different studies (data sets).
3. This study identified and validated candidate genes that significantly affected the three milk production traits. The candidate genes with a significant genetic effect on milk production are: DGAT, DECR1 gene SNP7, SNP10, SNP11, SNP13, SNP9, SNP8, and rs29021694 of the MER gene locus. The candidate genes with significant genetic effects on milk fat content are: DGAT gene, DECR1 gene SNP7, SNP8, SNP10, SNP13, SNP9, SNP11, and MER gene locus rs29021694. The candidate genes with significant genetic effects on milk protein content are: DECR1-SNP7, SNP10, SNP13, SNP8, SNP9, SNP11, DGAT gene, and MER locus rs29021694.
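
The equivalence described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the thesis's own code: the data are simulated, the model has one intercept and one SNP coded 0/1/2, the residual variance is assumed equal across subsets (so it cancels from the weights), and all function and variable names are hypothetical. It combines per-subset regression estimates with full inverse variance-covariance weights, which reproduces the pooled (Mega) estimate, and with weights that ignore the covariance between parameters, which gives only an approximation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data (hypothetical): one SNP coded 0/1/2, allele frequency 0.3.
n, n_chunks = 600, 4
genotype = rng.binomial(2, 0.3, size=n).astype(float)
X = np.column_stack([np.ones(n), genotype])  # columns: intercept, gene effect
y = 10.0 + 0.5 * genotype + rng.normal(0.0, 1.0, size=n)


def ols_fit(X, y):
    """Return the OLS estimate and X'X.

    With a common residual variance s2, Var(beta_hat) = s2 * inv(X'X),
    so X'X is proportional to the inverse covariance matrix of the estimate.
    """
    xtx = X.T @ X
    beta = np.linalg.solve(xtx, X.T @ y)
    return beta, xtx


def combine_full(fits):
    """Fixed-effect combination weighted by full inverse covariance matrices."""
    total = sum(xtx for _, xtx in fits)
    weighted = sum(xtx @ beta for beta, xtx in fits)
    return np.linalg.solve(total, weighted)


def combine_diagonal(fits):
    """Per-parameter inverse-variance weighting that ignores covariances."""
    betas = np.array([beta for beta, _ in fits])
    weights = np.array([1.0 / np.diag(np.linalg.inv(xtx)) for _, xtx in fits])
    return (weights * betas).sum(axis=0) / weights.sum(axis=0)


# "Mega" analysis: a single regression on the pooled data.
beta_mega, _ = ols_fit(X, y)

# Data-parallel analysis: fit each subset separately, then combine.
# (The subset fits are independent and could run on separate cores.)
chunks = np.array_split(np.arange(n), n_chunks)
fits = [ols_fit(X[idx], y[idx]) for idx in chunks]
beta_parallel = combine_full(fits)
beta_diag = combine_diagonal(fits)

print("mega analysis:        ", beta_mega)
print("parallel, full weights:", beta_parallel)
print("covariance ignored:   ", beta_diag)
```

The full-weight combination matches the pooled estimate up to floating-point error because summing the subset cross-products recovers the pooled normal equations. The diagonal-weight version generally differs slightly, in line with the abstract's remark that ignoring the parameter covariance makes the meta-analysis estimate approximate.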
Keywords/Search Tags:association study, Meta-analysis, data parallel computing, candidate gene, milk production trait, Holstein cattle