Studies on Multiple Single Nucleotide Polymorphism Association and Gene-Gene Interaction

Posted on: 2021-10-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Che
Full Text: PDF
GTID: 1480306569982739
Subject: Computer Science and Technology

Abstract/Summary:
The study of complex traits plays an important role in improving the agronomic traits and quality of economic crops, raising breeding levels, and advancing the prevention, diagnosis, and treatment of complex human diseases. With the rapid development of high-throughput sequencing technology, more and more whole-genome sequences of model organisms have become available, which creates opportunities for the study of complex traits. At the same time, accurately and efficiently finding the susceptibility genes that affect complex traits in massive genetic data poses great challenges to researchers. Using bioinformatic methods to mine the associations between genes and complex traits, and to reveal the molecular mechanisms underlying those traits, is a focus of current research. Therefore, this dissertation uses single nucleotide polymorphism (SNP) and phenotype data as the main research materials, microRNA (miRNA) as the intermediate medium, and bioinformatics as the research tool to study multi-locus association analysis, SNP-miRNA-disease association prediction, SNP epistasis detection, and gene-gene interaction detection. The research covers the following four aspects:

(1) To address the low statistical power of traditional single-locus association analysis in genome-wide association studies (GWAS), a multi-locus detection method based on the weighted sparse group Lasso (WSGL) is proposed, following the biological hypothesis that a few susceptibility SNPs are located in a few genes. First, the sparse group Lasso is used to select trait-associated SNPs both between and within genes. Second, to improve the ability to screen low-frequency susceptibility SNPs, the minor allele frequency (MAF) is used to construct weights that adjust the penalties of different SNPs in the model. Considering the sparsity at both the SNP level and the gene level, WSGL uses prior biological knowledge to adjust the weights of SNPs. The experimental results show that WSGL not only improves association detection power but also yields results consistent with biological significance.

(2) To address the lack of negative examples in traditional machine-learning-based miRNA-disease association prediction methods, and the impact of MISIM-derived miRNA functional similarity on cross-validation precision, we propose LFEMDA, a miRNA-disease association prediction method based on latent feature extraction that uses only positive samples. First, a miRNA functional similarity measure computed from miRNA sequences is devised to replace the traditional miRNA functional similarity. Then, to obtain the optimal solution during the iterations of the algorithm, prior knowledge of miRNAs and diseases is introduced as auxiliary variables. Finally, by querying the miRdSNP database for SNP-miRNA-disease associations, we can verify the mechanism by which miRNAs affect complex diseases through SNPs. Because LFEMDA uses only positive examples for feature extraction, it avoids the erroneous information that traditional methods introduce by treating unknown miRNA-disease associations as negative samples. Experimental results show that LFEMDA outperforms other methods and can effectively predict miRNA-disease associations.

(3) Because traditional SNP epistasis detection methods perform poorly on pure epistatic effect models and on unbalanced data, a permutation-based gradient boosting machine (pGBM) is proposed for detecting epistasis, combining permutation strategies with a gradient boosting model. First, the original model is trained on the initial data using gradient boosting. Second, the initial data are transformed by two different permutation strategies: one retains only the main effects of the SNPs, and the other retains both the main effects and the epistasis of the SNP pair. The difference between the results on the two new data sets measures the interaction between the pair of SNPs. Finally, to improve detection accuracy on unbalanced data, the difference between the average AUC values is used as the interaction measure for two SNP loci. The experimental results demonstrate the effectiveness of pGBM, especially in detecting pure epistasis and in handling unbalanced data.

(4) To address the curse of dimensionality and the low statistical power of gene-gene interaction detection methods based on SNP loci, we propose a gene-based method using Gini impurity (GBGi). First, a measurement index is constructed from the Gini gain of two genes acting alone and together. Then, the p-value for the significance of the interaction between the two genes is obtained from this index and a permutation strategy. Finally, the interaction between the two genes is determined according to a threshold. GBGi takes the gene as the unit of analysis, requires no parameter setting, and detects both linear and nonlinear relationships well. Experimental results show that GBGi can accurately and effectively detect gene-gene interactions.

In conclusion, to mine the susceptibility genes associated with complex traits, this dissertation uses SNP and phenotype data to introduce a multi-locus association detection method based on the weighted sparse group Lasso, a permutation-based gradient boosting machine for detecting epistasis, a miRNA-disease association prediction method based on latent feature extraction using only positive samples, and a gene-gene interaction detection method based on Gini impurity. The experimental results show that these methods improve detection accuracy compared with other methods. The research also has theoretical significance and potential application value for revealing the associations between complex traits and genes.
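
As an illustration of aspect (1), the penalty term of a weighted sparse group Lasso can be sketched in a few lines of Python. The abstract does not give the exact formula, so everything here is an assumption: the function names, the mixing parameter `alpha`, the `sqrt(group size)` scaling, and in particular the MAF-based weight `sqrt(2 * maf * (1 - maf))`, which is only one plausible way to penalize low-frequency SNPs less.

```python
import math

def maf_weight(maf):
    # Hypothetical weight (not from the source): smaller for rarer alleles,
    # so low-frequency SNPs receive a lighter L1 penalty.
    return math.sqrt(2.0 * maf * (1.0 - maf))

def wsgl_penalty(beta, groups, mafs, lam=1.0, alpha=0.5):
    """Weighted sparse group Lasso penalty for one coefficient vector.

    beta   : list of SNP coefficients
    groups : list of lists of SNP indices, one inner list per gene
    mafs   : minor allele frequency of each SNP, same order as beta
    lam    : overall penalty strength (assumed parameter)
    alpha  : balance between SNP-level and gene-level sparsity (assumed)
    """
    # Within-gene (SNP-level) sparsity: MAF-weighted L1 term.
    l1 = sum(maf_weight(m) * abs(b) for b, m in zip(beta, mafs))
    # Between-gene sparsity: L2 norm per gene, scaled by sqrt(gene size).
    l2 = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

# Two genes with two SNPs each; only the first gene carries signal.
beta = [3.0, 4.0, 0.0, 0.0]
genes = [[0, 1], [2, 3]]
print(wsgl_penalty(beta, genes, mafs=[0.05, 0.05, 0.4, 0.4]))
```

The group term can zero out whole genes while the weighted L1 term selects individual SNPs inside the genes that remain, which matches the "few SNPs in few genes" hypothesis stated above.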
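
The two permutation strategies in aspect (3) might look roughly like the sketch below. The abstract does not say how the permutations are carried out, so this is a guess: both strategies shuffle within each phenotype class, one moving the SNP pair together and the other shuffling each SNP separately. The gradient boosting model itself is left out; `score_fn` is a hypothetical callable standing in for "score the permuted data with the boosted model and return its AUC".

```python
import random

def permute_within_class(rows, labels, a, b, joint, rng):
    """Return a copy of rows with SNP columns a and b permuted within each class.

    joint=True  : (a, b) move together, so the per-class joint distribution
                  is kept (main effects and pairwise epistasis retained).
    joint=False : a and b are shuffled separately, so only the per-class
                  marginals are kept (main effects retained, epistasis broken).
    """
    out = [list(r) for r in rows]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        src_a = idx[:]
        rng.shuffle(src_a)
        if joint:
            src_b = src_a
        else:
            src_b = idx[:]
            rng.shuffle(src_b)
        for dst, sa, sb in zip(idx, src_a, src_b):
            out[dst][a] = rows[sa][a]
            out[dst][b] = rows[sb][b]
    return out

def interaction_score(score_fn, rows, labels, a, b, n_rep=10, seed=0):
    """Mean score with epistasis kept minus mean score with epistasis broken."""
    rng = random.Random(seed)
    kept = [
        score_fn(permute_within_class(rows, labels, a, b, True, rng), labels)
        for _ in range(n_rep)
    ]
    broken = [
        score_fn(permute_within_class(rows, labels, a, b, False, rng), labels)
        for _ in range(n_rep)
    ]
    return sum(kept) / n_rep - sum(broken) / n_rep
```

Averaging over several permutations mirrors the "difference between the average AUC values" described above; a pair of SNPs with no epistasis should give a score near zero under these assumptions.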
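
For aspect (4), a Gini-gain interaction index with a permutation p-value can be sketched as follows. The exact index used by GBGi is not given in the abstract; the form below (joint gain minus the two single-gene gains) and the label-shuffling null distribution are assumptions, and all function names are hypothetical. Each gene is represented per sample by a hashable genotype key, for example a tuple of its SNP genotypes.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(keys, labels):
    """Drop in Gini impurity after splitting the samples by genotype key."""
    buckets = {}
    for k, y in zip(keys, labels):
        buckets.setdefault(k, []).append(y)
    n = len(labels)
    weighted = sum(len(ys) / n * gini(ys) for ys in buckets.values())
    return gini(labels) - weighted

def interaction_index(gene_a, gene_b, labels):
    # Assumed index: gain of the two genes together minus their separate gains.
    joint = list(zip(gene_a, gene_b))
    return (gini_gain(joint, labels)
            - gini_gain(gene_a, labels)
            - gini_gain(gene_b, labels))

def permutation_p_value(gene_a, gene_b, labels, n_perm=1000, seed=0):
    """Share of label shuffles whose index is at least the observed one."""
    rng = random.Random(seed)
    observed = interaction_index(gene_a, gene_b, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if interaction_index(gene_a, gene_b, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# XOR-like pattern: neither gene is informative alone, both together are.
gene_a = [0, 0, 1, 1] * 10
gene_b = [0, 1, 0, 1] * 10
phenotype = [0, 1, 1, 0] * 10
print(interaction_index(gene_a, gene_b, phenotype))
print(permutation_p_value(gene_a, gene_b, phenotype, n_perm=200))
```

The only inputs are the genotype keys and the phenotype, which is in line with the statement that the method needs no parameter setting; the number of permutations and the significance threshold remain choices of the analyst.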
Keywords/Search Tags:genome-wide association study, gene-gene interaction, weighted sparse group lasso, latent feature extraction, gradient boosting model