
Construction Of Additive Model With Discrete Input And Its Application To Genomics

Posted on: 2021-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Dong
GTID: 2370330620976902
Subject: Control theory and control engineering
Abstract/Summary:
Discrete data, including nominal and count data, is an important type of data in data science and artificial intelligence. When discrete data is used as the input to a regression, the model must map discrete inputs to a continuous output. Once the hypothesis space is fixed, analyzing the generalization bound of the set of models with discrete input is useful for improving model accuracy. This thesis theoretically investigates the estimation method and the representation capacity of the Gaussian generalized additive model with discrete input. The generalization error bound of this model is derived using Rademacher complexity theory. The model is then applied to genomic selection, providing a new approach to selective breeding. The thesis covers three main aspects:

(1) Based on Rademacher complexity, the generalization error bound of the Gaussian generalized additive model under discrete-valued input is derived. In genomic selection the input data are discrete, so the complexity of the hypothesis space directly affects the generalization error bound of the model. The thesis derives upper bounds on the Rademacher complexity of the mean and the variance of the Gaussian generalized additive model, and uses binomially distributed input data as an example to obtain tighter bounds for classification and regression problems.

(2) An improvement of the Gaussian generalized additive model for small data sets, and the procedure for solving for its weights, are investigated. The mean and the variance of the distribution function are parameterized to construct the Gaussian generalized additive model, and the loss function is obtained by multiplying the per-sample density terms. The weights of the mean and the variance are obtained at the minimum of the loss function, and the accuracy of variance prediction is improved by adding the Bagging ensemble learning method.

(3) The Gaussian generalized additive model is applied to a Crassostrea gigas data set, and the regression weights for the mean and the variance of its condition index are obtained, laying a foundation for selective breeding. First, the whole-genome loci and fatness data of Crassostrea gigas provided by the Institute of Oceanology, Chinese Academy of Sciences, are cleaned. Second, the problem of sparse feature representation in a high-dimensional space is addressed through a two-step stepwise selection. Finally, the Gaussian generalized additive model is fitted to the Crassostrea gigas data set, and its predictions of the mean and the variance are presented.

Simulation experiments confirm the correctness of the generalization error bound of the Gaussian generalized additive model, which provides a theoretical basis for applying the model to whole-genome data. When the model is applied to predicting the fatness of Crassostrea gigas and compared with other methods, it attains a mean fitting accuracy of 0.994, and the true value of more than 70% of the samples falls within the predicted confidence interval. The results show that the proposed method can provide a more reliable parameter reference for genome-wide selection.
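The idea of parameterizing both the mean and the variance of a Gaussian as additive functions of discrete inputs can be illustrated with a minimal sketch. This is not the thesis's implementation: the one-hot encoding, the log-variance link, the plain gradient-descent fit, the simulated binomial genotypes, and all function names (`one_hot`, `gaussian_nll`, `fit_gaussian_additive`) are assumptions made here for illustration, and the Bagging step is omitted.

```python
import numpy as np

def one_hot(X, n_levels=3):
    """Encode an (n, p) integer matrix with values in {0, ..., n_levels-1}
    as an (n, p * n_levels) indicator matrix, one block per input column."""
    n, p = X.shape
    Z = np.zeros((n, p * n_levels))
    rows = np.repeat(np.arange(n), p)
    cols = (np.arange(p) * n_levels + X).ravel()
    Z[rows, cols] = 1.0
    return Z

def gaussian_nll(mu, log_var, y):
    """Mean Gaussian negative log-likelihood, dropping the constant log(2*pi)/2."""
    return 0.5 * np.mean(log_var + (y - mu) ** 2 * np.exp(-log_var))

def fit_gaussian_additive(Z, y, lr=0.02, steps=3000):
    """Fit additive models for the mean and the log-variance by gradient descent
    (an assumed optimizer, chosen only to keep the sketch short)."""
    n, d = Z.shape
    w_mu, w_s = np.zeros(d), np.zeros(d)
    b_mu, b_s = y.mean(), np.log(y.var())  # start from the constant-Gaussian fit
    for _ in range(steps):
        mu = Z @ w_mu + b_mu
        s = Z @ w_s + b_s                   # s is the log-variance
        r = y - mu
        g_mu = -r * np.exp(-s) / n          # d(NLL)/d(mu_i)
        g_s = 0.5 * (1.0 - r ** 2 * np.exp(-s)) / n  # d(NLL)/d(s_i)
        w_mu -= lr * (Z.T @ g_mu)
        w_s -= lr * (Z.T @ g_s)
        b_mu -= lr * g_mu.sum()
        b_s -= lr * g_s.sum()
    return w_mu, b_mu, w_s, b_s

# Simulated data (hypothetical): genotypes coded 0/1/2 drawn from Binomial(2, 0.4),
# with both the mean and the noise level depending on the genotype.
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.binomial(2, 0.4, size=(n, p))
noise_sd = 0.5 + 0.5 * X[:, 0]
y = 1.0 * X[:, 0] - 0.5 * X[:, 1] + noise_sd * rng.normal(size=n)

Z = one_hot(X)
w_mu, b_mu, w_s, b_s = fit_gaussian_additive(Z, y)
nll_init = gaussian_nll(np.full(n, y.mean()), np.full(n, np.log(y.var())), y)
nll_fit = gaussian_nll(Z @ w_mu + b_mu, Z @ w_s + b_s, y)
print("NLL before fitting:", nll_init)
print("NLL after fitting: ", nll_fit)
```

Modelling the log-variance rather than the variance keeps the predicted variance positive without constraints. A predictive interval for a sample can then be formed from the fitted mean and `np.exp` of the fitted log-variance.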
Keywords/Search Tags: Rademacher Complexity, Gaussian Generalized Additive Model, Discrete Input, Genomic Selection