
Application Of Linear Regression Model Based On Component Data

Posted on: 2021-09-29 | Degree: Master | Type: Thesis
Country: China | Candidate: X Wang | Full Text: PDF
GTID: 2480306305972879 | Subject: Master of Applied Statistics
Abstract/Summary:
In microbiological analysis, an important problem is to identify which groups of bacteria play a role in the process under study. Microbial abundance data are a typical example of compositional data, and the bacteria belong to different classification categories. The main purpose of this thesis is to apply a linear regression model for compositional data to the anaerobic fermentation process, in order to analyze which anaerobic fermentation bacteria have a significant effect on biogas production and to identify these bacteria so that biogas production can be promoted.

The thesis first introduces the basic concepts of compositional data, the coordinate descent method, the augmented Lagrangian method, lasso estimation, and microbial feature analysis. It then presents a variable selection model for compositional data. The model is based on lasso estimation with an added zero-sum constraint on the parameters, and it is computed by combining the coordinate descent method with the augmented Lagrangian method. After describing how the parameters are selected, the thesis simulates the model numerically with two solvers, the convex optimization toolbox CVX and the coordinate descent method, and compares each with the ordinary lasso estimate. On six performance measures (PE, l1 loss, l2 loss, lnlloss, FP, and TP), the proposed method is more effective than the ordinary lasso. Comparing the two solvers shows that the latter (coordinate descent) has higher accuracy and smaller error when the data are high-dimensional. When p>>2n, however, the errors of the two solvers are similar and CVX is faster.

Next, building on this selection model, and because bacteria have different classification categories, the consistency of the microbial components has to be considered. This leads to a compositional regression model applicable to the general situation. Because the parameter estimates produced by the model are biased, a method for computing unbiased estimates and confidence intervals for the parameters is also introduced. The model is then evaluated by simulation under set parameter values, covering both the estimation of confidence intervals and variable selection based on those intervals. The simulation results show that the proposed model performs better than the general model in both variable selection and confidence interval estimation, and that its TPR and FPR are relatively accurate.

Finally, the model is applied to the anaerobic fermentation process. The anaerobic fermentation bacteria, grouped by their bacterial classification, are taken as the basic data unit for the components and sub-components, and parameter estimation, variable selection, unbiased estimation, and confidence interval calculation are carried out. The analysis identifies the influence of Gammaproteobacteria and Mollicutes on biogas production during anaerobic fermentation: the former promotes biogas production, while the latter inhibits it.
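
The combination of lasso estimation, coordinate descent, and the augmented Lagrangian method mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes the common log-contrast formulation, in which the response is regressed on log-transformed compositions with an intercept and the coefficients are constrained to sum to zero. The function names (`soft_threshold`, `zero_sum_lasso`), the penalty weight `lam`, the augmented Lagrangian parameter `mu`, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(t, lam):
    """Scalar soft-thresholding operator used by lasso coordinate updates."""
    return np.sign(t) * max(abs(t) - lam, 0.0)

def zero_sum_lasso(Z, y, lam, mu=1.0, n_outer=200, n_inner=100, tol=1e-8):
    """Hypothetical solver (an assumption, not the thesis's code) for
        min (1/2n)||y - b0 - Z b||^2 + lam * ||b||_1   s.t.  sum(b) = 0.
    Outer loop: augmented Lagrangian (scaled multiplier alpha).
    Inner loop: cyclic coordinate descent on the augmented objective.
    """
    n, p = Z.shape
    Z = Z - Z.mean(axis=0)          # centering absorbs the intercept b0
    y = y - y.mean()
    beta = np.zeros(p)
    alpha = 0.0                     # scaled Lagrange multiplier
    col_sq = (Z ** 2).sum(axis=0) / n
    resid = y - Z @ beta
    for _ in range(n_outer):
        for _ in range(n_inner):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                # (1/n) z_j^T (partial residual that excludes coordinate j)
                rho = Z[:, j] @ resid / n + col_sq[j] * old
                s_j = beta.sum() - old      # sum of the other coefficients
                new = soft_threshold(rho - mu * (s_j + alpha), lam) / (col_sq[j] + mu)
                if new != old:
                    resid -= Z[:, j] * (new - old)
                    beta[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break
        total = beta.sum()
        alpha += total              # multiplier update for sum(beta) = 0
        if abs(total) < tol:
            break
    return beta

# Simulated compositions (illustrative only): rows of X are positive and sum to 1.
rng = np.random.default_rng(0)
n, p = 100, 10
W = rng.normal(size=(n, p))
X = np.exp(W) / np.exp(W).sum(axis=1, keepdims=True)
Z = np.log(X)
beta_true = np.array([1.0, -1.0, 0.5, -0.5] + [0.0] * (p - 4))  # sums to zero
y = Z @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = zero_sum_lasso(Z, y, lam=0.05)
beta_null = zero_sum_lasso(Z, y, lam=1e3)   # very large penalty shrinks all coefficients to zero
```

In this sketch the quadratic penalty `(mu / 2) * (sum(beta) + alpha) ** 2` is folded into each coordinate update, so the inner loop stays a closed-form soft-thresholding step, and the outer loop only adjusts the scalar `alpha` until the coefficients sum to zero.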
Keywords/Search Tags: Compositional data, coordinate descent method, augmented Lagrangian method, lasso estimation, anaerobic fermentation