Optimal Model Construction In Some Kinds Of Commonly Used Nonlinear Regression Analysis And Their Intelligent Realization By SAS

Posted on: 2013-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Gao
GTID: 1110330374960922
Subject: Epidemiology and Health Statistics
Abstract/Summary:
【Objective】To overcome the main obstacles to the practical application of several commonly used simple and multiple nonlinear regression analyses (NRA) and to obtain the regression model that best fits the data at hand, thereby improving these analyses in both theory and methodology and making them easier to operate and automate, so that the models can be applied more widely and more successfully.

【Content】Nonlinear regression analysis in this study is divided into two categories: fixed mode and unfixed mode. In fixed mode, both the explanatory variables and the model structure have been determined. In unfixed mode, only the model structure has been decided. According to the hierarchical structure of the data, unfixed-mode analysis is further subdivided into two types: single-level and multilevel.

The fixed-mode analyses with one independent variable considered in this study are sum-of-exponentials curves, sigmoidal growth curves, and yield-density curves. These models are used frequently in practice. However, their structure is complicated and many parameters need to be estimated. Although many methods can fit these models, their fitting precision is far inferior to that of the nonlinear least squares (NLS) method. NLS relies on an iterative algorithm, and users must supply initial parameter values close to the solution; otherwise the iteration may fail to converge or may stop at a merely local optimum. How to obtain a precise, globally optimal curve model quickly is therefore a subject worthy of research. In addition, each kind of curve generally comes in several forms. Yield-density curves, for example, include the Bleasdale-Nelder, Holliday, and Farazdaghi-Harris curve models.
In-depth research is needed on how to identify the best of these forms for a given practical problem.

The single-level unfixed-mode analyses with multiple independent variables considered in this study are single-level models for binary, ordinal, multinomial and count outcomes. With variable selection, these methods yield parsimonious, well-fitting regression models. However, the major variable selection methods (forward, backward and stepwise) all have theoretical drawbacks and cannot guarantee that the selected model is the best one. Proposing or implementing a sound variable selection method is therefore a technical challenge. In addition, several regression methods are usually available for any given data set. For qualitative data with a binary dependent variable, for example, the candidates include logistic regression, probit regression and complementary log-log regression. Another technical problem that merits attention is how to compare the fit of several regression methods and supply the best model automatically.

The multilevel unfixed-mode analyses with multiple independent variables considered in this study are multilevel models for binary, ordinal, multinomial and count outcomes. These models consist of fixed effects and random effects, and the two kinds of effect variables are termed effect items. At present no satisfactory method is available for assembling effect items rationally and constructing an optimal regression model. The manual selection commonly used is very cumbersome. How to flexibly achieve an optimal combination of effect items in multilevel nonlinear regression analysis is therefore a technical problem that urgently needs solving.
Furthermore, as in the single-level case, several regression methods are available for any given data set, so finding the most appropriate method for the specific data is also an important research topic. Moreover, although studies of multilevel modelling have been booming, there is still room for improvement in the algorithms for parameter estimation. In practice, in-depth study is needed on how to select the estimation method and test hypotheses about the parameters correctly.

The paper investigates and overcomes the technical difficulties described above by using the programming language, advanced programming techniques and the corresponding procedures of SAS software to analyse data intelligently and automatically, providing users with results obtained by optimal methods.

【Methods】How is fixed-mode nonlinear regression analysis carried out? The paper adopts a strategy that combines the linearisation method with the NLS method. Specifically, the estimates obtained by linearisation are used as starting values, and NLS is then applied to obtain a better curve model. In the linearisation step, some simple models can be fitted by linear regression analysis (LRA) directly after variable transformation and some algebra. For more complicated models that cannot be linearised directly, one or more of the parameters that block linearisation are chosen as loop variables and varied over a small range with a fixed step size. Each such parameter then has a definite value in each pass of the loop, so the model can be linearised after variable transformation and algebra and fitted by LRA. After some necessary calculation, the parameter values estimated by LRA serve as the starting values of the corresponding parameters in the curve model.
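
The loop-variable linearisation step described above can be sketched as follows. This is an illustrative Python sketch, not the author's SAS implementation. It assumes the commonly quoted simplified Bleasdale-Nelder form y = (a + b*x)^(-1/theta), treats theta as the loop variable, and uses made-up, noise-free data; the function names `ols`, `rss` and `starting_values` are hypothetical. Ranking the candidates by residual sum of squares on the original scale is an added convenience; in the approach described, each candidate would then be handed to an NLS routine for refinement.

```python
def ols(xs, zs):
    """Ordinary least squares for z = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    b = sxz / sxx
    return mean_z - b * mean_x, b


def rss(params, xs, ys):
    """Residual sum of squares of y = (a + b*x)**(-1/theta) on the original scale."""
    a, b, theta = params
    return sum((y - (a + b * x) ** (-1.0 / theta)) ** 2 for x, y in zip(xs, ys))


def starting_values(xs, ys, theta_grid):
    """Loop over theta; for each value, linearise y**(-theta) = a + b*x and fit by OLS."""
    candidates = []
    for theta in theta_grid:
        zs = [y ** (-theta) for y in ys]
        a, b = ols(xs, zs)
        # Keep only candidates for which the curve is defined at every x.
        if all(a + b * x > 0 for x in xs):
            candidates.append((a, b, theta))
    # Best-fitting candidate (on the original scale) first.
    return sorted(candidates, key=lambda p: rss(p, xs, ys))


# Hypothetical noise-free data generated from a = 0.5, b = 0.1, theta = 0.8.
xs = [float(x) for x in range(1, 11)]
ys = [(0.5 + 0.1 * x) ** (-1.0 / 0.8) for x in xs]
theta_grid = [k / 10 for k in range(2, 16)]  # 0.2, 0.3, ..., 1.5 (fixed step size)

candidates = starting_values(xs, ys, theta_grid)
best_start = candidates[0]  # (a, b, theta) to pass on to an NLS routine
```

With real, noisy data the grid value of theta will not match the true value exactly, which is why the candidates serve only as starting values for the subsequent NLS fit.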
Because the loop parameters take several values, however, there are many combinations of initial values, and consequently a number of locally optimal models are obtained. The best of these locally optimal models can be regarded as the globally optimal model, which effectively resolves the problem of local optima.

How is single-level unfixed-mode nonlinear regression analysis performed? The paper adopts the best regression subset method in its proper sense to overcome the theoretical limitations of the ordinary variable selection methods. Although many large statistical software packages offer a best-subset method, they only report model fit statistics for each combination of explanatory variables; parameter estimates and hypothesis tests are not provided, let alone an optimal model. In other words, such packages merely list the best subsets for each number of explanatory variables and leave the choice of the appropriate number to the user. In this paper, a model is built and analysed for every combination of explanatory variables, and an optimal model is then supplied on the basis of parsimony and goodness of fit (GOF).

How is multilevel unfixed-mode nonlinear regression analysis realised? Generally speaking, common statistical software packages provide no method for selecting effect items, so users have to adjust the model by hand, which is very inconvenient. The paper again adopts the best regression subset method in its proper sense to select the optimal model. First, all combinations of fixed effects and random effects are generated. Second, multilevel nonlinear regression analysis is carried out for every combination. Finally, the best-fitting model is selected as the globally optimal model.
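
As a rough illustration of the exhaustive best-subset idea for a single-level binary outcome, the following Python sketch (not the author's SAS macros) fits a logistic regression to every non-empty subset of explanatory variables, so that parameter estimates are available for each subset, and then ranks the fits. Using AIC as the single criterion that balances parsimony against fit is an assumption of this example, since the abstract does not name the statistic used; the helper names `solve`, `fit_logistic` and `best_subset` and the small data set are likewise hypothetical.

```python
import math
from itertools import combinations


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def best_subset(data, y, names):
    """Fit every non-empty subset of predictors and rank the fits by AIC (assumed criterion)."""
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            X = [[1.0] + [row[name] for name in subset] for row in data]  # intercept first
            beta, loglik = fit_logistic(X, y)
            aic = -2.0 * loglik + 2.0 * len(beta)
            results.append((aic, subset, beta))
    return sorted(results, key=lambda r: r[0])


# Hypothetical data: 16 observations, three candidate predictors, binary outcome.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8]
x2 = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0]
x3 = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
data = [{"x1": float(a), "x2": float(b), "x3": float(c)} for a, b, c in zip(x1, x2, x3)]

results = best_subset(data, y, ["x1", "x2", "x3"])
best_aic, best_names, best_beta = results[0]
```

The number of fits grows as 2^k - 1 for k candidate variables, so exhaustive enumeration of this kind is practical only for a moderate number of predictors. The same loop could be repeated with other link functions (for example probit or complementary log-log) to compare analysis methods on the same data.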
In addition, for some types of data, consideration should be given to situations in which the preconditions of the analysis method to be used are not met.

As for selection among several nonlinear models of the same kind, the paper chooses the relevant GOF statistic according to the type of model and the method of parameter estimation, compares the fit of the models, and then selects the best-fitting model as the optimal one.

The methods described above are implemented with the programming language and the corresponding procedures of SAS software. For example, in multilevel modelling, the complete set of fixed-effect combinations can be produced with procedures REG and LOGISTIC, and the complete set of random-effect combinations with procedure FACTEX.

【Results】The paper addresses the problems and drawbacks of the current analysis strategies, calculation methods and implementation approaches in the practical application of NRA, proposes strategies for optimal model selection, and implements them by programming in SAS so that the outcome is easy to interpret. The results and major innovations of the paper fall into four parts.

Part one studies three kinds of fixed-mode nonlinear regression analysis and develops an analysis strategy for them. Specifically, the estimates obtained by linearisation are used as starting values, and the NLS method, which is based on an iterative algorithm, is then applied to obtain a precise model.
The strategy fits models accurately, quickly and efficiently, and is feasible and easy to operate. This part also solves the problem of intelligent selection among several models of the same kind fitted to the same data, and provides macros that are convenient to use.

Part two explores four kinds of single-level unfixed-mode nonlinear regression analysis. It addresses the theoretical limitations and practical difficulties of current variable selection methods and ensures that the model obtained is the optimal one by adopting the best regression subset method in its proper sense. Furthermore, this part automates the comparison of fit and the intelligent selection among several analysis methods of the same kind applied to the same data, which effectively spares users from choosing a method blindly or with uncertainty.

Part three studies four kinds of multilevel unfixed-mode nonlinear regression analysis. It forms all combinations of effect items by adopting the best regression subset method in its proper sense, which fills the current lack of a method for selecting effect items in multilevel modelling, and supplies the optimal model automatically after comparing fit. This part also solves the problem of automatic comparison of fit and intelligent selection among several nonlinear models of the same kind applied to the same data, and provides macros that are convenient to use.

Part four explores a strategy for selecting and correcting the estimation method in multilevel nonlinear regression analysis, based on SAS version 9.2. If the number of random effects is small, procedure GLIMMIX can be used directly to obtain accurate parameter estimates by numerical integration, but the degrees of freedom (DF) should be adjusted when hypothesis tests are performed.
If the number of random effects is large, the iteration of this approach often fails to converge. In that case procedure GLIMMIX is used first to obtain approximate parameter estimates by linearization-based methods, and procedure NLMIXED is then used to obtain accurate estimates by numerical integration, with the linearization-based estimates as starting values.

Procedure GLIMMIX is strongly promoted by SAS and is much handier than the complicated procedure NLMIXED. Despite the efforts made in recent years to develop and refine it, it is not perfect. Its deficiencies are as follows: first, it does not perform hypothesis tests on random effects; second, its hypothesis tests on fixed effects are not correct. When numerical integration is used, only tiny differences in the parameter estimates and their standard errors are observed between procedures GLIMMIX and NLMIXED, and these are attributable to calculation precision. GLIMMIX has serious defects, however: it does not provide DF for random effects and it provides wrong DF for fixed effects. The results of hypothesis tests obtained from procedure GLIMMIX therefore cannot be adopted directly, and the DF of random effects in the hypothesis tests should be adjusted so as to obtain exact probabilities.

The core of the strategy described above is to use procedure GLIMMIX and adjust the results of its hypothesis tests. In some special cases, procedure NLMIXED is used as an adjunct. With the addition of programming, the optimal model can be obtained directly or almost directly. The strategy eases the workload and the uncertainty of manual model selection.
Compared with Wang's strategy, it substantially reduces the workload and attains the optimal model conveniently by adopting the best subset method in its proper sense.

【Conclusions】The paper takes NRA as its main research content, proposes solutions to many pivotal problems, addresses the drawbacks encountered in practical application, and achieves some desirable results.

An efficient analysis strategy has been proposed for fixed-mode NRA and has proved effective for three kinds of fixed-mode NRA. Not only are the parameter values obtained quickly, but the model obtained by this strategy also fits better than the one obtained by the traditional method. It can serve as a frame of reference for other fixed-mode NRA.

In unfixed-mode NRA, the optimal regression model can be obtained by selecting variables with the best subset method in its proper sense. The strategy is free of the inherent limitations of the ordinary variable selection methods and overcomes the drawbacks of the so-called best subset method in statistical software, supplying users with an optimal regression model conveniently and reliably.

At present many methods can be applied to the same analytical purpose, yet it is often unclear which is best for the data at hand. The strategy introduced here, which applies several NRA methods of the same kind simultaneously to obtain the optimal model, has been a bold and successful attempt. Although the level of automation in statistical software is very low, this flexible and complicated calculation strategy can be fully implemented with advanced programming in SAS. In brief, the results obtained by this strategy are superior, in revealing the inherent regularities of the data, to those obtained by an arbitrarily chosen method.
Keywords/Search Tags: Nonlinear regression analysis, Method of best regression subset, Single level model, Multilevel model