
Comparison Of Several Methods For Generating Directed Acyclic Graph By Variable Selection

Posted on: 2021-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: T Cao
Full Text: PDF
GTID: 2480306248984459
Subject: Statistics
Abstract/Summary:
In high-dimensional regression problems, the traditional least squares method is prone to overfitting. Variable selection is then needed to remove the less important variables, which simplifies the model and prevents overfitting. Since the 1990s, regularization-based variable selection methods such as Lasso, SCAD, adaptive Lasso, and MCP have emerged in succession; these methods achieve parameter estimation and variable selection simultaneously.

A directed acyclic graph (DAG) can clearly reflect, to a certain extent, the relationships between variables. Taking each variable in turn as the response and the remaining variables as covariates, we can run multiple regressions to estimate the Bayesian network and discover the relationships between the variables. A DAG can therefore be estimated with any of the variable selection methods Lasso, SCAD, adaptive Lasso, and MCP. In 2016, Han et al. proposed a two-stage method based on adaptive Lasso: the first stage generates a network with adaptive Lasso, and the second stage applies a discrete improved search algorithm with a tabu list to further prune the first-stage result and produce the final network. Once the final network is generated, it is evaluated by recall and precision, weighted according to the practical situation. The advantages and disadvantages of each network generation method differ under different evaluation standards.

This thesis replaces adaptive Lasso in the first stage with SCAD and with MCP, yielding SCAD-modified and MCP-modified two-stage methods. Through multiple simulations with the various settings controlled, we compare the network generation performance of several one-stage and two-stage methods. We find that the one-stage methods are generally better for small samples, with MCP performing best, and that the two-stage methods are generally better for large samples, with the MCP-modified two-stage method performing best. Under large samples, the MCP-modified two-stage method therefore offers a certain improvement over the two-stage method based on adaptive Lasso. We then comprehensively analyze the performance of the methods under different evaluation standards, and use the numerical values of each index in each simulation to explore the strengths and weaknesses of each method. Next, we compare the running time of each method and analyze the situations to which each method is suited. Finally, we illustrate the application of DAGs in production and daily life through two practical cases.
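The regression-based first stage and the precision/recall evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: it uses plain Lasso (solved by coordinate descent) rather than SCAD, adaptive Lasso, or MCP; it recovers only the undirected skeleton and does not orient edges or run the second-stage tabu search; and the function names, the penalty `lam = 0.2`, the "OR" rule for combining the regressions, and the simulated chain `x0 -> x1 -> x2` are all hypothetical choices made for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta

def nodewise_lasso_skeleton(data, lam):
    """Regress each variable on all the others; a nonzero coefficient
    proposes an edge. Regressions are combined with the OR rule
    (an assumption of this sketch)."""
    n, p = data.shape
    Z = (data - data.mean(axis=0)) / data.std(axis=0)
    support = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd(Z[:, others], Z[:, j], lam)
        support[j, others] = np.abs(beta) > 1e-8
    return support | support.T

def precision_recall(est, truth):
    """Precision and recall over undirected edges (upper triangle)."""
    iu = np.triu_indices_from(est, k=1)
    e, t = est[iu], truth[iu]
    tp = np.sum(e & t)
    precision = tp / e.sum() if e.sum() else 0.0
    recall = tp / t.sum() if t.sum() else 0.0
    return float(precision), float(recall)

# Hypothetical simulation: chain x0 -> x1 -> x2, with x3 independent.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.5 * x0 + rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
data = np.column_stack([x0, x1, x2, x3])

truth = np.zeros((4, 4), dtype=bool)
truth[0, 1] = truth[1, 0] = True
truth[1, 2] = truth[2, 1] = True

est = nodewise_lasso_skeleton(data, lam=0.2)
precision, recall = precision_recall(est, truth)
print("precision:", precision, "recall:", recall)
```

Swapping the penalty inside `lasso_cd` for SCAD or MCP changes only the thresholding step; the node-by-node regression loop and the evaluation stay the same, which is what makes the methods directly comparable in this framing.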
Keywords/Search Tags: Variable selection, DAG, Lasso, Adaptive Lasso, SCAD, MCP