
Variable Screening Methods Based On Conditional Mutual Information

Posted on: 2022-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Fan
Full Text: PDF
GTID: 2518306764968479
Subject: Computer Software and Application of Computer
Abstract/Summary:
In the era of big data, data are an indispensable resource for analysis in many fields. As information collection methods have evolved, however, both the dimension and the volume of data have grown steadily, producing large-sample and high-dimensional data sets. For high-dimensional data, the growth in dimension makes computation more difficult, and how to analyze such data effectively is an important problem facing researchers. One effective approach is variable screening, which reduces high-dimensional data to an appropriate dimension and thereby lightens the burden of subsequent analysis.

The first contribution of the thesis is to apply conditional mutual information to variable screening. Conditional mutual information is used to measure the relationship between variables; the estimation method used places no restriction on the data distribution and is simple to compute. Independence tests show that conditional mutual information controls the empirical size well and also achieves high empirical power. These results indicate that conditional mutual information captures the correlation between variables sensitively, so that subsequent variable screening procedures can be built on it.

The second contribution is to propose forward variable screening and ensemble variable screening based on conditional mutual information. Numerical simulations show that both methods perform well in a variety of settings and strike a good balance between screening speed and accuracy. They also apply well to variable screening of gene microarray data and news text data.

The third contribution is to propose a weighted ensemble variable screening method based on conditional mutual information. Numerical simulations and experiments on real data sets show that this method performs well in variable screening, and that its screening speed and accuracy are better than those of ensemble variable screening and forward variable screening in many situations.
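
The abstract does not say which estimator or selection rule the thesis uses, so the following is only a minimal sketch of how forward screening with conditional mutual information might look. It rests on two assumptions that are not taken from the thesis: a histogram (plug-in) estimator on equal-frequency bins, and a CMIM-style score in which each candidate is rated by its smallest conditional mutual information with the response given any single variable already selected. The function names `discretize`, `entropy`, `cmi`, and `forward_screen` are hypothetical.

```python
import numpy as np


def discretize(x, bins=4):
    """Map a continuous vector to integer codes using equal-frequency bins."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")


def entropy(*cols):
    """Plug-in joint entropy (in nats) of one or more integer-coded columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def cmi(x, y, z=None):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); without z, plain I(X;Y)."""
    if z is None:
        return entropy(x) + entropy(y) - entropy(x, y)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def forward_screen(X, y, k, bins=4):
    """Greedily pick k column indices of X (assumed CMIM-style rule)."""
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(X.shape[1])])
    yd = discretize(y, bins)
    selected, remaining = [], list(range(X.shape[1]))

    def score(j):
        if not selected:
            # First step: marginal mutual information with the response.
            return cmi(Xd[:, j], yd)
        # Later steps: worst-case information left after conditioning on
        # each already-selected variable in turn.
        return min(cmi(Xd[:, j], yd, Xd[:, s]) for s in selected)

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `forward_screen(X, y, k=2)` on an `(n, p)` array `X` and a length-`n` response `y` returns the indices of the two chosen columns. Conditioning on one selected variable at a time keeps the histogram cells reasonably populated; conditioning on the full selected set would be closer to the textbook definition but needs far more samples as the set grows.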
Keywords/Search Tags: Conditional Mutual Information, High-dimensional Data, Variable Screening