
Research On Missing Data Interpolation Method Based On Variational Bayesian

Posted on: 2022-12-18   Degree: Master   Type: Thesis
Country: China   Candidate: H Y Xu   Full Text: PDF
GTID: 2518306782455234   Subject: Automation Technology
Abstract/Summary:
McKinsey has observed: "Data has penetrated every industry and business function today to become an important factor in production." Whether data are complete and sufficient bears directly on the development of an industry or business. Reliable, accurate, and complete data support accurate, timely, and systematic statistical analysis and decision-making. Incomplete or missing data, by contrast, reduce the accuracy of statistical analysis and decision-making, hinder industrial development, and can even cause large economic and social losses. In production practice, however, some data are inevitably lost for both subjective and objective reasons, which degrades data quality. The simplest responses to missing data are to delete the affected records or to leave the gaps unprocessed, but deletion discards information and unprocessed gaps make modeling more difficult. It is therefore important to predict and interpolate (impute) missing values in a principled and effective way.

Bayesian statistics treats every unknown parameter as a random variable, describes it with a probability distribution, and combines observed data with prior information to infer the unknown quantity through its posterior distribution. In Bayesian models, however, the posterior distribution is often difficult to compute exactly. Variational inference is a common method for obtaining an approximate posterior. It recasts posterior inference as an optimization problem, has good convergence and scalability, and is well suited to large-scale approximate inference. Variational inference seeks a distribution that approximates, and stands in for, the posterior by minimizing the Kullback-Leibler divergence, which is equivalent to maximizing the evidence lower bound (ELBO). This thesis takes data sets containing missing data as its research object and uses variational inference to obtain the posterior distribution of the Bayesian model, from which the missing data are interpolated and inferred.

The main work of this thesis includes:

(1) Taking missing-data interpolation as the research object, the thesis analyzes how a series of variational inference methods are applied to approximate the posterior distribution of Bayesian models: mean-field variational inference, expectation propagation, mixed variational inference, collapsed variational Bayesian inference, and stochastic variational inference.

(2) Interpolation is analyzed for Bayesian Gaussian mixture model data containing missing values, a variational Bayesian interpolation method is proposed, and comparative experiments are conducted on simulated data and on a real data set of life expectancy at birth in some African countries, at different missing ratios. The results show that, with other control variables held the same, interpolation at low missing ratios is clearly better than at high missing ratios, and that the overall success rate and interpolation accuracy of variational Bayesian interpolation are better than those of nearest-neighbor interpolation and mean interpolation across the different missing ratios.

(3) A semi-supervised regression model based on variational sparse Bayes is proposed. Using the "real estate valuation" data set, values of one variable are artificially removed and then filled in by semi-supervised learning interpolation, and the variational sparse Bayesian regression model is compared empirically before and after interpolation. The results show that the variable with missing data can be interpolated with about 70% accuracy and that the interpolated data set still achieves almost the same regression performance as the original data set, which supports the effectiveness of the model on incomplete data sets.
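
The link between the Kullback-Leibler divergence and the evidence lower bound mentioned in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and is not the thesis's model: it assumes a hypothetical two-component Gaussian mixture with a single observation `x` and a binary latent label `z`, and the names `PRIOR`, `MEANS`, `elbo`, and `kl_to_posterior` are invented for this example. For any approximating distribution `q(z)`, it confirms that `log p(x) = ELBO(q) + KL(q || p(z | x))`, so raising the ELBO lowers the KL divergence by the same amount.

```python
import math

# Hypothetical toy model (an assumption for illustration, not the thesis's model):
# z ~ Categorical(PRIOR), x | z = k ~ Normal(MEANS[k], 1).
PRIOR = [0.7, 0.3]
MEANS = [-1.0, 2.0]


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def joint(x):
    """Joint density p(x, z = k) for each component k."""
    return [PRIOR[k] * normal_pdf(x, MEANS[k], 1.0) for k in range(len(PRIOR))]


def log_evidence(x):
    """log p(x), obtained by summing the joint over z."""
    return math.log(sum(joint(x)))


def posterior(x):
    """Exact posterior p(z | x); tractable here only because z is binary."""
    p_xz = joint(x)
    total = sum(p_xz)
    return [p / total for p in p_xz]


def elbo(q, x):
    """Evidence lower bound: E_q[log p(x, z) - log q(z)]."""
    p_xz = joint(x)
    return sum(qk * (math.log(pk) - math.log(qk)) for qk, pk in zip(q, p_xz) if qk > 0)


def kl_to_posterior(q, x):
    """KL(q || p(z | x)) for a discrete q."""
    post = posterior(x)
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, post) if qk > 0)


if __name__ == "__main__":
    x_obs = 0.8
    for q1 in (0.1, 0.5, 0.9):
        q = [1.0 - q1, q1]
        gap = log_evidence(x_obs) - (elbo(q, x_obs) + kl_to_posterior(q, x_obs))
        print(f"q(z=1)={q1:.1f}  ELBO={elbo(q, x_obs):.4f}  "
              f"KL={kl_to_posterior(q, x_obs):.4f}  residual={gap:.2e}")
```

In this toy case the exact posterior is available, so the best `q` can be read off directly. In the models the abstract describes, the posterior is intractable and the ELBO is the quantity that is actually optimized.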
Keywords/Search Tags: Missing data, Interpolation, Variational inference, Sparse Bayesian, Semi-supervised learning