Research on the Imputation of Randomly Missing Values and Its Effects

Posted on: 2019-04-14  Degree: Master  Type: Thesis
Country: China  Candidate: Z T Yin  Full Text: PDF
GTID: 2358330548955536  Subject: Applied statistics
Abstract/Summary:
Missing data arise in many fields for a variety of reasons and have become an active research topic for statisticians in recent years. Missing data reduce the information in a sample, lower the power of statistical tests, and make statistical analysis more complex. If they are handled inappropriately, the final results may be biased and may fail to use all the information contained in the data. How to handle missing values correctly has therefore become a key issue. In addition, the era of big data has brought a sharp increase in both the volume and the dimensionality of data. Excessively high dimensionality not only reduces the efficiency of analysis but also causes problems such as collinearity among variables. Factor analysis based on panel data is an effective tool for dimensionality reduction; it is increasingly popular in financial and macroeconomic analysis and has become a common way of dealing with big data problems.

Building on research in China and abroad, this thesis combines the ideas behind missing-value imputation methods commonly used in recent years with the features of panel data. It makes full use of the collective (cross-sectional) effect in panel data and of factor analysis theory, examines the imputation characteristics of MICE, Copy Mean and missForest, and proposes an imputation method based on a panel data factor model. The thesis has two parts: a simulation study and an empirical application.

Simulation study: A panel data set was generated by computer simulation, with the number of cross-sectional units N smaller than the number of time periods T. Values were deleted at random at preset missing rates (5%, 15%, 25%) and then imputed with MICE, Copy Mean, missForest and the factor model. The root mean square deviation (RMSD) of each imputation method was calculated and compared. These steps were repeated several times and the average RMSD was computed. The results show that the factor model method is significantly better than MICE and Copy Mean but still falls short of missForest. Except for Copy Mean, the RMSD of every method increases as the missing rate increases.

Empirical application: A panel data set with randomly missing values was constructed from 84 serial monthly returns of company stocks listed on the Shanghai Stock Exchange for 75 months, and the same four methods were used to impute the missing values. The results were consistent with those of the simulation study.

Finally, the thesis summarizes the work from two aspects, the prevalence of missing values and the importance of imputation methods, and suggests directions for follow-up research that address its shortcomings.
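The simulation protocol described in the abstract (simulate a panel with N < T, delete values at random at a given rate, impute, compute RMSD on the deleted cells, repeat and average) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the data-generating process, the panel size, the noise level and all function names are assumptions. The unit-mean fill is only a simple baseline standing in for the compared methods, and the rank-one alternating least squares fill is a simplified stand-in for a factor-model imputation, not the method the thesis proposes.

```python
import math
import random


def simulate_panel(n_units, n_periods, rng, noise_sd=0.3):
    """Assumed DGP: x[i][t] = loading_i * factor_t + noise (one common factor)."""
    factor = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
    loadings = [rng.uniform(0.5, 1.5) for _ in range(n_units)]
    return [[lam * f + rng.gauss(0.0, noise_sd) for f in factor] for lam in loadings]


def mask_at_random(panel, rate, rng):
    """Copy the panel, set a fixed share of cells to None, return the copy and the cells."""
    cells = [(i, t) for i in range(len(panel)) for t in range(len(panel[0]))]
    missing = rng.sample(cells, int(round(rate * len(cells))))
    masked = [row[:] for row in panel]
    for i, t in missing:
        masked[i][t] = None
    return masked, missing


def fill_unit_mean(masked):
    """Baseline: fill each missing cell with the mean of that unit's observed values."""
    filled = []
    for row in masked:
        seen = [v for v in row if v is not None]
        mean = sum(seen) / len(seen) if seen else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def fill_one_factor(masked, n_iter=20):
    """Illustrative stand-in: rank-one factor fit on observed cells by alternating least squares."""
    n, n_t = len(masked), len(masked[0])
    lam = [1.0] * n
    f = [0.0] * n_t
    for _ in range(n_iter):
        for t in range(n_t):  # update the factor given the loadings
            obs = [i for i in range(n) if masked[i][t] is not None]
            den = sum(lam[i] ** 2 for i in obs)
            f[t] = sum(lam[i] * masked[i][t] for i in obs) / den if den > 0 else 0.0
        for i in range(n):  # update the loadings given the factor
            obs = [t for t in range(n_t) if masked[i][t] is not None]
            den = sum(f[t] ** 2 for t in obs)
            lam[i] = sum(f[t] * masked[i][t] for t in obs) / den if den > 0 else 0.0
    return [
        [lam[i] * f[t] if masked[i][t] is None else masked[i][t] for t in range(n_t)]
        for i in range(n)
    ]


def rmsd(truth, filled, missing):
    """Root mean square deviation between true and imputed values on the deleted cells."""
    return math.sqrt(sum((truth[i][t] - filled[i][t]) ** 2 for i, t in missing) / len(missing))


def run_experiment(rates=(0.05, 0.15, 0.25), n_reps=10, seed=0):
    """Average RMSD per method and missing rate over repeated simulations."""
    rng = random.Random(seed)
    results = {}
    for rate in rates:
        totals = {"unit_mean": 0.0, "one_factor": 0.0}
        for _ in range(n_reps):
            panel = simulate_panel(20, 60, rng)  # N = 20 units, T = 60 periods (N < T)
            masked, missing = mask_at_random(panel, rate, rng)
            totals["unit_mean"] += rmsd(panel, fill_unit_mean(masked), missing)
            totals["one_factor"] += rmsd(panel, fill_one_factor(masked), missing)
        results[rate] = {name: total / n_reps for name, total in totals.items()}
    return results


if __name__ == "__main__":
    for rate, scores in run_experiment().items():
        print(rate, scores)
```

Scoring only the deliberately deleted cells is what makes the comparison possible: their true values are known, so each method's RMSD measures imputation error directly. MICE or missForest could be added to the same loop through their own library implementations.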
Keywords/Search Tags:Imputation, Panel Data, Factor Model, missForest, MICE, Copy Mean, RMSD