
Research on a Missing Value Imputation Method Based on Quadrant Nearest Neighbors and DFT

Posted on: 2017-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
GTID: 2308330485968740
Subject: Computer software and theory
Abstract/Summary:
Data collected in practical applications often contain missing values, so handling them has become an important problem, and imputing missing values with filling algorithms is now an essential research topic. Missing value imputation infers and calculates the missing values algorithmically from the complete part of the dataset. k-nearest-neighbor imputation (kNNI) is a popular method because it is simple and easy to implement, but its neighbor selection can be biased, which lowers imputation accuracy. An improved kNNI, namely QENNI, reduces this bias by selecting exactly one nearest neighbor of the target point from each generalized quadrant.

DDWQ is a new algorithm that takes into account both the influence of the overall dataset on the missing values and the nearest neighbors in each quadrant. Its core idea is to enclose the missing value in a shell: the item with the missing value is taken as the center of the quadrants, the complete data are distributed among the quadrants, and the missing value is imputed using a hybrid weight that combines the nearest neighbors with the density of each quadrant. This avoids the neighbor-selection bias of kNNI and also takes the effect of the whole dataset into consideration. Experimental results show that DDWQ achieves better filling accuracy than QENNI.

Time series data are widespread in practice. This paper studies the similarity and periodicity of time series based on the discrete Fourier transform (DFT) and uses the DFT distance in a density clustering method for analysis. On this basis, a hybrid missing value imputation algorithm for time series, namely STSC, is proposed. Its core is to use time series of the same class, together with the sub-series of each of their cycles, as the inputs of DDWQ to fill the missing values. Tests on a simulated dataset show that STSC improves imputation accuracy.
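
The two building blocks named in the abstract, quadrant-based neighbor selection and a DFT distance between time series, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the function names `quadrant_impute` and `dft_distance`, the inverse-distance weighting, the density factor (the share of complete records falling in each quadrant), and the definition of the DFT distance as a Euclidean distance over leading DFT coefficients are all assumptions made for illustration.

```python
import numpy as np


def quadrant_impute(complete, target, missing_idx, use_density=False, eps=1e-9):
    """Impute target[missing_idx] from one nearest neighbor per generalized quadrant.

    complete    : (n, d) array of records without missing values
    target      : length-d record; the entry at missing_idx is ignored
    use_density : if True, scale each neighbor's weight by the share of
                  complete records in its quadrant (an assumed stand-in for
                  the "hybrid weight" mentioned in the abstract)
    """
    complete = np.asarray(complete, dtype=float)
    target = np.asarray(target, dtype=float)
    observed = [j for j in range(complete.shape[1]) if j != missing_idx]

    # Offsets from the target on the observed attributes; the target is the
    # center of the quadrants.
    offsets = complete[:, observed] - target[observed]
    dists = np.linalg.norm(offsets, axis=1)

    # Quadrant code: one bit per observed attribute (1 if the offset is >= 0).
    bits = (offsets >= 0).astype(int)
    codes = bits @ (1 << np.arange(len(observed)))

    values, weights = [], []
    for code in np.unique(codes):
        members = np.flatnonzero(codes == code)
        nearest = members[np.argmin(dists[members])]  # one neighbor per quadrant
        weight = 1.0 / (dists[nearest] + eps)         # assumed inverse-distance weight
        if use_density:
            weight *= len(members) / len(complete)    # assumed density factor
        values.append(complete[nearest, missing_idx])
        weights.append(weight)
    return float(np.average(values, weights=weights))


def dft_distance(x, y, n_coeffs=None):
    """Euclidean distance between the first n_coeffs DFT coefficients.

    With norm="ortho" and all coefficients kept, this equals the ordinary
    Euclidean distance between x and y (Parseval's theorem); truncating to
    the leading coefficients gives a lower bound on it.
    """
    X = np.fft.fft(np.asarray(x, dtype=float), norm="ortho")
    Y = np.fft.fft(np.asarray(y, dtype=float), norm="ortho")
    return float(np.linalg.norm(X[:n_coeffs] - Y[:n_coeffs]))


if __name__ == "__main__":
    # Hypothetical toy data: two observed attributes, third attribute missing.
    points = np.array([[1, 1, 10], [-1, 1, 20], [-1, -1, 30],
                       [1, -1, 40], [5, 5, 1000]], dtype=float)
    record = np.array([0.0, 0.0, np.nan])
    print(quadrant_impute(points, record, missing_idx=2))
    print(quadrant_impute(points, record, missing_idx=2, use_density=True))

    t = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
    print(dft_distance(np.sin(t), np.sin(t + 0.3), n_coeffs=4))
```

In the toy data, the distant record in the upper-right quadrant is never selected, because only the closest record in each quadrant contributes. That is the property the abstract credits with reducing the neighbor-selection bias of plain kNNI. A distance such as `dft_distance` could then be passed to a density clustering routine to group similar series before imputation, as the abstract describes for STSC.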
Keywords/Search Tags: Missing Value Imputation, Time Series, Quadrant Neighbors, kNNI, DFT