Research On Imputing Algorithm Of Missing Values Based On Kernel Similarity And Low Rank Approximation

Posted on: 2019-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: X F Sun
Full Text: PDF
GTID: 2428330593451076
Subject: Software engineering
Abstract/Summary:
The purpose of data mining is to find valuable rules or knowledge in complex data sets. Doing so requires reliable algorithmic models, and reliable models depend on high-quality data. Missing values are common in data from every field, and they seriously degrade data quality. How to handle missing values accurately and effectively is therefore an active topic in data mining.

Methods for handling missing values generally fall into two groups: deletion and imputation. Deletion is straightforward, but it discards a great deal of useful data, especially when the data set contains many missing values. Imputation instead estimates the missing attribute values from the data set itself, so that the data set becomes complete and a model can be built on it. After a long period of development, imputation has relatively mature theory and techniques, and new kinds of imputation methods continue to appear. In recent years, low-rank techniques have been applied to image restoration and recommender systems: the data are modeled as a low-rank matrix, the sample space is approximated by a subspace, and an approximate solution in the original space is then obtained.

This thesis studies imputation algorithms for missing values based on kernel similarity and low-rank approximation, and improves both the imputation method and the imputation strategy. The main work is as follows:

1. Imputation based on kernel similarity. A kernel function is used as the similarity measure to find the K complete samples most similar to the sample that contains missing values. The missing values are then imputed with the weighted mean of the corresponding attributes of those K samples.
2. A new low-rank imputation method for missing values based on strong correlation. First, the linear correlation between the sample containing missing values and every other sample is computed. A threshold a on the Pearson linear correlation is then set, and the samples whose correlation exceeds a are selected. Finally, a low-rank imputation model is built on the selected samples and solved to obtain the missing values.
3. Application of low-rank matrix completion to missing wind speed values. The wind speed time series is transformed into a low-rank matrix, with the numbers of rows and columns of the matrix determined experimentally. The GROUSE optimization algorithm, which uses a subspace updating strategy, is then used to find the minimum nuclear norm solution, from which the missing wind speed values are imputed.

Experiments show that, at the same missing ratio, the imputation algorithm based on kernel similarity gives the best results. The low-rank imputation algorithm based on the linear correlation threshold performs better than one-time imputation over all samples that contain missing values. Transforming short-term time-series wind speed data into matrix form and imputing it by low-rank approximation outperforms traditional imputation methods.
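The kernel-similarity imputation in item 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say which kernel is used or how similarity is computed when attributes are missing, so the Gaussian (RBF) kernel, the `gamma` parameter, the use of observed attributes only, and the function name `kernel_knn_impute` are all assumptions made for illustration.

```python
import numpy as np

def kernel_knn_impute(X, k=3, gamma=1.0):
    """Fill NaNs in X from the k most kernel-similar complete rows.

    Assumptions (not from the thesis): Gaussian kernel with width
    parameter gamma; similarity uses only the attributes that the
    incomplete row actually has.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Complete samples: rows without any missing value.
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("need at least one complete sample")
    k = min(k, len(complete))

    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Squared Euclidean distance over the observed attributes.
        d2 = ((complete[:, observed] - row[observed]) ** 2).sum(axis=1)
        sim = np.exp(-gamma * d2)            # kernel similarity in (0, 1]
        nearest = np.argsort(sim)[::-1][:k]  # k most similar complete rows
        w = sim[nearest]
        # Normalise weights; fall back to a plain mean if all underflow to 0.
        w = w / w.sum() if w.sum() > 0 else np.full(k, 1.0 / k)
        # Weighted mean of the neighbours' values for the missing attributes.
        filled[i, missing] = w @ complete[nearest][:, missing]
    return filled

# Toy data: the last row is missing its third attribute.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, np.nan],
])
filled = kernel_knn_impute(X, k=2, gamma=1.0)
print(filled[3])
```

In this toy example the two nearest complete rows are the first two, so the imputed value falls between their third attributes. Because the weights come from the kernel, closer samples contribute more than they would in an unweighted K-nearest-neighbour mean.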
Keywords/Search Tags:Missing values, Kernel similarity, Linear correlation, Low-rank imputation