
A Preprocessing And Filling Algorithm For Incomplete High-Dimensional Data

Posted on: 2018-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: H J Sun
Full Text: PDF
GTID: 2428330596454766
Subject: Software engineering
Abstract/Summary:
Data imputation is the process of filling in missing values in a dataset. Existing imputation algorithms suffer from high time complexity, poor accuracy, and low robustness, and the research framework is incomplete. To address these problems, this thesis adopts a complete data imputation pipeline that proceeds from data denoising, through normalization, to dimensionality reduction, and ultimately achieves a better imputation result.

The discrete wavelet transform is used to denoise the high-dimensional datasets. Traditional discrete wavelet denoising usually selects the wavelet basis function by exhaustively traversing the candidates, which is unsuitable for high-dimensional datasets and easily runs into the curse of dimensionality. Taking the characteristics of high-dimensional data into account, a wavelet basis selection method based on random sampling is proposed. Experiments show that the method strikes a balance between computational efficiency and denoising quality.

Traditional normalization methods often require the maximum, minimum, or mean of the dataset, and these statistics must be recalculated whenever new data arrive, causing many redundant computations. For high-dimensional data, this thesis proposes a normalization method based on a normalized exponential function that improves normalization efficiency and, because each value is mapped independently, avoids repeated calculations when new data are added.

A swarm intelligence optimization algorithm is used to reduce the dimensionality of the high-dimensional datasets. Given the high dimensionality and variable characteristics of the data, the bird mating optimizer (BMO) is chosen: because BMO iterates over grouped subpopulations, the proportions of the different groups can be adjusted to the characteristics of the high-dimensional data to achieve a better dimensionality reduction effect. Two improvements to BMO are proposed. First, a parameter adaptation mechanism is introduced so that the algorithm adjusts its parameters in real time according to the characteristics of the dataset and the stage of the iteration. Second, the simulated annealing algorithm is combined with BMO to avoid premature convergence. Together these yield the adaptive simulated annealing BMO algorithm (SABMO). Experiments show that SABMO performs better at reducing the dimensionality of high-dimensional datasets.

Finally, SABMO is used to optimize the weights and thresholds of neural network training, and the SABMO-NN imputation model is proposed. This model is static, however: its weights and thresholds do not change during the application phase, so after a long time the dataset may drift slightly, producing larger prediction errors and requiring the model to be retrained. To address this problem, an improvement based on a feedback correction mechanism is proposed. Experiments show that the improved SABMO-NN imputation model achieves better imputation accuracy.
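To make the pipeline concrete, the sketches below illustrate each stage in Python; they are reconstructions from the abstract, not the thesis's own code. For the denoising stage, one plausible reading of the random-sampling basis selection is to score each candidate wavelet basis on a small random sample of columns rather than on the full high-dimensional dataset. The PyWavelets (pywt) calls are standard; the SNR-after-soft-thresholding score, the candidate list, and the sample size are assumptions.

    import numpy as np
    import pywt

    def select_wavelet_basis(data, candidates=("db2", "db4", "sym4", "coif2"),
                             sample_cols=32, seed=0):
        """Choose a wavelet basis by denoising only a random sample of columns.

        Scoring by reconstruction SNR after universal soft thresholding is an
        assumption; the abstract does not specify the selection criterion.
        """
        rng = np.random.default_rng(seed)
        cols = rng.choice(data.shape[1], size=min(sample_cols, data.shape[1]),
                          replace=False)
        sample = data[:, cols]

        def score(wavelet):
            snr = 0.0
            for col in sample.T:
                coeffs = pywt.wavedec(col, wavelet, level=2)
                # Universal threshold estimated from the finest detail level.
                sigma = np.median(np.abs(coeffs[-1])) / 0.6745
                thr = sigma * np.sqrt(2 * np.log(len(col)))
                denoised = pywt.waverec(
                    [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                                   for c in coeffs[1:]],
                    wavelet)[:len(col)]
                noise = col - denoised
                snr += 10 * np.log10(np.sum(denoised ** 2) /
                                     max(np.sum(noise ** 2), 1e-12))
            return snr / sample.shape[1]

        return max(candidates, key=score)

Scoring only a sampled subset of columns is what keeps the basis search tractable as dimensionality grows; the chosen basis is then applied to every dimension.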
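For the normalization stage, the abstract names only a "normalized exponential function". One function with the stated property, namely that each value is mapped independently so newly arriving data never forces recomputing dataset-wide statistics, is the logistic sigmoid; the following is a minimal sketch under that assumption.

    import numpy as np

    def exp_normalize(x, scale=1.0):
        """Map each value independently into (0, 1) with a logistic function.

        Unlike min-max or z-score normalization, no dataset-wide statistics
        (max, min, mean) are needed, so new data can be normalized without
        recomputing anything for the existing data. The logistic form and
        the scale parameter are illustrative assumptions.
        """
        return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float) / scale))

By contrast, min-max normalization must rescale every stored value as soon as a new observation exceeds the old maximum, which is exactly the redundant computation the thesis aims to remove.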
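For SABMO, the abstract specifies two ingredients: parameter adaptation and a simulated annealing escape from premature convergence. The sketch below shows only the simulated annealing component, a Metropolis acceptance rule with a hypothetical geometric cooling schedule; how this interleaves with BMO's grouped mating iterations is not stated in the abstract.

    import numpy as np

    def sa_accept(candidate_cost, current_cost, temperature, rng):
        """Metropolis rule: always accept improvements, and accept a worse
        candidate with probability exp(-delta / T), letting the search
        escape local optima and resist premature convergence."""
        delta = candidate_cost - current_cost
        if delta <= 0:
            return True
        return rng.random() < np.exp(-delta / max(temperature, 1e-12))

    def next_temperature(temperature, alpha=0.95):
        """Hypothetical geometric cooling schedule, applied once per iteration."""
        return alpha * temperature

    # Usage inside an optimizer iteration (rng = np.random.default_rng()):
    #   if sa_accept(cost(candidate), cost(current), T, rng):
    #       current = candidate
    #   T = next_temperature(T)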
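For the feedback correction mechanism, the following is a minimal sketch of the monitoring loop, assuming an RMSE trigger, a fixed threshold, and a retraining callback; none of these specifics appear in the abstract, and the model's predict() interface is likewise assumed.

    import numpy as np

    class FeedbackCorrectedImputer:
        """Wrap a trained imputation model with a feedback correction loop:
        monitor prediction error during the application phase and re-optimize
        the weights and thresholds when the data have drifted."""

        def __init__(self, model, retrain, rmse_threshold=0.1):
            self.model = model      # trained model with an assumed predict()
            self.retrain = retrain  # callback: retrain(model, X, y) -> model
            self.rmse_threshold = rmse_threshold

        def impute_and_monitor(self, X, y_true=None):
            y_pred = self.model.predict(X)
            if y_true is not None:
                rmse = float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
                if rmse > self.rmse_threshold:
                    # Feedback correction: retrain when the monitored error
                    # exceeds the threshold, instead of leaving the static
                    # model in place until a full offline retraining cycle.
                    self.model = self.retrain(self.model, X, y_true)
            return y_pred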
Keywords/Search Tags: High-dimensional data imputation, wavelet threshold denoising, adaptive simulated annealing bird mating optimizer (SABMO), neural network, feedback correction mechanism