Study On The Interpolation Method Of Time Series Missing Value

Posted on:2019-11-12

Degree:Master

Type:Thesis

Country:China

Candidate:W W Cheng

GTID:2428330545969567

Subject:Control engineering

Abstract/Summary:

With the rapid development of the information age, large amounts of data are used in machine learning and data mining, and most algorithms and models are built for complete data sets. In the real world, however, values go missing during data collection, sorting, transmission, and storage, which makes data analysis and application considerably harder. The traditional ways of handling missing values are simple deletion and substitution with the mean or with zero. These methods cause two serious problems: 1) deletion shrinks the usable data set, especially when the missing rate is high; 2) substitution easily introduces bias, since mean substitution reduces the variance of the data set and both mean and zero substitution distort its distribution. To address these problems, this thesis designs missing value processing algorithms based on sparse representation theory and the K-nearest neighbors (KNN) method, and demonstrates their advantages experimentally. The main work of this thesis is as follows:

(1) Using sparse representation theory, a new missing value interpolation algorithm based on sparse recovery is proposed, and verification experiments are designed on a PM2.5 time series. Comparing the results of various interpolation algorithms under different missing rates shows the advantage of the proposed algorithm. The influence of different parameters on the algorithm is also studied.
(2) For multivariable data, a new missing value interpolation algorithm is proposed that combines sparse principal component analysis (SPCA) with grey relational coefficient K-nearest neighbors imputation (GKNNI).
(3) Using the proposed SPCA+GKNNI algorithm, interpolation experiments are designed on two kinds of multivariable data and the results of different interpolation algorithms are compared. The results show that the algorithm handles missing values in multivariable data well and improves interpolation accuracy to some extent over the traditional KNN interpolation algorithm and the SVD and BPCA algorithms.
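The variance argument against mean and zero substitution can be checked numerically. The sketch below uses only the Python standard library on a synthetic, hypothetical series (a noisy sine wave around level 50, not the thesis's PM2.5 data); `mean_impute` and `zero_impute` are illustrative helper names. Because every imputed point sits exactly at the observed mean, the population variance after mean substitution equals (observed count / total count) times the variance of the observed values, so it always shrinks. Zero substitution instead pulls part of the series far from its level and inflates the spread.

```python
import math
import random
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    mu = statistics.fmean([v for v in values if v is not None])
    return [mu if v is None else v for v in values]

def zero_impute(values):
    """Replace None entries with 0.0."""
    return [0.0 if v is None else v for v in values]

random.seed(0)
# Hypothetical series: level 50, periodic component, Gaussian noise.
series = [50 + 10 * math.sin(2 * math.pi * t / 50) + random.gauss(0, 2)
          for t in range(500)]

# Remove about 30% of the points completely at random.
missing = [None if random.random() < 0.3 else v for v in series]

var_true = statistics.pvariance(series)
var_mean = statistics.pvariance(mean_impute(missing))
var_zero = statistics.pvariance(zero_impute(missing))
print(f"true: {var_true:.1f}  mean-imputed: {var_mean:.1f}  zero-imputed: {var_zero:.1f}")
```

Mean substitution comes out below the true variance, and zero substitution comes out far above it.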
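The abstract does not state which dictionary or sparse solver the sparse-recovery interpolation uses. As a minimal sketch under assumed choices, the example below treats a series x of length n as sparse in an orthonormal DCT basis Ψ (x = Ψs). It keeps only the rows of Ψ at observed time steps and approximately solves min ||s||_0 subject to y = MΨs, where M selects the observed steps, using orthogonal matching pursuit (OMP). It then fills the gaps with Ψŝ. The DCT dictionary, OMP, the sparsity level `n_nonzero`, and all function names are illustrative assumptions, not the thesis's method.

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal DCT-II synthesis basis: column k is the k-th cosine atom."""
    t = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    psi = np.cos(np.pi * (t + 0.5) * k / n)
    psi[:, 0] *= np.sqrt(1.0 / n)
    psi[:, 1:] *= np.sqrt(2.0 / n)
    return psi

def omp(A, y, n_nonzero, tol=1e-10):
    """Greedy orthogonal matching pursuit for y ≈ A @ s with few nonzeros in s."""
    col_norms = np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    residual = y.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(A.T @ residual) / col_norms))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    s = np.zeros(A.shape[1])
    s[support] = coef
    return s

def sparse_impute(x, observed, n_nonzero):
    """Fill x[~observed] from a sparse DCT fit to x[observed]."""
    psi = dct_dictionary(len(x))
    s = omp(psi[observed], x[observed], n_nonzero)
    filled = x.copy()
    filled[~observed] = (psi @ s)[~observed]
    return filled

# Hypothetical test signal built from three DCT atoms.
n = 128
psi = dct_dictionary(n)
signal = 3 * psi[:, 3] + 2 * psi[:, 10] - 1.5 * psi[:, 25]
rng = np.random.default_rng(0)
observed = rng.random(n) > 0.3            # about 30% missing
x = np.where(observed, signal, np.nan)    # NaN marks the gaps
filled = sparse_impute(x, observed, n_nonzero=5)
print("max error on gaps:", np.abs(filled - signal)[~observed].max())
```

Real measurements are only approximately sparse in any fixed basis, so `n_nonzero` trades fit against overfitting and would need tuning on held-out points.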
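The abstract names GKNNI without giving its formulas. A common formulation of grey relational analysis uses Deng's coefficient ξ_i(k) = (Δmin + ρ·Δmax) / (Δ_i(k) + ρ·Δmax), where Δ_i(k) = |x_0(k) − x_i(k)|, Δmin and Δmax are taken over all candidate rows and attributes, and the distinguishing coefficient ρ is usually 0.5. The grey relational grade is the mean of ξ_i(k) over k. The sketch below assumes that formulation. It min-max normalizes attributes using the complete rows and ranks complete rows by grade over the incomplete row's observed attributes. It then fills each gap with the grade-weighted mean of the K highest-graded rows. The SPCA stage that precedes GKNNI in the thesis is omitted. The weighting scheme and function names are hypothetical, and the code assumes at least one complete row exists.

```python
import numpy as np

def grey_relational_grades(target, candidates, rho=0.5):
    """Grade of each candidate row relative to target (Deng's coefficient, averaged)."""
    delta = np.abs(candidates - target)        # shape (m, d)
    d_min, d_max = delta.min(), delta.max()    # global min/max over rows and columns
    if d_max == 0:                             # every candidate equals target
        return np.ones(len(candidates))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)

def gknn_impute(X, k=3, rho=0.5):
    """Fill NaNs in X row by row using grey-relational nearest complete rows."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]     # assumes at least one complete row
    lo, hi = complete.min(axis=0), complete.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)    # avoid dividing by zero
    norm_complete = (complete - lo) / scale
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        miss = np.isnan(X[i])
        obs = ~miss
        if not obs.any():                      # nothing to compare: use column means
            filled[i] = complete.mean(axis=0)
            continue
        target = (X[i, obs] - lo[obs]) / scale[obs]
        grades = grey_relational_grades(target, norm_complete[:, obs], rho)
        nearest = np.argsort(-grades)[:k]      # highest grades first
        w = grades[nearest]
        filled[i, miss] = w @ complete[nearest][:, miss] / w.sum()
    return filled

X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
    [10.0, 20.0, 30.0],
    [2.0, 3.0, np.nan],   # hypothetical incomplete record
])
print(gknn_impute(X, k=2))
```

Compared with Euclidean-distance KNN, the grade is bounded in (0, 1] and is computed relative to the largest deviation among the candidates, so it can also serve directly as a neighbor weight.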

Keywords/Search Tags:

Data Missing, Time Series, Sparse Representation, SPCA, GKNNI

Related items

1	Study On Water Quality Time Series Data Mining And Application Integration
2	Research On Key Technique Of Mixed Data Clustering Based On Sparse Representation
3	Research On Time Series Representation Based Retrieval And Classification
4	Research On Representation And Clustering Methods Based On Time-series Data
5	Multiple Imputation on Missing Values in Time Series Data
6	Research On Key Technologies Of Time Series Cleaning
7	Time Series Data Processing And Application System's Developing
8	Sparse Representation-based Human Motion Capture Data Analysis Methods
9	Research On Feature Representation And Classification Methods In Time Series Data Mining
10	Design For Time Series Imputation Scheme Based On Generative Model