
Modeling Of Incomplete Data And Missing Values Imputations Based On Alternate Learning

Posted on: 2021-03-16; Degree: Master; Type: Thesis
Country: China; Candidate: J C Song; Full Text: PDF
GTID: 2428330611451359; Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, data mining can extract the information hidden in massive datasets and thereby guide decision-making. Missing data, however, is an unavoidable problem in many fields, and high-quality data is a prerequisite for high-quality analysis. How to impute missing values effectively has therefore become a research focus in recent years. Regression imputation, a popular approach, builds a regression model on the incomplete data and then predicts the missing values with that model. Traditional regression imputation has two weaknesses. First, it fits all the data with a single linear regression equation and ignores the possibility that two or more regression relationships exist among the attributes. Second, to cope with incomplete model input, it either deletes the incomplete samples or pre-imputes the missing values. Deleting incomplete samples can discard a lot of useful information, while pre-imputation makes both the model accuracy and the imputation accuracy depend directly on the quality of the pre-imputed values. This thesis therefore proposes an incomplete-data modeling method based on the Takagi-Sugeno (TS) fuzzy model to impute missing values. The input space is partitioned into fuzzy subsets, and a dedicated linear regression model is established for each subset. The global model is then constructed as the weighted sum of these local linear models, which makes it finer-grained than a traditional regression model. In addition, a stepwise regression algorithm selects the significant features of each fuzzy subset, refining the model further. To handle incomplete model input, the missing values are treated as variables, and a solving strategy is introduced in which the selection of input features, the model parameters and the imputed values are learned alternately, so that imputation is completed at the same time as modeling. Experiments on 10 real datasets and 1 artificial dataset show that fuzzy partitioning, feature selection and alternate learning each improve the fineness of the incomplete-data model and thus the imputation performance. Finally, the proposed method is applied to the imputation of the CFPS2016 dataset.
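The idea of a weighted sum of local linear models that is re-fitted while the missing values are re-estimated can be illustrated with a short Python sketch. It is not the thesis's implementation, and it rests on several assumptions that do not come from the abstract: Gaussian membership functions with a shared width, rule centers taken from randomly chosen rows, ordinary least squares for the local model parameters, and a column-mean fill used only as a starting point. The stepwise feature selection is omitted, and the function names (`memberships`, `fit_ts`, `predict_ts`, `impute_alternate`) are hypothetical.

```python
import numpy as np

def memberships(X, centers, width):
    # Normalised Gaussian firing strength of every sample for every fuzzy rule.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * width ** 2))
    return w / (w.sum(axis=1, keepdims=True) + 1e-12)

def _design(X, centers, width):
    # The global output sum_k w_k(x) * (a_k . x + b_k) is linear in all (a_k, b_k),
    # so the rule-weighted inputs can be stacked into one design matrix.
    w = memberships(X, centers, width)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (w[:, :, None] * Xb[:, None, :]).reshape(len(X), -1)

def fit_ts(X, y, centers, width):
    # Least-squares estimate of all local linear models at once.
    theta, *_ = np.linalg.lstsq(_design(X, centers, width), y, rcond=None)
    return theta

def predict_ts(X, theta, centers, width):
    return _design(X, centers, width) @ theta

def impute_alternate(data, n_rules=3, n_rounds=10, seed=0):
    # Assumed simplification: alternate between fitting one TS model per
    # incomplete attribute and overwriting that attribute's missing entries.
    rng = np.random.default_rng(seed)
    X = np.asarray(data, dtype=float).copy()
    missing = np.isnan(X)
    # Starting point only: column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    # Rows that supply the rule centers are fixed once per attribute.
    center_rows = {
        j: rng.choice(np.flatnonzero(~missing[:, j]), n_rules, replace=False)
        for j in range(X.shape[1]) if missing[:, j].any()
    }
    for _ in range(n_rounds):
        for j, rows in center_rows.items():
            inputs = np.delete(X, j, axis=1)   # current estimates of the other attributes
            obs = ~missing[:, j]
            centers = inputs[rows]
            width = inputs[obs].std(axis=0).mean() + 1e-12
            # Step 1: learn model parameters from rows where attribute j is observed.
            theta = fit_ts(inputs[obs], X[obs, j], centers, width)
            # Step 2: update the missing values of attribute j with the new model.
            X[~obs, j] = predict_ts(inputs[~obs], theta, centers, width)
    return X
```

Calling `impute_alternate(data)` on a NumPy array whose missing entries are `NaN` returns a filled copy and leaves the observed entries unchanged. Each round refits the models on the current estimates, so the model and the imputed values are refined together.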
Keywords/Search Tags:Incomplete data, Imputations of missing values, TS fuzzy model, Feature selection, Alternate learning