Imbalanced-type Incomplete Data And Missing Value Imputations Based On TS Modeling

Posted on:2022-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Lu

GTID:2518306509484924

Subject:Software engineering

Abstract/Summary:

Science and technology are developing rapidly, and every industry now depends on collecting, recording, and analyzing data. Huge amounts of data have accumulated as a result, and missing values in these datasets are an inevitable problem. Missing values may carry significant information, so their absence affects the correctness of data mining. Modeling incomplete data and imputing missing values has therefore become an increasingly important task.

Regression imputation is a common approach: it analyzes the regression relationship between the existing values and the missing values in a dataset and builds a regression model for the incomplete data. Because the regression relationship between sample attributes usually differs from cluster to cluster, this thesis proposes a cluster-dependent method for modeling incomplete data based on the Takagi-Sugeno (TS) fuzzy model. Within the TS fuzzy model framework, a precise regression model between attributes is established for the incomplete data. When imbalanced datasets are modeled, samples in majority clusters are often divided among minority clusters. Because class imbalance occurs frequently, a distance density (DS) algorithm based on a partial distance strategy is proposed for the premise parameter identification part of TS modeling, and a membership reconstruction strategy is proposed on this basis. To further refine the model, the RReliefF algorithm is applied to each fuzzy subset to select the relevant input features in the consequent part of the TS model. To handle incomplete model input, the method adopts iterative learning and treats missing values as variables, so that the missing values, the model structure, and the parameters are learned dynamically until the iteration converges. Imputation is complete when the iteration ends.

The proposed method is used to model imbalanced, incomplete data and then to impute the missing values. Experimental results on UCI and KEEL datasets show that, compared with traditional regression imputation, the proposed method both accounts for class imbalance in the data and makes full use of the existing values in incomplete data, thereby effectively improving imputation accuracy.
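Two ideas from the abstract, the partial distance strategy and iterative learning with missing values treated as variables, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes a single global linear regression per attribute in place of the cluster-wise TS fuzzy model, and it omits the DS algorithm, membership reconstruction, and RReliefF feature selection. The function names `partial_distance_sq` and `iterative_regression_impute` are hypothetical and chosen only for illustration.

```python
import numpy as np


def partial_distance_sq(x, v):
    """Squared partial distance between an incomplete sample x and a center v.

    Only the observed (non-NaN) attributes of x contribute; the sum is
    rescaled by (number of attributes / number of observed attributes).
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    observed = ~np.isnan(x)
    n_obs = observed.sum()
    if n_obs == 0:
        raise ValueError("sample has no observed attributes")
    diff = x[observed] - v[observed]
    return x.size / n_obs * np.sum(diff ** 2)


def iterative_regression_impute(X, n_iter=50, tol=1e-8):
    """Alternate between fitting regressions and updating missing entries.

    Assumption: one global linear model per attribute, standing in for the
    cluster-wise TS consequent models described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Initialise every missing entry with the mean of its column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    for _ in range(n_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Regress attribute j on all other attributes plus an intercept.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([others, np.ones(len(filled))])
            coef, *_ = np.linalg.lstsq(A[~miss_j], filled[~miss_j, j], rcond=None)
            # Treat the missing entries as variables and update them.
            filled[miss_j, j] = A[miss_j] @ coef
        # Stop once the imputed values no longer change.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled


if __name__ == "__main__":
    data = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, np.nan]])
    print(iterative_regression_impute(data))
    print(partial_distance_sq([1.0, np.nan, 3.0], [0.0, 0.0, 0.0]))
```

In the toy data the second attribute follows y = 2x + 1 on the complete rows, so the loop settles on a value close to 9 for the missing entry. A cluster-dependent variant would fit one such model per fuzzy subset and weight the predictions by membership degree.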

Keywords/Search Tags:

Incomplete Dataset, TS Fuzzy Model, Missing Value Imputation, Class Imbalance, Feature Selection

Related items

1	Missing Value Imputation Based On TS Modeling With Alternate Learning
2	Attribute Correlation Modeling And Missing Value Imputation Of Incomplete Data Based On Fuzzy Partition
3	Attribute Associated Neuron Modeling And Missing Value Imputation Based On Neural Network
4	Research On Missing Value Imputation Of Incomplete Data
5	Incomplete Data Modeling And Missing Value Imputation Based On Confidence
6	Researches On Imputation And Classification Of Incomplete Data Based On Variables For Missing Values
7	Comparative Study On Imputation Methods Of Missing Data In XGBOOST Model Under Complete Random Missing Mechanism
8	Modeling Of Incomplete Data And Missing Values Imputations Based On Alternate Learning
9	Studies On Missing Data Imputation
10	Research On Bayesian Network Based Missing Value Imputation Model For Incomplete Credit Data