Imbalanced-type Incomplete Data And Missing Value Imputations Based On TS Modeling

Posted on: 2022-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Lu
Full Text: PDF
GTID: 2518306509484924
Subject: Software engineering
Abstract/Summary:
Science and technology are developing rapidly, and virtually every field now depends on collecting, recording, and analyzing data. As large volumes of data accumulate, missing values in datasets become unavoidable. The missing entries may carry significant information, and their absence can compromise the correctness of data mining results. Modeling incomplete data and imputing missing values has therefore become an increasingly important task.

Regression imputation is a common approach: it analyzes the regression relationship between the existing values and the missing values in a dataset and builds a regression model for the incomplete data. Because the regression relationship between sample attributes usually differs from cluster to cluster, this thesis proposes a cluster-dependent method for modeling incomplete data based on the Takagi-Sugeno (TS) fuzzy model, within whose framework a precise regression model between attributes is established.

When imbalanced datasets are modeled, samples in majority clusters are often divided into minority clusters. Since class imbalance occurs frequently, a distance density (DS) algorithm based on a partial distance strategy is proposed for the premise parameter identification part of TS modeling, to account for the imbalanced distribution of data categories. A membership reconstruction strategy is proposed on this basis. To further refine the model, the RReliefF algorithm is applied to each fuzzy subset to select the relevant input features in the consequent part of the TS model. To handle incomplete model inputs, the method adopts iterative learning and treats missing values as variables, so that the missing values, the model structure, and the parameters are learned dynamically until the iteration converges. The imputation is complete when the iteration ends.

The proposed method is used to model imbalanced, incomplete data and then impute the missing values. Experimental results on UCI and KEEL datasets show that, compared with the traditional regression imputation method, the proposed method not only takes the class imbalance in the data into account but also makes full use of the existing values in incomplete data, thereby effectively improving imputation accuracy.
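Two of the ideas named in the abstract, the partial distance strategy and iterative learning with missing values treated as variables, can be illustrated with a minimal Python sketch. This is not the thesis's method. It assumes the commonly used form of the partial distance (squared differences over observed attributes, rescaled by the ratio of total to observed attributes), and it substitutes a single global linear regression for the cluster-wise TS fuzzy rules. The function names `partial_distance` and `iterative_regression_impute` and the parameters `max_iter` and `tol` are hypothetical.

```python
import numpy as np


def partial_distance(x, v):
    """Squared partial distance between sample x (NaN = missing) and a complete centre v.

    Assumed form: sum of squared differences over observed attributes,
    rescaled by (total attributes / observed attributes).
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    observed = ~np.isnan(x)
    n_obs = int(observed.sum())
    if n_obs == 0:
        raise ValueError("sample has no observed attributes")
    diff = x[observed] - v[observed]
    return x.size / n_obs * float(diff @ diff)


def iterative_regression_impute(X, max_iter=50, tol=1e-6):
    """Impute NaNs by repeatedly refitting a linear model per incomplete attribute.

    Illustrative stand-in only: one global least-squares model per attribute,
    not the cluster-dependent TS model described in the abstract.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Start from column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = col_means[np.where(missing)[1]]

    for _ in range(max_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            # Regress attribute j on all other attributes (plus an intercept),
            # fitting only on rows where attribute j was actually observed.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(filled)), others])
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            # Missing entries are the variables updated in each pass.
            filled[rows, j] = A[rows] @ coef
        # Stop once the imputed values no longer change.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

In this sketch the imputed entries are overwritten on every pass and the loop stops once they stabilise, which mirrors the idea that imputation finishes when the iteration converges. In a cluster-dependent variant, `partial_distance` would be the natural way to compare an incomplete sample with cluster centres before choosing which local regression to apply.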
Keywords/Search Tags:Incomplete Dataset, TS Fuzzy Model, Missing Value Imputation, Class Imbalanced, Feature Selection