Incomplete Data Modeling And Missing Value Imputation Based On Confidence

Posted on: 2022-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: L W Mao
GTID: 2518306509995069
Subject: Software engineering
Abstract/Summary:
In the era of big data, with data accumulating explosively, efficiently mining valuable information from massive datasets has become an important research topic. Classification is a basic and important data mining technique, widely used in biometric recognition, document classification, medical diagnosis, and other fields. Real-world datasets, however, usually contain missing values to varying degrees, which makes data analysis more difficult, so missing data is a common defect that classification tasks must handle. Against this background, this thesis studies a multi-task learning model in which incomplete data classification assists missing value imputation.

To improve imputation performance, the thesis studies an imputation method that makes full use of the valuable information hidden in incomplete samples. The method builds on an auto-associative neural network to construct a confidence-based multi-task learning model of mutual association among attributes. By optimizing the data transmission paths of the output-layer nodes, it constructs a multi-task architecture in which imputation is the main task and classification is a parallel secondary task. To make better use of the known observations in the dataset, incomplete samples are added to the training set and take part in model training. The missing rate of attributes in each sample is used as that sample's initial confidence, which adjusts how strongly incomplete and complete samples influence the optimization of the model parameters. In addition, building on the two-stage imputation scheme for neural networks, the missing values are treated as unknown variables that participate in model training and are updated iteratively by the optimization algorithm, which reduces the estimation bias caused by pre-imputation. The results of the classification task are then used to update the confidence dynamically, so as to adjust the weights in the cost function and thereby the degree to which the model parameters are optimized. Training and imputation are carried out simultaneously: when training finishes, imputation is finished as well.

Experiments on datasets from UCI and KEEL cover the two parts of the research described above, and the results show that the proposed model and imputation scheme achieve high accuracy in both incomplete data classification and missing value imputation. At a time when data quality is difficult to guarantee, the research on classification and imputation methods for incomplete data presented in this thesis has practical significance.
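
The imputation idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: it assumes the initial confidence is one minus a sample's missing rate, uses a small one-hidden-layer autoencoder, and omits the classification branch and the dynamic confidence update. The names `initial_confidence` and `train_and_impute`, the network size, and the learning rate are all hypothetical choices for illustration.

```python
import numpy as np

def initial_confidence(X):
    """Assumed form: confidence = 1 - fraction of missing (NaN) attributes per sample."""
    return 1.0 - np.isnan(X).mean(axis=1)

def train_and_impute(X, hidden=4, lr=0.05, epochs=500, seed=0):
    """Jointly train a tiny autoencoder and update missing entries as variables."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(X)
    conf = initial_confidence(X)[:, None]      # per-sample weight in the cost
    # Pre-imputation with column means; these entries are refined during training.
    Z = np.where(miss, np.nanmean(X, axis=0), X)
    n, d = Z.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        # Forward pass: reconstruct every sample from itself.
        H = np.tanh(Z @ W1 + b1)
        Y = H @ W2 + b2
        # Gradient of the loss (1 / 2n) * sum_i conf_i * ||Y_i - Z_i||^2 w.r.t. Y.
        G = conf * (Y - Z) / n
        dA = (G @ W2.T) * (1.0 - H ** 2)       # back through tanh
        # Z appears both as network input and as reconstruction target.
        dZ = dA @ W1.T - G
        W2 -= lr * (H.T @ G); b2 -= lr * G.sum(axis=0)
        W1 -= lr * (Z.T @ dA); b1 -= lr * dA.sum(axis=0)
        # Only the missing entries are trainable; observed values stay fixed.
        Z[miss] -= lr * dZ[miss]
    return Z
```

Calling `train_and_impute(X)` on an array with `NaN` placeholders returns a completed copy in which the observed entries are unchanged. Weighting each sample's reconstruction error by its confidence means that, under this assumption, samples with more missing attributes pull less strongly on the network parameters.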
Keywords/Search Tags:Confidence, Imputation of Missing Values, Neural Network, Incomplete Data, Classification