The Graph-based Semi-supervised Learning With Missing Data

Posted on: 2012-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
Full Text: PDF
GTID: 2178330335995085
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Missing data are often encountered in data analysis and machine learning. The usual practice is to impute the missing values first (for example by mean imputation, KNN imputation, hot deck imputation, cold deck imputation, regression imputation, or multiple imputation) and then fit a model to the completed data. However, imputation is time-consuming, and inappropriate imputation may introduce large errors or false results that affect the subsequent analysis. This thesis studies methods for handling missing data in classification, with the aim of constructing a classification model that requires no imputation. We first combine graph-based semi-supervised learning with missing data and construct a graph-based semi-supervised learning model that handles missing data automatically by constructing similarity weights directly on the incomplete data. We then implement the algorithm in R. Finally, we run experiments on UCI data sets (Letters, Spam, Diabetes, Wine, and Segment). The experimental conclusions are as follows. (1) When the missing data are first filled in by classical statistical imputation (stochastic imputation, mean imputation, median imputation) and graph-based semi-supervised learning is then applied to the imputed data, the results show that our method is slightly better than these classical methods. (2) Compared with a classical supervised learning model trained on data with no missing values, the proposed method, applied to data made incomplete by artificially removing some values, gives similar results. This indicates that our method is reasonable, and it is convenient when the data contain missing values because no imputation is needed. (3) Compared with traditional methods that impute the data first and then model the completed data, the experimental results show that our method performs better, and since it does not fill in missing data it has a comparative advantage.
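The abstract does not say how the similarity weights are built on incomplete data, so the following is only a minimal sketch of one plausible reading, not the thesis's algorithm. It assumes a partial-distance Gaussian weight (squared distance over the features observed in both points, rescaled by the share of features used) and a simple clamped label propagation on the resulting graph. The function names, the `sigma` parameter, the use of `None` for a missing value, and the toy data are all hypothetical.

```python
import math


def partial_sq_distance(x, y):
    """Squared distance over features observed in both x and y (None = missing).

    Assumption: the sum is rescaled by (total features / shared features) so
    that pairs sharing few features stay comparable to fully observed pairs.
    Returns None when the two points share no observed feature.
    """
    shared = [(a - b) ** 2 for a, b in zip(x, y)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(shared) * len(x) / len(shared)


def similarity_matrix(X, sigma=1.0):
    """Gaussian edge weights from partial distances; no value is imputed."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = partial_sq_distance(X[i], X[j])
            if d2 is not None:  # pairs with nothing in common get weight 0
                W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def label_propagation(W, labels, n_classes, n_iter=200):
    """Propagate labels over the graph; labels[i] is a class index or None."""
    n = len(W)
    F = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            F[i][y] = 1.0
    for _ in range(n_iter):
        new_F = []
        for i in range(n):
            degree = sum(W[i])
            if labels[i] is not None or degree == 0.0:
                new_F.append(F[i][:])  # clamp labeled (and isolated) nodes
                continue
            # Unlabeled node: weighted average of its neighbours' scores.
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / degree
                          for c in range(n_classes)])
        F = new_F
    # An isolated unlabeled node keeps all-zero scores and falls back to class 0.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy data: two clusters, every row has one missing feature, one label per cluster.
X = [
    [0.0, 0.1, None],
    [0.2, None, 0.0],
    [None, 0.0, 0.1],
    [5.0, 5.1, None],
    [None, 5.0, 4.9],
    [5.2, None, 5.0],
]
labels = [0, None, None, 1, None, None]
pred = label_propagation(similarity_matrix(X), labels, n_classes=2)
print(pred)  # [0, 0, 0, 1, 1, 1]
```

In this sketch the incomplete rows enter the graph directly, which is the property the abstract emphasizes. An imputation baseline would instead fill each `None` (for example with the column mean) before computing ordinary distances.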
Keywords/Search Tags:missing data, graph-based semi-supervised learning, imputation, similar weights