A Study On SVM Algorithm For Missing Data Processing

Posted on:2018-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:M C Zhu

Full Text:PDF

GTID:2428330593451035

Subject:Computer Science and Technology

Abstract/Summary:


Missing data is a common problem in data analysis and data mining, and missing values in feature vectors are an important branch of it. Because of the way their data are collected, fields such as medicine and social surveys have a very high proportion of missing data. Even so, the incomplete records still contain a great deal of valuable information, so handling missing data while still extracting that information has become an active research topic in recent years.

The most common approach is imputation, in which each missing value is filled with a specific value during preprocessing. However, imputation is effective only when the proportion of missing data is low, and it applies only to data that are MCAR (Missing Completely at Random) or MAR (Missing at Random). In practice, data go missing for many different reasons, and the ideal MCAR condition almost never holds. If the mechanism that produced the missing values is ignored, imputation can distort the original distribution of the data or even lead to misleading results.

This thesis focuses on missing data in medical and social survey datasets. After analyzing why features go missing in such data, it proposes an improved support vector machine (SVM) for data with missing values. The main innovation is a new kernel function that can handle both incomplete and complete data. To avoid introducing errors, the kernel makes full use of the observed values: instead of estimating the missing values directly, each sample is re-represented by its distances to the other samples.

The method is validated on 5 data sets from the UCI repository and compared with traditional imputation methods, namely class mean, EM, regression, KNN and WKNN (weighted KNN) imputation, using accuracy, F-score, the Kappa statistic and recall as evaluation measures. Experimental results show that the method achieves a significant improvement in classification results over these common imputation methods, even when the proportion of missing data is high. The method was then improved further by using complete data in the step that precedes the extreme distance computation; experiments show that this improved algorithm performs better on continuous data.
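The abstract does not give the kernel's formula, so the following is only a minimal sketch of the general idea under stated assumptions, in plain Python. It assumes missing values are marked with None, measures distance only over the features two samples both observe (rescaled by the fraction of shared features, a common partial-distance heuristic), re-represents each sample as its vector of distances to a reference set, and then applies a Gaussian (RBF) kernel to those fully observed vectors. The names partial_distance, represent and rbf_kernel, the rescaling rule, gamma and the toy data are illustrative assumptions, not definitions from the thesis; using the complete samples as the reference set is one hypothetical reading of the improved variant.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical encoding: a sample is a list of floats, with None for a missing feature.
Sample = Sequence[Optional[float]]


def partial_distance(a: Sample, b: Sample) -> float:
    """Euclidean distance over the features observed in BOTH samples.

    Assumption: the squared sum is rescaled by d / (number of shared features),
    so pairs that share few features are not automatically judged "closer".
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("the two samples share no observed feature")
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def represent(X: List[Sample], refs: List[Sample]) -> List[List[float]]:
    """Re-represent each sample by its distances to the reference samples.

    No missing value is ever filled in; the new vectors are fully observed.
    """
    return [[partial_distance(x, r) for r in refs] for x in X]


def rbf_kernel(
    R1: List[List[float]], R2: List[List[float]], gamma: float = 0.1
) -> List[List[float]]:
    """Gaussian (RBF) kernel between two sets of distance representations."""
    return [
        [math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(r1, r2))) for r2 in R2]
        for r1 in R1
    ]


# Toy data (made up for illustration): three features, some values missing.
X_train = [
    [1.0, 2.0, None],
    [1.5, None, 3.0],
    [0.0, 0.5, 1.0],
    [4.0, 4.0, 4.0],
]

# Assumed reference set: the complete samples only (a hypothetical reading of
# the "improved" variant); all of X_train would also work for this toy data.
refs = [x for x in X_train if all(v is not None for v in x)]

R_train = represent(X_train, refs)  # 4 x 2 distance vectors, no None left
K = rbf_kernel(R_train, R_train)    # 4 x 4 kernel matrix for an SVM
```

Because the RBF is applied to ordinary real-valued vectors, K is positive semidefinite and could be passed to any SVM that accepts a precomputed kernel, for example scikit-learn's SVC(kernel="precomputed"). At prediction time the kernel rows would be computed between the test representations and the training representations.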
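The abstract names class mean, EM, regression, KNN and WKNN imputation as the baselines. The sketch below illustrates three of them (class mean, KNN and a distance-weighted KNN) in plain Python so the contrast with a no-imputation kernel is concrete. The thesis's settings are not given, so k, the root-mean-square distance over co-observed features, the 1/distance weights and the fallback to the overall feature mean are all assumptions; EM and regression imputation are not shown.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def column_means(X: List[Row]) -> List[float]:
    """Mean of each feature over its observed values (0.0 if none is observed)."""
    means = []
    for j in range(len(X[0])):
        vals = [x[j] for x in X if x[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means


def class_mean_impute(X: List[Row], y: Sequence[int]) -> List[List[float]]:
    """Fill a missing value with the feature's mean among samples of the same class.

    Assumption: if a class never observes a feature, use the overall feature mean.
    """
    fallback = column_means(X)
    out = []
    for x, label in zip(X, y):
        same = [z for z, lab in zip(X, y) if lab == label]
        row = []
        for j, v in enumerate(x):
            if v is None:
                vals = [z[j] for z in same if z[j] is not None]
                v = sum(vals) / len(vals) if vals else fallback[j]
            row.append(v)
        out.append(row)
    return out


def knn_impute(X: List[Row], k: int = 2, weighted: bool = False) -> List[List[float]]:
    """KNN imputation; weighted=True gives a WKNN variant with 1/distance weights.

    Distance uses only co-observed features; donors sharing no feature are skipped.
    """
    fallback = column_means(X)

    def dist(a: Row, b: Row) -> float:
        shared = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((p - q) ** 2 for p, q in shared) / len(shared))

    out = []
    for i, x in enumerate(X):
        row = []
        for j, v in enumerate(x):
            if v is None:
                cands = [(dist(x, z), z[j]) for t, z in enumerate(X)
                         if t != i and z[j] is not None]
                donors = sorted(c for c in cands if math.isfinite(c[0]))[:k]
                if not donors:
                    v = fallback[j]
                elif weighted:
                    ws = [1.0 / (d + 1e-9) for d, _ in donors]
                    v = sum(w * val for w, (_, val) in zip(ws, donors)) / sum(ws)
                else:
                    v = sum(val for _, val in donors) / len(donors)
            row.append(v)
        out.append(row)
    return out


# Toy data (made up for illustration).
X = [[1.0, None], [2.0, 4.0], [3.0, 6.0], [None, 8.0]]
y = [0, 0, 1, 1]
filled_mean = class_mean_impute(X, y)
filled_knn = knn_impute(X, k=2)
filled_wknn = knn_impute(X, k=2, weighted=True)
```

Class-mean imputation needs the class labels, so as written it can only be applied to training data.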
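The evaluation measures named in the abstract (accuracy, F-score, the Kappa statistic and recall) can all be computed from a confusion matrix. The following minimal sketch assumes a binary task and takes "F-score" to mean F1; the UCI data sets used in the thesis may be multi-class, in which case per-class or averaged versions would be needed. The function name binary_metrics and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, recall, F1 and Cohen's kappa for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = len(pairs)

    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: observed agreement p_o corrected by the agreement p_e
    # expected by chance from the row and column totals of the confusion matrix.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Kappa is undefined when p_e == 1; returning 0.0 there is an assumption.
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "recall": recall, "f_score": f1, "kappa": kappa}


m = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1, 1, 0])
# -> accuracy 0.75, recall 0.75, f_score 0.75, kappa 0.5
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance given the class proportions. The same quantities are available in scikit-learn as accuracy_score, recall_score, f1_score and cohen_kappa_score in sklearn.metrics.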

Keywords/Search Tags:

Missing data, SVM, Classification, Kernel function


Related items

1	Researches On The Classification Of Imbalanced Data With Missing Values
2	Research On Improved Bayesian Methods For Replacing Missing Data
3	Studies On Missing Data Imputation
4	Imbalanced Binary Classification On Hospital Readmission Data With Missing Value
5	Learning Bayesian Networks In The Presence Of Missing Values Based On Kernel Independent Component Analysis
6	Research On Missing Value Imputation Method Based On Mixed Information System
7	Researches On Imputation And Classification Of Incomplete Data Based On Variables For Missing Values
8	Research On Imputing Algorithm Of Missing Values Based On Kernel Similarity And Low Rank Approximation
9	Nonparametric Imputation For Missing Data
10	A Study Of Kernel Classification Algorithm Based On Double-Kernel Combinition