
Machine Learning Algorithm With Missing Data

Posted on: 2017-08-31  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H Gao  Full Text: PDF
GTID: 1368330569998500  Subject: Computer Science and Technology
Abstract/Summary:
Missing data is a common phenomenon in medical studies, environmental health surveillance, social science research, and other fields. Since machine learning is a widely used tool in these areas, learning from data with missing values has become a pervasive problem. Traditionally, most machine learning algorithms assume that the input data is complete, so missing values are handled in preprocessing by marginalization or imputation. However, preprocessing changes the original data, which may cause information loss and introduce error, harming the subsequent learning algorithm. Recent work therefore considers how to deal with missing data without preprocessing, but existing methods are not accurate enough. Aiming at three popular machine learning algorithms, namely extreme learning machine, margin-based feature selection, and multi-view clustering, this thesis modifies each of them to handle missing data without preprocessing.

Extreme learning machine (ELM) is a popular learning method for neural networks. Because its input-layer weights are fixed during learning, it cannot learn from missing data directly. There have recently been some works on ELM-based learning without missing-data preprocessing, represented by the V-ELMI, NR-SVM, and A-ELM algorithms, but these are not accurate enough. Chapter 3 proposes a sample-based extreme learning machine principle: the training error of each sample is measured in its own relevant subspace, that is, over the features actually observed for that sample (see the first sketch below). Based on this principle, we devise the S-ELM and S-ELMK classification optimizations and the S-ELMR regression optimization, and solve them with fixed-point methods. Experiments show that the proposed algorithms are superior to the comparison methods in learning accuracy.

Margin-based feature selection (MFS) is a classical filter feature selection method: features are evaluated and ranked by the hypothesis margin they induce. Because the margin is computed in a unified feature space, MFS cannot directly handle missing data, and most existing remedies rely on preprocessing. Recently, Lou proposed the SID algorithm, which selects features without missing-data preprocessing by applying an uncertain margin to missing data; however, SID does not solve the problem of computing distances when values are missing. Chapter 4 proposes the KMFS algorithm, which computes the expected distance under missing values and replaces the single nearest neighbor of classical MFS with a k-nearest-neighbor strategy (see the second sketch below). Experiments show that KMFS selects features more accurately than imputation methods and the SID algorithm.

Data often comes from multiple views, sources, or modalities, and multi-view learning has become a popular approach. Traditionally, all views are assumed to be complete; in reality, a sample may be observed in only some of the views, and every view can be incomplete. Representative works on this problem include the KL, CoKL, and PVC algorithms, but each has deficiencies: KL requires at least one complete view, while CoKL and PVC are designed for clustering with two incomplete views. Chapter 5 proposes the IVSC algorithm, which handles an arbitrary number of views, all of which may be incomplete. The basic idea of IVSC is that the latent feature matrix of each view should be consistent with a shared center (see the third sketch below). Experiments show that IVSC can cluster when all views are incomplete and there are more than two views; moreover, it is more accurate than KL+KCCA when one view is complete, and more accurate than CoKL+KCCA when there are two incomplete views.
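First sketch. The following Python code illustrates only the sample-based principle of Chapter 3: each sample's hidden-layer output is computed from its observed features alone, and the output weights are then solved in closed form by ridge regression. The d/|observed| rescaling and the names subspace_hidden and fit_selm are assumptions of this sketch, not the thesis's S-ELM formulation, which is solved by fixed-point methods.

    import numpy as np

    rng = np.random.default_rng(0)

    def subspace_hidden(X, mask, W, b):
        # Zero out missing entries so each sample's activation uses only
        # its observed features; rescale by d / |observed| so samples with
        # few observed features stay comparable (a choice of this sketch).
        d = X.shape[1]
        Xz = np.where(mask, X, 0.0)
        scale = d / np.maximum(mask.sum(axis=1, keepdims=True), 1)
        return np.tanh(scale * Xz @ W + b)          # (n, L) hidden outputs

    def fit_selm(X, mask, y, L=50, ridge=1e-2):
        # Random fixed input weights (the ELM characteristic), then output
        # weights beta solved in closed form by ridge regression.
        d = X.shape[1]
        W = rng.normal(size=(d, L))
        b = rng.normal(size=L)
        H = subspace_hidden(X, mask, W, b)
        beta = np.linalg.solve(H.T @ H + ridge * np.eye(L), H.T @ y)
        return W, b, beta

    # Toy regression with roughly 20% of the entries missing.
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
    mask = rng.random(X.shape) > 0.2                # True = observed
    W, b, beta = fit_selm(X, mask, y)
    pred = subspace_hidden(X, mask, W, b) @ beta
    print("train MSE:", np.mean((pred - y) ** 2))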
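Second sketch. This code illustrates the two ideas attributed to KMFS: expected distances under missing values, and a hypothesis margin over k nearest neighbors instead of a single one. Here a missing value is modeled by its feature's empirical mean and variance, and the margin follows a Relief-style update; both choices, and the names used, are assumptions of this sketch rather than the thesis's exact derivation.

    import numpy as np

    rng = np.random.default_rng(1)

    def expected_sqdiffs(X, mask):
        # Per-feature expected squared differences for every sample pair.
        # A missing value is modeled by its feature's empirical mean/variance:
        #   one side missing:   E[(a - Y)^2] = (a - mu)^2 + var
        #   both sides missing: 2 * var  (independence assumed)
        Xn = np.where(mask, X, np.nan)
        mu, var = np.nanmean(Xn, axis=0), np.nanvar(Xn, axis=0)
        Xf = np.where(mask, X, mu)                  # mean-fill for the expectation
        miss = (~mask).astype(float)
        diff2 = (Xf[:, None, :] - Xf[None, :, :]) ** 2
        return diff2 + var * (miss[:, None, :] + miss[None, :, :])  # (n, n, d)

    def kmfs_scores(X, mask, y, k=3):
        # Relief-style hypothesis-margin scores using expected distances and
        # the average of k nearest hits/misses instead of a single neighbor.
        per_feat = expected_sqdiffs(X, mask)
        D = per_feat.sum(axis=2)                    # expected squared distances
        n, d = X.shape
        w = np.zeros(d)
        for i in range(n):
            same = y == y[i]
            same[i] = False                         # exclude the sample itself
            diff = y != y[i]
            hit_idx = np.flatnonzero(same)[np.argsort(D[i, same])[:k]]
            miss_idx = np.flatnonzero(diff)[np.argsort(D[i, diff])[:k]]
            w += per_feat[i, miss_idx].mean(axis=0) - per_feat[i, hit_idx].mean(axis=0)
        return w / n

    # Toy binary problem: feature 0 carries the signal, ~20% entries missing.
    n, d = 120, 6
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += 2.0 * y
    mask = rng.random((n, d)) > 0.2
    print("features ranked best-first:", np.argsort(-kmfs_scores(X, mask, y)))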
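Third sketch. This code illustrates the IVSC idea of tying each view's latent feature matrix to a shared center. Each incomplete view is factorized by alternating least squares with a penalty pulling its latent rows toward the center; the update rules, the penalty weight lam, and the name ivsc_sketch are assumptions of this sketch, not the thesis's algorithm.

    import numpy as np

    rng = np.random.default_rng(2)

    def ivsc_sketch(views, observed, n, k, lam=1.0, iters=30):
        # Each incomplete view X_v (|O_v| x d_v), seen only on sample indices
        # O_v, is factorized as X_v ~ V_v @ U_v.T, while lam * ||V_v - C[O_v]||^2
        # pulls every view's latent rows toward one shared center C (n x k).
        C = rng.normal(size=(n, k))
        Us = [rng.normal(size=(X.shape[1], k)) for X in views]
        for _ in range(iters):
            # V update: ridge-like solve drawn toward the center rows.
            Vs = [(X @ U + lam * C[O]) @ np.linalg.inv(U.T @ U + lam * np.eye(k))
                  for X, O, U in zip(views, observed, Us)]
            # U update: plain least squares given V.
            Us = [X.T @ V @ np.linalg.inv(V.T @ V + 1e-8 * np.eye(k))
                  for X, V in zip(views, Vs)]
            # Center update: average latent rows over the views observing each sample.
            num, cnt = np.zeros((n, k)), np.zeros(n)
            for O, V in zip(observed, Vs):
                num[O] += V
                cnt[O] += 1
            C = num / np.maximum(cnt, 1)[:, None]
        return C

    # Toy: three views, all incomplete (80% of samples observed per view).
    n, k = 100, 2
    labels = rng.integers(0, 2, size=n)
    proto = 3.0 * rng.normal(size=(2, 8))           # per-cluster prototypes
    views, observed = [], []
    for _ in range(3):
        O = np.sort(rng.choice(n, size=80, replace=False))
        views.append(proto[labels[O]] + 0.5 * rng.normal(size=(80, 8)))
        observed.append(O)
    C = ivsc_sketch(views, observed, n, k)
    print(C[:5])   # run any k-means on C to get the final cluster assignment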
Keywords/Search Tags: Missing Data, Extreme Learning Machine, Feature Selection, Multi-view Clustering