Research on Incomplete Data Classification Based on Multiple Classifiers

Posted on: 2019-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Sun
Full Text: PDF
GTID: 2428330566496875
Subject: Computer technology
Abstract/Summary:
Classification is a fundamental problem in many fields, such as data mining, machine learning, and pattern recognition. Many classification algorithms exist, but few of them address incomplete data. Incomplete data appears widely in many fields, including social science, computer vision, and biological systems. For example, many survey respondents answer only some of the questions in order to protect their privacy, which produces incomplete data with many missing patterns. Missing values have a negative impact on big data analysis.

There are two common ways to handle the problem: neglect and missing value imputation. Neglect means ignoring the samples with missing values and using only the complete samples; it loses information and lowers classification accuracy. Imputation, in turn, is hard to perform accurately on many data sets when extra knowledge is unavailable or the correlation between feature attributes is weak. Given the strong demand and the shortcomings of existing solutions, incomplete data classification is a significant issue.

To address it, we propose a novel idea: instead of imputing or neglecting, we perform classification on incomplete data sets directly. We use ensemble learning, treating the complete views of the incomplete data set as training sets from which the base classifiers are generated. When a tuple to classify arrives, each base classifier offers a result, and we obtain the final classification decision with a reasonable combination method. This thesis focuses on the following research problems:

(1) Because the large number of complete views hurts both the effectiveness and the efficiency of classification, this thesis selects a subset of views to replace the full set of complete views, reducing the number of views used. This improves running time while classification accuracy fluctuates only slightly. The thesis first finds all the complete views, formally defines the view selection problem, and then solves it. Comparison experiments verify the effectiveness of the algorithm.

(2) Because different base classifiers have different impacts on the final classification decision, this thesis proposes two weight distribution methods that reflect the differences between base classifiers. The first method uses two factors that clearly affect classification performance: the number of samples, and the correlation between the feature attributes and the category. The second method takes a learning-based approach and considers all factors jointly, which avoids both neglecting other factors and complicated quantitative work. Experiments show that both methods outperform voting.

(3) The algorithms in the first two parts classify complete tuples, so we also propose algorithms for incomplete tuples. Missing values in the tuples to classify reduce the number of available base classifiers. We introduce the MAT (Missing Attribute Tree) structure to store the training data sets, which makes it fast to determine the complete views available to a tuple. To counter the loss of classification accuracy caused by having fewer base classifiers, we use boosting as the combination method, iteratively training base classifiers to obtain the classification results. The algorithm is compared with existing solutions, and its effectiveness is verified in terms of both time performance and classification performance.
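
The general idea of classifying directly on complete views can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, and it rests on several assumptions: a view is taken to be one observed-attribute pattern together with every training row that is complete on those attributes; the base classifier is a simple nearest-centroid model; and each view's vote is weighted only by its sample count, a simplified stand-in for the weighting methods in (2). View selection, the MAT structure, and boosting are not modelled, and all function names are hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing attribute value


def complete_views(rows):
    """Map each observed-attribute pattern to the rows complete on it."""
    patterns = {tuple(i for i, v in enumerate(x) if v is not MISSING)
                for x, _ in rows}
    views = {}
    for attrs in sorted(patterns):
        if not attrs:
            continue  # a row with every attribute missing defines no view
        views[attrs] = [([x[i] for i in attrs], y) for x, y in rows
                        if all(x[i] is not MISSING for i in attrs)]
    return views


def train_centroids(samples):
    """Nearest-centroid base classifier: one mean vector per class."""
    sums, counts = {}, Counter()
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2
                                 for a, b in zip(centroids[y], x)))


def fit(rows):
    """Train one base classifier per view; weight = view sample count."""
    return [(attrs, train_centroids(samples), len(samples))
            for attrs, samples in complete_views(rows).items()]


def classify(ensemble, x):
    """Weighted vote over the views whose attributes are all present in x."""
    votes = Counter()
    for attrs, model, weight in ensemble:
        if all(x[i] is not MISSING for i in attrs):
            votes[predict_centroid(model, [x[i] for i in attrs])] += weight
    return votes.most_common(1)[0][0] if votes else None


# Toy training set with two attributes, some values missing.
rows = [
    ([1.0, 1.0], "a"), ([1.2, MISSING], "a"), ([MISSING, 0.9], "a"),
    ([5.0, 5.0], "b"), ([4.8, MISSING], "b"), ([MISSING, 5.1], "b"),
]
ensemble = fit(rows)
print(classify(ensemble, [1.1, 0.8]))      # all three views can vote
print(classify(ensemble, [MISSING, 4.9]))  # only the view on attribute 1 votes
```

In this sketch a tuple with missing values simply loses the votes of the views it cannot use, which mirrors the drop in available base classifiers described in (3).
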
Keywords/Search Tags: incomplete data, missing value, classification, multiple classifiers, boosting