Research On Strategy Of Repairing Missing Data Based On Active Learning

Posted on: 2015-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
GTID: 2298330422977942
Subject: Computer software and theory
Abstract/Summary:
Data is the carrier of information, so data integrity contributes greatly to how information is expressed and stored. Missing data occurs randomly during data collection and collation, and the growth of datasets in the information era makes repairing missing data increasingly complex. This thesis presents a new idea: whether missing data needs to be repaired is determined by a posteriori verification of the repair effect, based on active learning theory. We first introduce the existing data repair techniques by category, and then focus on the difficulties encountered in this study, such as the vague correlation among attributes and the scope of the dataset.

Active learning acquires knowledge from a dataset with unlabeled samples or only a small number of labeled samples. It iteratively identifies valuable samples and adds them to the training set in order to generate a superior classifier or learning machine. The active learning method for repairing missing data judges the importance of the missing values and the correlation among attributes before repairing them, because repairing missing data blindly can add dirty and noisy data to the dataset. The method pays more attention to the information expressed by the data than to the application of statistics in the repair process.

Attribute correlation of the dataset is defined using the degree of association, the fuzzy boundary function, and other rough set theory. On this basis, an attribute reduction algorithm called Cut Of Attribute is designed to remove independent and redundant attributes and so optimize the dimensionality of the attribute correlation. The active learning method is iterated to produce a precise multi-parameter regression model, which is used together with the correlation function to estimate the missing data and generate a temporarily complete dataset.

A multi-classification model based on the support vector machine is constructed to divide the temporarily complete dataset into several surrounded distributions, which increases efficiency and flattens the dataset. Multi-parameter regression fitting is then reused within these surrounded distributions to verify and correct the repair and to export the final, complete dataset.

Several UCI datasets are used in simulation tests to verify the effectiveness and efficiency of this repair method by comparing the root mean square error and other factors across different repair techniques. Furthermore, this method has both theoretical and practical value and can be widely applied in the monitoring, prediction, and classification domains.
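
As a rough illustration of two ideas mentioned in the abstract, regression-based estimation of missing values and comparison of repair techniques by root mean square error, the following minimal sketch may help. It is not the thesis's method: it uses plain least-squares multiple regression with no active learning, attribute reduction, or SVM partitioning. All function names and the toy data are assumptions made for this example, and every attribute other than the repaired one is assumed to be complete.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: attributes are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Least squares via the normal equations; returns [intercept, w1, ...]."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def impute_column(rows, target_col):
    """Fill None entries in target_col from a regression on the other columns.

    The model is fitted only on rows where target_col is present.
    """
    complete = [r for r in rows if r[target_col] is not None]
    feats = [[v for j, v in enumerate(r) if j != target_col] for r in complete]
    coef = fit_linear_regression(feats, [r[target_col] for r in complete])
    repaired = []
    for r in rows:
        r = list(r)
        if r[target_col] is None:
            x = [v for j, v in enumerate(r) if j != target_col]
            r[target_col] = coef[0] + sum(w * v for w, v in zip(coef[1:], x))
        repaired.append(r)
    return repaired


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


# Toy data (hypothetical): third attribute equals 2*x1 + 3*x2 + 1.
data = [
    [1, 2, 9], [2, 1, 8], [3, 4, 19], [4, 3, 18], [5, 5, 26],
    [2, 3, None], [6, 1, None],
]
true_values = [14, 16]  # held-out ground truth for the two missing cells

repaired = impute_column(data, target_col=2)
regression_fill = [repaired[5][2], repaired[6][2]]

# Baseline for comparison: fill with the mean of the observed values.
observed = [r[2] for r in data if r[2] is not None]
mean_fill = [sum(observed) / len(observed)] * len(true_values)

regression_rmse = rmse(regression_fill, true_values)
mean_rmse = rmse(mean_fill, true_values)
```

On this toy data the third attribute is an exact linear function of the other two, so the regression fill should have a much lower RMSE than the mean fill; on real data the gap depends on how strongly the attributes are correlated, which is the question the abstract's attribute-correlation step addresses.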
Keywords/Search Tags:Active Learning, Cut of Attribute, Multi-parameter Regression Fitting, Repair Missing Data, MC-Model