Research On Classification Methods Of Incomplete Data

Posted on: 2018-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: L L Shen
GTID: 2348330542983658
Subject: Software engineering
Abstract/Summary:
In the era of big data, data mining over massive data sets has attracted wide attention and is applied in all walks of life. Data are not only growing explosively in volume but also becoming increasingly complex in content and form. Data loss is inevitable owing to subjective or objective factors, and missing data pose a challenge to existing mining algorithms. To improve the ability of the C4.5 algorithm to handle incomplete data, this thesis first addresses the fact that C4.5 ignores missing values during feature selection. It proposes C4.5-RHFAV, an extension of C4.5 that replaces each missing value with the most frequent value of the corresponding attribute, so that the information carried by incomplete records is not lost. Experimental results show that C4.5-RHFAV achieves this goal and attains higher classification accuracy than the traditional C4.5 algorithm. Second, to address both C4.5's disregard of missing values and the low sampling efficiency of C4.5-RHFAV when replacing missing values, the thesis proposes C4.5-RWAV, which replaces missing values with weighted attribute values during feature selection. This preserves the information carried by incomplete records and covers data sets that C4.5-RHFAV cannot process. Experimental results show that C4.5-RWAV achieves the intended goal, with higher classification accuracy than both the traditional C4.5 algorithm and C4.5-RHFAV. A comparison of data cleaning methods with algorithm extension methods shows that, even under the same cleaning rules, the algorithm extension methods yield better classification accuracy. The experimental results indicate that it is reasonable for C4.5 to handle incomplete data during feature selection on the data subset, and that doing so yields higher classification accuracy, although at a higher time complexity than the data cleaning methods.
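The abstract does not give the details of either algorithm, so the following is only a minimal sketch of the underlying idea: replacing a missing value with the most frequent observed value of its attribute, and computing relative-frequency weights that a weighted replacement could draw on. The `None` missing-value marker, the function names, and the toy records are all assumptions made for illustration; this is not the thesis's C4.5-RHFAV or C4.5-RWAV implementation.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing attribute value


def most_frequent_value(rows, attr):
    """Most common observed value of `attr`; MISSING if none is observed."""
    counts = Counter(r[attr] for r in rows if r[attr] is not MISSING)
    if not counts:
        return MISSING
    return counts.most_common(1)[0][0]


def fill_with_most_frequent(rows, attrs):
    """Return copies of `rows` with missing values in `attrs` replaced
    by the most frequent observed value of each attribute."""
    modes = {a: most_frequent_value(rows, a) for a in attrs}
    filled = []
    for r in rows:
        new_row = dict(r)  # copy so the input records stay untouched
        for a in attrs:
            if new_row[a] is MISSING:
                new_row[a] = modes[a]
        filled.append(new_row)
    return filled


def value_weights(rows, attr):
    """Relative frequency of each observed value of `attr`.
    A weighted replacement could use these as weights (assumption)."""
    counts = Counter(r[attr] for r in rows if r[attr] is not MISSING)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}


# Hypothetical toy records with two missing entries.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": MISSING, "play": "yes"},
    {"outlook": MISSING, "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

filled = fill_with_most_frequent(rows, ["outlook", "windy"])
weights = value_weights(rows, "outlook")
```

In this sketch the replacement is computed once over the whole data set. The abstract describes the handling as taking place during feature selection on the data subset, which would correspond to recomputing these statistics on the records reaching each tree node; that refinement is left out here.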
Keywords/Search Tags:Incomplete data, Classification algorithm, Decision tree, C4.5 algorithm