
Research On An Associative Classification Algorithm To Data With Uncertain Attributes

Posted on: 2012-10-24  Degree: Master  Type: Thesis
Country: China  Candidate: Q Zhu  Full Text: PDF
GTID: 2178330335454631  Subject: Systems Engineering
Abstract/Summary:
In many real-life settings, such as finance, telecommunications, and sensor networks, large amounts of uncertain data arise, and it has become clear that ignoring this uncertainty is hasty and unreasonable. In recent years, algorithms for mining uncertain data have become an active research topic. This thesis introduces the sources, causes, and common data models of uncertain data in real-life situations, and summarizes existing data mining algorithms for uncertain data. Current research in this field still largely imitates classical data mining algorithms, and because of the complexity of uncertain data, the performance of these algorithms is unsatisfactory. In view of this, we propose a new sampling-based associative classification algorithm for data with attribute-level uncertainty.

The algorithm includes two phases: frequent itemset mining and associative classification. The first phase is the most time-consuming part and largely determines the efficiency of the algorithm. The second phase uses the frequent itemsets generated by the first phase to build a classifier and predict labels for the test dataset; the classifier determines the classification accuracy.

In the first phase, existing algorithms for mining frequent itemsets from uncertain data all consume a large amount of runtime and memory. To solve this problem, this thesis introduces SARMUT, a new sampling-based algorithm for mining frequent itemsets on uncertain data. The algorithm rests on the idea that frequent itemsets occur in many records of a dataset, so the data contain a certain amount of repetition; a subset of the data can therefore be extracted to represent the whole, saving time and memory compared with mining the complete dataset. To account for the specific nature of uncertain data, we introduce a distance measure of similarity between datasets and use a greedy algorithm to find the best sample set.
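The abstract does not give the details of SARMUT, so the following Python sketch only illustrates the general idea of sampling by greedy distance minimization. It rests on several assumptions that are not taken from the thesis: each record maps an item to an existence probability, items are independent, the expected support of an itemset is the sum over records of the product of its item probabilities, and the distance between a sample and the full dataset is the L1 distance between their per-item mean probabilities. All function names are hypothetical.

```python
from math import prod

# Toy uncertain dataset (hypothetical): item -> existence probability.
db = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5},
    {"b": 0.4, "c": 1.0},
    {"a": 1.0, "b": 1.0},
]

def expected_support(records, itemset):
    """Expected support under the independence assumption:
    sum over records of the product of the item probabilities."""
    return sum(prod(rec.get(item, 0.0) for item in itemset) for rec in records)

def item_profile(records, items):
    """Expected relative support of every single item in `items`."""
    n = len(records)
    return [expected_support(records, (item,)) / n for item in items]

def distance(p, q):
    """L1 distance between two item profiles (an assumed measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def greedy_sample(records, k):
    """Greedily pick k records whose item profile stays closest
    to the profile of the full dataset."""
    items = sorted({item for rec in records for item in rec})
    target = item_profile(records, items)
    chosen, remaining = [], list(range(len(records)))
    while len(chosen) < k and remaining:
        # Add the record that minimizes the distance after insertion.
        best = min(
            remaining,
            key=lambda idx: distance(
                item_profile([records[j] for j in chosen + [idx]], items),
                target,
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return [records[j] for j in chosen]

sample = greedy_sample(db, 2)
```

A frequent itemset miner would then be run on `sample` instead of `db`, with the support threshold scaled to the sample size. This forward selection costs a full profile recomputation per candidate and is chosen here for readability, not efficiency.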
Extensive experiments show that, compared with the non-sampling algorithm, SARMUT achieves very high accuracy while greatly reducing runtime.

In the second phase, we propose uARCSR, a new associative classification algorithm for uncertain data. The specific nature of uncertain data makes class features harder to find because the number of conflicting rules increases. To solve this problem, we introduce two new measures for evaluating classification rules, strength and weighted relative accuracy, which can effectively distinguish between redundant and conflicting rules in uncertain data classification. Because rules and instances do not match exactly on uncertain data, a new matching algorithm is proposed for the rule pruning strategy. In the prediction stage, we use the scores of all rule sets to choose a class label. Experimental results show that the algorithm can effectively classify uncertain data, achieves satisfactory classification accuracy on the five datasets, and effectively reduces the size of the rule set.

Finally, we combine the two phases into a sampling-based associative classification algorithm. Comparing it with the non-sampling classification algorithm, we verify that the sampling-based algorithm achieves good classification accuracy while greatly reducing runtime.
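The abstract names weighted relative accuracy as a rule-evaluation measure but does not define how it is computed on uncertain data. The sketch below uses the standard definition, WRAcc(A -> c) = P(A) * (P(c | A) - P(c)), and, as an assumption of this example, replaces exact counts with expected counts: a record matches a rule antecedent with probability equal to the product of its item probabilities, and class labels are treated as certain. The function names are hypothetical, and the thesis's own definition may differ.

```python
from math import prod

def match_probability(record, antecedent):
    """Probability that a record contains every antecedent item,
    assuming independent item probabilities."""
    return prod(record.get(item, 0.0) for item in antecedent)

def weighted_relative_accuracy(records, labels, antecedent, target):
    """WRAcc(A -> c) = P(A) * (P(c | A) - P(c)) with expected counts."""
    n = len(records)
    # Expected number of records covered by the antecedent.
    cover = sum(match_probability(r, antecedent) for r in records)
    if n == 0 or cover == 0:
        return 0.0
    # Expected number of covered records that carry the target class.
    hit = sum(
        match_probability(r, antecedent)
        for r, y in zip(records, labels)
        if y == target
    )
    prior = sum(1 for y in labels if y == target) / n
    return (cover / n) * (hit / cover - prior)

records = [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}, {"b": 1.0}]
labels = ["x", "x", "y", "y"]
score = weighted_relative_accuracy(records, labels, ("a",), "x")
```

A positive score means the rule predicts the class better than the class prior; a score near zero marks a rule that adds little, which is one way such a measure can help separate useful rules from redundant or conflicting ones.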
Keywords/Search Tags: Uncertain Data, Frequent Itemsets Mining, Associative Classification, Data Mining