Associative Classifier For Uncertain Data

Posted on: 2012-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: X J Tan
GTID: 2218330344951315
Subject: Computer software and theory
Abstract/Summary:
Associative classifiers are relatively easy for people to understand and often outperform decision tree learners on many classification problems. However, existing associative classifiers work only with certain (deterministic) data, whereas data uncertainty is ubiquitous in real-world applications such as sensor networks, location-based services, market analysis and medical diagnosis. Uncertainty arises from many factors, including imprecise measurements, network latency, stale data and decision errors, and it can render many conventional classifiers inapplicable to uncertain classification tasks. To the best of our knowledge, very few works have addressed associative rule mining and associative classification for uncertain data. In this thesis, we devise a new associative classification algorithm that mines associative rules from uncertain data and uses them to classify uncertain data.
The main contents of this research are as follows.
(1) We introduce basic concepts and define new measures for mining associative rules from uncertain data. Specifically, we bring the Possible World Model, proposed by researchers in the uncertain data management community, into associative rule mining on uncertain data. Building on the definition of expected support proposed by Chui et al., we define the expected support and expected confidence of associative rules over uncertain data. To handle data uncertainty, we provide a way to calculate the weight of an uncertain instance covered by an associative rule, and we use multiple rules to classify uncertain data. This definition of covered weight guarantees that, in the uCBA-Rule Generator algorithm, each instance is matched by at least one rule.
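The expected support described in item (1) can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each uncertain instance stores, for every attribute, a probability distribution over discrete values plus a certain class label, and that attributes are independent. Under the Possible World Model the probability that an instance satisfies a rule antecedent is then the product of its items' probabilities, and the expected support is the sum of these probabilities over instances (the form Chui et al. give for uncertain transactions). The abstract does not state the exact expected-confidence formula; the ratio of expected supports used below is a labeled simplification and is not, in general, the exact expectation of confidence over possible worlds. All function names are illustrative.

```python
from math import prod
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: per-attribute value distributions plus a certain label.
Attrs = Dict[str, Dict[str, float]]
Item = Tuple[str, str]  # (attribute, value) pair in a rule antecedent


def match_probability(attrs: Attrs, antecedent: List[Item]) -> float:
    """P(instance satisfies every antecedent item), assuming independent attributes."""
    return prod(attrs.get(a, {}).get(v, 0.0) for a, v in antecedent)


def expected_support(data: List[Tuple[Attrs, str]], antecedent: List[Item],
                     label: Optional[str] = None) -> float:
    """Sum of match probabilities; with a label, only instances of that class count."""
    return sum(match_probability(attrs, antecedent)
               for attrs, y in data if label is None or y == label)


def expected_confidence(data, antecedent, label) -> float:
    """Simplification (assumption): expected support of the rule divided by
    the expected support of its antecedent."""
    denom = expected_support(data, antecedent)
    return expected_support(data, antecedent, label) / denom if denom > 0 else 0.0


data = [
    ({"outlook": {"sunny": 0.8, "rain": 0.2}, "wind": {"weak": 1.0}}, "play"),
    ({"outlook": {"sunny": 0.3, "rain": 0.7}, "wind": {"weak": 0.5, "strong": 0.5}}, "stay"),
]
rule = [("outlook", "sunny"), ("wind", "weak")]
print(expected_support(data, rule, "play"))    # computed as 0.8 * 1.0
print(expected_support(data, rule))            # computed as 0.8 * 1.0 + 0.3 * 0.5
print(expected_confidence(data, rule, "play"))
```

With certain data every distribution puts probability 1.0 on a single value, and the same functions reduce to ordinary support and confidence counts.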
Meanwhile, in the classification phase, the covered-weight definition ensures that the uCBA algorithm can search for multiple matching rules and combine them to predict the class label of an unknown instance, which can improve classification performance. Furthermore, the covered weight can be used to limit the number of associative rules used to predict the class label of a test instance, which reduces the negative influence of insignificant rules on classification performance.
(2) We extend the definition of the pessimistic error rate (PER) in the C4.5 algorithm and use the redefined PER to prune rules with weak predictive power on the training dataset. Experimental results show that this extension of PER effectively prunes such insignificant rules, which greatly reduces the number of associative rules and improves both the efficiency of building the uCBA classifier and its classification performance.
(3) Based on the U-Apriori and CBA algorithms, we propose uCBA (uncertain Classification Based on Association), an associative classifier that can classify both certain and uncertain data. To make full use of the uncertain information, we combine multiple associative rules to predict the class label of a future unknown instance, which yields the uCBA-Multi algorithm. Experimental results on 21 datasets from the UCI Repository show that the proposed algorithm performs well and remains satisfactory even on highly uncertain data. Moreover, the uCBA-Multi algorithm significantly outperforms the uCBA-Single algorithm.
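Multiple-rule prediction of the kind used by uCBA-Multi, where matching rules are combined and the covered weight limits how many rules take part, can be illustrated with a minimal Python sketch. The abstract does not give the exact covered-weight formula, so the scheme below is hypothetical. Rules are ordered CBA-style by confidence and then support (Python's stable sort keeps generation order for ties). Each matching rule covers the fraction `p` of the instance's still-uncovered weight, where `p` is the probability that the instance satisfies the antecedent under an independence assumption. Each rule votes for its class with covered weight times confidence, and voting stops once the remaining weight falls below `min_remaining` or `max_rules` rules have voted. The names `classify_multi`, `min_remaining` and `max_rules` are illustrative.

```python
from collections import defaultdict
from math import prod
from typing import Dict, List, Tuple

Attrs = Dict[str, Dict[str, float]]
Item = Tuple[str, str]
# Hypothetical rule record: (antecedent, class label, expected confidence, expected support)
Rule = Tuple[List[Item], str, float, float]


def match_probability(attrs: Attrs, antecedent: List[Item]) -> float:
    """P(instance satisfies the antecedent), assuming independent attributes."""
    return prod(attrs.get(a, {}).get(v, 0.0) for a, v in antecedent)


def classify_multi(attrs: Attrs, rules: List[Rule], default: str,
                   min_remaining: float = 0.05, max_rules: int = 5) -> str:
    # CBA-style precedence: higher confidence first, then higher support.
    ordered = sorted(rules, key=lambda r: (-r[2], -r[3]))
    remaining = 1.0              # weight of the instance not yet covered by any rule
    scores = defaultdict(float)  # accumulated vote per class label
    used = 0
    for antecedent, label, conf, _support in ordered:
        p = match_probability(attrs, antecedent)
        if p == 0.0:
            continue             # rule cannot match this instance
        covered = remaining * p  # hypothetical covered weight of this rule
        scores[label] += covered * conf
        remaining -= covered
        used += 1
        if remaining < min_remaining or used >= max_rules:
            break
    # Fall back to the default class when no rule matches at all.
    return max(scores, key=scores.get) if scores else default


rules = [
    ([("outlook", "sunny")], "play", 0.9, 0.4),
    ([("outlook", "rain")], "stay", 0.8, 0.3),
]
instance = {"outlook": {"sunny": 0.2, "rain": 0.8}}
print(classify_multi(instance, rules, "play"))               # both rules vote
print(classify_multi(instance, rules, "play", max_rules=1))  # first matching rule only
```

In this toy case the higher-confidence rule alone predicts `play`, but letting both rules vote allows the far more probable `rain` value to dominate and the sketch predicts `stay`, which illustrates how combining rules can change a prediction on uncertain data.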
Overall, the uCBA classifier performs stably, and the uCBA-Multi algorithm in particular is more robust to data uncertainty. The basic concepts and measures presented in this thesis, such as the expected support and expected confidence of associative rules on uncertain data, the weight of an uncertain instance covered by a matching rule, the pruning strategy for associative rules on uncertain data, and multiple-rule classification, may provide insight for related research on associative rule mining and associative classification for uncertain data.
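The PER-based pruning described in item (2) builds on C4.5's pessimistic error estimate, which can be sketched in Python. The sketch uses the common normal-approximation form of that estimate (an upper confidence bound on the error rate, with z of about 0.69 for C4.5's default 25% confidence level). As an assumption about the uncertain-data extension, it lets the covered count `n` and the error count `errors` be fractional expected counts rather than integers. Following CBA's pessimistic-error-based pruning, a rule is pruned when its estimate is higher than that of the more general rule obtained by deleting one antecedent condition. The function names are hypothetical.

```python
from math import sqrt


def pessimistic_error(errors: float, n: float, z: float = 0.69) -> float:
    """Upper confidence bound on the error rate (normal approximation of
    C4.5's pessimistic estimate). errors and n may be fractional expected
    counts, an assumption made here for uncertain data."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    f = errors / n  # observed (expected) error rate
    z2 = z * z
    numerator = f + z2 / (2 * n) + z * sqrt(f / n - f * f / n + z2 / (4 * n * n))
    return numerator / (1 + z2 / n)


def should_prune(rule_errors: float, rule_n: float,
                 general_errors: float, general_n: float, z: float = 0.69) -> bool:
    """Prune the rule if its pessimistic error is higher than that of the
    more general rule with one antecedent condition removed (CBA-style)."""
    return (pessimistic_error(rule_errors, rule_n, z)
            > pessimistic_error(general_errors, general_n, z))


print(round(pessimistic_error(2.0, 10.0), 3))
print(should_prune(0.0, 3.0, 2.0, 20.0))  # specific rule: 3 covered, no errors
print(should_prune(0.0, 2.0, 2.0, 20.0))  # specific rule: only 2 covered, no errors
```

The two pruning calls show the trade-off PER encodes: a specific rule with no training errors survives when it covers enough instances, but is pruned in favour of its more general parent when its coverage is so small that the upper bound on its error becomes larger.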
Keywords/Search Tags: uncertain data, associative rule mining, associative classification, multiple rules classification, pessimistic error rate