Algorithms Of Probabilistic Frequent Itemsets From Uncertain Data

Posted on: 2014-10-02  Degree: Master  Type: Thesis
Country: China  Candidate: S Y Zhang  Full Text: PDF
GTID: 2298330434966150  Subject: Computer software and theory
Abstract/Summary:
Frequent pattern mining is one of the most important research areas in data mining. As data are increasingly collected by hardware and software systems, they are often incomplete or noisy, which produces large amounts of uncertain data, for example in sensor networks, satellite data, and hospital patient diagnosis records. Traditional frequent pattern mining methods cannot be applied directly because of this uncertainty.

This thesis studies frequent pattern mining algorithms for uncertain data and summarizes two different uncertain data models: one based on expected support and the other based on the probability distribution of the support. Expected support cannot show how accurate the estimate is, because it discards the information in the support distribution. We therefore use probabilistic frequent patterns, and we propose mining frequent patterns from uncertain data in a vertical way.

The contributions of this thesis are as follows:

(1) We study and analyze several typical algorithms for mining uncertain data and summarize the general approach they share.

(2) Because the support of an itemset is uncertain, methods based on expected support cannot express how accurate the estimate is, so we use a confidence to express it. The confidence-based model is shown to be more accurate and complete.

(3) We put forward another frequent pattern mining algorithm for uncertain data, based on extension in a vertical way. It extends the tidset and the subset search tree and mines frequent patterns depth first. In addition, it extends the dynamic computation scheme for the frequentness probability to mine frequent patterns under a user-defined confidence and minimum support.

(4) The experiments use three datasets: Chess, Mushroom, and T10I4D100K. They compare the PFIM and UPC-Eclat algorithms under different minimum support and minimum confidence settings. The experiments show that our UPC-Eclat algorithm is more efficient and effective than the PFIM algorithm.
Keywords/Search Tags: uncertain data, data mining, probabilistic frequent pattern, vertical mining, confidence
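
The ideas in the abstract can be illustrated with a small Python sketch. It is not the thesis's UPC-Eclat implementation, whose details the abstract does not give. It assumes that each item occurs in a transaction with an independent existence probability, stores the data vertically as item-to-tidset maps, and computes the frequentness probability P(support >= min_sup) with a standard dynamic program over transactions. The names `Tidset`, `intersect`, `expected_support`, `frequentness_probability`, `mine`, and `EXAMPLE` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple

# Vertical layout (assumed): transaction id -> probability that the
# itemset occurs in that transaction.
Tidset = Dict[int, float]


def intersect(a: Tidset, b: Tidset) -> Tidset:
    """Tidset of the combined itemset, assuming independent items."""
    return {tid: p * b[tid] for tid, p in a.items() if tid in b}


def expected_support(tidset: Tidset) -> float:
    """Expected support: the sum of the occurrence probabilities."""
    return sum(tidset.values())


def frequentness_probability(tidset: Tidset, min_sup: int) -> float:
    """P(support >= min_sup), by dynamic programming over transactions."""
    if min_sup <= 0:
        return 1.0
    # dist[k]: probability that exactly k transactions seen so far contain
    # the itemset, tracked only for k < min_sup.
    dist = [1.0] + [0.0] * (min_sup - 1)
    reached = 0.0  # probability mass that has already reached min_sup
    for p in tidset.values():
        reached += dist[min_sup - 1] * p
        for k in range(min_sup - 1, 0, -1):
            dist[k] = dist[k] * (1.0 - p) + dist[k - 1] * p
        dist[0] *= 1.0 - p
    return reached


def mine(vertical: Dict[str, Tidset], min_sup: int,
         min_conf: float) -> Dict[Tuple[str, ...], float]:
    """Depth-first, Eclat-style search for probabilistic frequent itemsets.

    An itemset is reported when P(support >= min_sup) >= min_conf.
    """
    results: Dict[Tuple[str, ...], float] = {}

    def extend(prefix: Tuple[str, ...], tidset: Tidset,
               candidates: List[Tuple[str, Tidset]]) -> None:
        for i, (item, item_tids) in enumerate(candidates):
            new_tids = intersect(tidset, item_tids) if prefix else item_tids
            if len(new_tids) < min_sup:
                continue  # too few possible transactions to reach min_sup
            prob = frequentness_probability(new_tids, min_sup)
            if prob >= min_conf:
                itemset = prefix + (item,)
                results[itemset] = prob
                # Supersets can only have lower support, so extend only
                # itemsets that passed the threshold.
                extend(itemset, new_tids, candidates[i + 1:])

    extend((), {}, sorted(vertical.items()))
    return results


# Toy uncertain database in vertical form (made-up values).
EXAMPLE: Dict[str, Tidset] = {
    "a": {1: 0.9, 2: 0.8, 3: 0.5},
    "b": {1: 1.0, 2: 0.5},
    "c": {2: 0.5, 3: 1.0},
}

if __name__ == "__main__":
    for itemset, prob in mine(EXAMPLE, min_sup=2, min_conf=0.3).items():
        print(itemset, round(prob, 4), round(expected_support(EXAMPLE[itemset[0]]), 4))
```

The sketch also shows why expected support alone can mislead: the tidset `{1: 0.5, 2: 0.5}` has expected support 1.0, yet the probability that its support is at least 1 is only 0.75. Pruning the depth-first search is safe because adding items to an itemset can never increase its support in any possible world, so the frequentness probability cannot increase either.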