Font Size: a A A

Fast Solution Based On Boundary The Eps Algorithm

Posted on:2005-08-17Degree:MasterType:Thesis
Country:ChinaCandidate:H L ZhaoFull Text:PDF
GTID:2208360125457466Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
EPs (Emerging Patterns) are itemsets whose supports change significantly from one dataset to another. They can capture multi-attribute distinction between two datasets in the database, and can be applied to classification and predication. Recently, a lot of classifiers, which based on EPs, have been proposed, such as CAEP, JEP-Classifier, DeEP, BCEP, CEEP, and so on. Relevant research shows that their accuracies are higher than the traditional classifiers. So, it's important to mine EPs.The efficient mining of EPs is a challenging problem, since (i) the Apriori property no longer holds for EPs, which is to say, the superset and subset of an EP might not be an EP, and (ii) there are usually too many candidates for high dimensional databases or for small support thresholds when mining EPs. Naive algorithms are too costly to solve the problem.Dong and Li apply the description of closed collection to the mining of EPs for the first time. They uses the borders of the closed collection of EPs to express the EPs, and manipulate the borders to mine EPs. The EPs-mining algorithm based on borders improves the efficiency of EPs' mining, which makes it feasible to mine EPs efficiently. However, existing border-based algorithms are still inefficient. Moreover, The results need to be described as the union of several borders, which is not as natural as a single border and is inefficient to enumerate EPs.In this paper, we give an improved algorithm of border's operation named FFBD( Filter First Border Differential, FFBD). To get the left border, we use the filter first and expand iteratively strategy. At the beginning of each iteration, we check the intermediate result of the former iteration. We only expand a part of them to get the candidates, and use the other part as the referential space to filter the non-minimal elements. So, the algorithm's efficiency is improved.Furthermore, we prove that the difference of any two close interval with the same left borders is close and can be presented as a pair of upper border and lower border. Based on this, we ulteriorly propose a new border differential algorithm called EUBBD(Expanded Upper Border Border-Differential algorithm), which can manipulate two regions with the same left border and return only a pair of borders.FFBD and EUBBD are all universal algorithms of differential algorithm of interval-closed collection. Based on them, we can build an efficient algorithm to mine the EPs of any growth-rate and threshold.
Keywords/Search Tags:data mining, max-pattern, emerging patterns, interval-closed collection, border, classification
PDF Full Text Request
Related items