Research And Implementation Of Multi-dimensional Association Rules Based On Prefix Tree

Posted on: 2012-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: L P Su
Full Text: PDF
GTID: 2178330335974324
Subject: Computer software and theory
Abstract/Summary:
Association rule mining is an important research branch of data mining. As a principal means of extracting knowledge from large databases, it helps address the "data rich, knowledge poor" problem. Its main purpose is to find relationships among item sets in large databases and to use those associations to guide decisions and actions. Results in association rule mining are now plentiful and research in the area is increasingly widespread, with the focus gradually expanding from single-dimensional to multi-dimensional association rule mining. Improving the time and space efficiency of mining algorithms has always been the core issue. This thesis studies these issues theoretically and verifies the resulting algorithm experimentally. It mainly covers the following work:

1. The thesis first introduces the concepts, basic framework, related technologies, and tasks of association rule mining. It then focuses on several classical association rule algorithms (Apriori, DHP, and FP-Growth), describing the principle of each and the process by which it produces frequent item sets, and analyzing their advantages and disadvantages. In addition, it presents the concrete steps and related techniques of the data-cube-based algorithm and analyzes its characteristics and applicability.

2. The thesis proposes an algorithm based on prefix storage. It combines ideas from Apriori, DHP, and FP-Growth, and uses data compression together with a prefix-tree data structure. The principle is that item sets sharing the same prefix are compressed into a single node, so that all items in the node share one prefix set. The memory needed for temporary data is thus greatly reduced, and candidate sets are generated directly from a node's prefix and suffixes, which saves the time otherwise spent joining frequent item sets. A head table link is also added to the PR-tree, which improves search speed. The algorithm can generate frequent item sets for single-dimensional association rules on large databases in acceptable time, and can also generate frequent predicate sets for multi-dimensional association rules step by step. When mining frequent 2-itemsets, it uses an idea similar to a hash function to generate candidate 2-itemsets directly from the database.

3. Experiments on millions of records are used to test the algorithm developed in the theoretical study, and analysis of the results confirms its feasibility. The thesis also supplements the algorithm where it falls short, pointing out a direction for later research.
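The two ideas in item 2 (counting 2-itemsets directly from the database in a hash-like table, and generating candidates from a node's shared prefix and its suffixes) can be illustrated with a minimal Python sketch. This is not the thesis's PR-tree implementation, which the abstract does not specify: the function names, the dictionary-based node layout, and the toy transactions are all assumptions made for illustration, and the head table link and multi-dimensional predicates are left out.

```python
from collections import defaultdict
from itertools import combinations


def frequent_items(transactions, min_support):
    """Count single items and keep those that meet min_support."""
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    return {item for item, c in counts.items() if c >= min_support}


def frequent_pairs(transactions, min_support):
    """Count 2-itemsets in one pass over the data, using a dict as the hash table."""
    keep = frequent_items(transactions, min_support)
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(set(t) & keep)  # sorted so each pair has one canonical key
        for pair in combinations(items, 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


def group_by_prefix(itemsets):
    """Compress sorted k-itemsets into nodes: shared prefix -> list of suffix items."""
    nodes = defaultdict(list)
    for itemset in sorted(itemsets):
        nodes[itemset[:-1]].append(itemset[-1])
    return dict(nodes)


def candidates_from_nodes(nodes):
    """Build (k+1)-candidates from each node's prefix plus two of its suffixes."""
    result = []
    for prefix, suffixes in nodes.items():
        for a, b in combinations(suffixes, 2):
            result.append(prefix + (a, b))
    return result


# Hypothetical toy data, not taken from the thesis experiments.
transactions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "c", "d"],
    ["a", "b", "c", "d"],
    ["b", "c"],
]
pairs = frequent_pairs(transactions, min_support=3)
nodes = group_by_prefix(pairs)
candidates = candidates_from_nodes(nodes)
```

In this sketch each prefix is stored once per node rather than once per item set, which is the source of the memory saving described above. The generated candidates are not yet pruned, so their support still has to be counted against the data before they are accepted as frequent.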
Keywords/Search Tags: Multi-dimensional association rule, PR-tree algorithm, Data cube, Frequent predicate set, Frequent item set