Font Size: a A A

The Incremental Mining Algorithm Based On HFUFP-tree

Posted on:2011-05-12Degree:MasterType:Thesis
Country:ChinaCandidate:S H ZhuFull Text:PDF
GTID:2178360305472698Subject:Computer applications
Abstract/Summary:PDF Full Text Request
Data mining has been an important development direction of computer research area in recent years, which has been widely used in many fields, such as banking, telecommunications, insurance, retail, etc. Its huge business potential and the application value attract a large number of research units and enterprises engage in data mining technology research and development. New data mining algorithms and models have been published almost every year, on which the study is more and more extensively and deeply.Association rule mining is one of the most active research directions in data mining areas. Association rule has played an irreplaceable role in many business decisions making, such as classification design, analysis, cross-shopping, and cheap-sale analysis. Generally, current association rule mining algorithm can be divided into two categories static mining algorithm and dynamic mining algorithm. The study on static association rules mining algorithm is relatively deeper. Recent years, the dynamic association rule mining algorithms have been concerned increasingly.With the development of Internet technology and database technology, the quantity of data has been presented the form of rapid incremental growth, which leads to high additional costs and a significant reduction of mining efficiency in the process of association rule mining when it needs to re-scan the huge original database. In comparison with static association rule mining algorithm, dynamic incremental mining algorithm such as FUFP-tree, Pre-FUFP etc, to a certain extent, overcome the defects which belong to static mining algorithm, but in some cases, which still need to re-scan the huge original database. For example, the Pre-FUFP algorithm adjusts the FP-tree structure by dividing all the itemsets into four cases based on the frequency status in original and updated database. In most cases Pre-FUFP algorithm just needs to scan the new transaction of the database, but when a non-frequent item becomes a frequent one, which still needs to re-scan the entire database. Therefore, it's very essential to do research on the algorithm of dynamic incremental data mining in transaction database when the transaction of data is very frequent.Based on the deep analysis of Pre-FUFP mining algorithm, in which the problem exists is re-scanning of the entire database when a non-frequent item becomes a frequent one. Against the problem of re-scanning original database still in some cases, the research of this paper is to improve it. The major work and results achieved are as followed:(1) Briefly introduced some basic knowledge of association rules mining, specifically analysis a few typical mining algorithm of static and dynamic association rules mining algorithms, including the Apriori algorithm and its improved algorithm, FP-tree algorithm and Pre-FUFP algorithm. Described characteristics, steps, applicable background, advantages and disadvantages of each above algorithm, which highlights the Pre-FUFP algorithm that is a currently widely attractive incremental mining algorithm. Proposed the issue of the limitations of the algorithm.(2) Proposed an incremental mining algorithm, HFUFP-tree (Hierarchical Fast Updated FP-tree) algorithm, which was based on hierarchy architecture. The algorithm compressed all of the transactions in a HFUFP-tree architecture, and logically divided the HFUFP-tree architecture into three layers through the introduction of a high support threshold and a lower support threshold and MF (Mark-frequency) node and the MPF (Mark-Pre-frequency) node. When the database transaction occurs, the proposed algorithm can quickly mine associate rules without any re-scanning of the original database and any construction of the tree architecture, but only to dividing the itemsets into nine cases to improve the adjust the HFUFP-tree architecture respectively.(3) Analysis the performance both from accuracy and efficiency firstly. In aspect of accuracy, the two algorithms are completely same. In aspect of efficiency, the HFUFP-tree algorithm has a better performance on contrast of Pre-FUFP algorithm when the transaction of database occurs frequently. Finally, the results of experiments show that the two algorithms mine the completely same frequent items in each different threshold, but when the transaction of database occurs frequently, HFUFP-tree algorithm has significantly improved of the efficiency.
Keywords/Search Tags:data mining, dynamic mining, associate rules, HFUFP -tree
PDF Full Text Request
Related items