Research on Improving the Eclat Algorithm for Association Rules

Posted on: 2011-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: P E Chen
Full Text: PDF
GTID: 2178360308458995
Subject: Computer system architecture
Abstract/Summary:
Data mining is the non-trivial process of extracting valid, novel, and potentially useful knowledge from large volumes of data and ultimately presenting it as understandable patterns. Today's database systems support queries, statistics, and reports, but this processing is relatively uniform: it amounts to simple numerical processing over a given range of data and cannot extract the information hidden within the data. The widespread use of information management systems across industries has caused the amount of data to expand rapidly. People are therefore eager for tools that provide a higher level of data analysis, so that decision-making and scientific work can be better supported. Association rule mining is the application of association analysis in data mining, and it is an important subject with high theoretical value and broad application.

Association rules reflect the interdependence and correlation between one thing and others. If two or more things are correlated, one of them can be predicted from the others. Association rule mining algorithms are used to discover such rules, and they have been studied by many researchers. Most existing association rule algorithms are iterative methods based on the Apriori and FP-growth algorithms. In general, a database can be represented in either a horizontal or a vertical data layout. This paper presents an in-depth analysis of frequent itemset mining, describes the existing classification of association rules and the existing mining algorithms, and analyzes the conventional Apriori and AprioriTid algorithms in detail, pointing out their respective strengths and weaknesses.
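To make the two data layouts concrete, the following is a minimal Python sketch, not code from the thesis. It converts a horizontal layout (one item set per transaction) into a vertical layout (one set of transaction ids, or Tidset, per item) and computes the support and confidence of a rule from the vertical form. The function names `to_vertical`, `support`, and `confidence`, and the toy transactions, are illustrative assumptions.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction ids (its Tidset) that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def support(vertical, itemset):
    """Count the transactions containing every item in `itemset`."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0


def confidence(vertical, antecedent, consequent):
    """Estimate the confidence of the rule antecedent -> consequent."""
    base = support(vertical, antecedent)
    if base == 0:
        return 0.0
    return support(vertical, set(antecedent) | set(consequent)) / base


# Hypothetical horizontal layout: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
vertical = to_vertical(transactions)
```

In the vertical form, the support of an itemset is just the size of the intersection of its items' Tidsets, so no further pass over the transactions is needed once the layout has been built.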
Mining algorithms that use a vertical data representation usually perform better than those that use a horizontal one, and Eclat is a classical association mining algorithm that was the first to use a vertical data representation. This paper first studies and analyzes the Eclat algorithm in depth, then presents an improved algorithm on that basis, referred to in the text as hEclat. hEclat combines a hash table and a Boolean matrix into a Hash-Boolean Matrix to speed up the step in which traditional Eclat intersects two Tidsets, which in turn improves the overall efficiency of Eclat in generating frequent itemsets and mining association rules.

Previous studies of association rule mining have mainly focused on the time efficiency of algorithms while neglecting the multi-dimensional properties of association rules. However, multi-dimensional association rules extracted from a relational database, which are of greater interest to customers, are more instructive and better meet the needs of real business decision-making. In this paper, we propose the MD-Eclat algorithm, based on the conventional Eclat algorithm, and also formulate a novel data preprocessing approach that helps to realize multi-dimensional association mining from plain tables or views in a relational database. Because the algorithm uses a vertical data representation, it needs neither to scan the database many times nor to build tree structures repeatedly; it scans the database only once. As a result, the improved algorithm has a shorter execution time than similar algorithms.
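The Tidset intersection at the core of conventional Eclat can be sketched in Python as follows. This is a minimal illustration of the standard depth-first Eclat search. It is not the thesis's hEclat or MD-Eclat, whose Hash-Boolean Matrix and preprocessing details are not given in this abstract. The names `eclat`, `to_bitmask`, and `popcount`, and the toy data, are assumptions. The bitmask helpers only show how a Boolean (bit-per-transaction) encoding turns an intersection into a single bitwise AND.

```python
def eclat(vertical, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    `vertical` maps each item to its Tidset (a set of transaction ids).
    """
    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs already frequent with `prefix`.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | frozenset([item])
            frequent[itemset] = len(tids)
            # Intersect this Tidset with each later candidate's Tidset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


def to_bitmask(tids):
    """Encode a Tidset as an integer with bit t set for each transaction id t."""
    mask = 0
    for tid in tids:
        mask |= 1 << tid
    return mask


def popcount(mask):
    """Count the set bits, i.e. the support of the encoded Tidset."""
    return bin(mask).count("1")


# Hypothetical vertical layout over four transactions (ids 0..3).
vertical = {"bread": {0, 1, 2}, "milk": {0, 2, 3}, "butter": {1, 2}}
frequent = eclat(vertical, min_support=2)
```

With the Boolean encoding, `popcount(to_bitmask(a) & to_bitmask(b))` gives the same support as `len(a & b)`. The database is read only once, to build `vertical`; every later support count comes from intersections alone.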
Keywords/Search Tags: Association Rules, Vertical Data Layout, Hashing, Boolean Matrix, Multi-Dimensional Association Rules