Study on the Apriori Algorithm for Association Rules in Data Mining

Posted on: 2008-09-09 | Degree: Master | Type: Thesis
Country: China | Candidate: S Q Ma
GTID: 2178360242460286 | Subject: Software engineering
Abstract/Summary:
In today's information age, the pace of life keeps accelerating and society is becoming increasingly information-based and digitized. The advent of the computer has brought with it the ability to generate and store huge amounts of data, and the volume of data available is almost incomprehensible. The problem is how to turn this data into usable information. Data mining is the process of extracting hidden knowledge from large volumes of raw data.

First, the thesis introduces data mining technology and systems. It summarizes the background and development of data mining technology and systems, as well as the state of data mining systems in China and abroad, and analyzes the problems that data mining systems face and their future development trends.

Second, the thesis studies data mining standards, focusing on the research and design of a data mining system architecture based on the PMML standard. It discusses the standards and categories of data mining in terms of process standards, model definition standards, web standards, standard APIs, grid service standards, and so on. Driven by the urgent demand for data mining platforms, PMML was the first standard applied to such systems and has become the most popular model management standard.

Third, the thesis designs and implements DBIN Miner, a data mining platform based on the PMML standard. Guided by a multi-standard architecture for data mining platforms, the platform is divided into three functional parts, including an upper-level GUI and a middle-level storage management module. Following the CRISP-DM process model, the system implements part of the flow from business data analysis to result model deployment. Using the PMML standard, the thesis develops an extensible, multi-strategy data mining platform that conforms to industry standards to a certain degree.

Finally, the thesis studies storage models for data mining. For the platform's storage management module, to address the long average running time of algorithms and frequent database access, and to solve the problem of storing and sharing algorithm models, the thesis proposes cache-based storage for intermediate processing results and a PMML-based model storage pattern, and studies a cache strategy adapted to the k-means algorithm. This strategy can improve the performance of the data mining platform and the efficiency of the algorithm.

Data mining is the main step in the KDD (knowledge discovery in databases) process. It draws on techniques from diverse fields, such as database technology, artificial intelligence, machine learning, statistics, fuzzy logic, pattern recognition, and artificial neural networks. Mining time series is an active area of data mining because of its wide applications and high commercial value.

The purchase of one product when another product is purchased is an example of an association rule. Retail stores frequently use association rules to assist in marketing, advertising, floor placement, and inventory control. Although association rules apply directly to retail businesses, they have been used for other purposes as well, including predicting faults in telecommunication networks. Association rules show the relationships between data items. These uncovered relationships are not inherent in the data, as functional dependencies are, and they do not represent any sort of causality or correlation. Instead, association rules detect common usage of items.

Most association rule algorithms rely on clever ways to reduce the number of itemsets that must be counted. These potentially large itemsets are called candidates, and the set of all counted (potentially large) itemsets is the candidate itemset (C). Another problem that association rule algorithms must solve is which data structure to use during the counting process. Several have been proposed; a hash tree is a common choice.

Association rule mining is one of the main research topics in data mining today. It emphasizes the relationships among data in different fields and finds dependency relationships among fields that satisfy a support threshold and a confidence threshold. Data mining searches a database for rules of this kind, that is, rules stating that the occurrence of some items is accompanied by the occurrence of others.

The main contributions of this thesis are:
1. A study of current mature products and research results internationally.
2. A study of the typical Apriori algorithm for single-level association rules, the ML-T2 algorithm for multiple-level association rules, the AR-SET algorithm, and the incremental updating algorithms FUP and FUP*.
3. Toward a new algorithm, a further study of Apriori for single-level association rules based on compressing the transaction database, which reduces database scanning.
4. The design of the functional modules for association rule mining in Turbo C.
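The support and confidence thresholds described above can be illustrated with a short Python sketch. It assumes the usual definitions: for a rule X => Y, support is the fraction of transactions containing every item of X ∪ Y, and confidence is support(X ∪ Y) / support(X). The basket data is invented for illustration and does not come from the thesis.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Confidence of antecedent => consequent: support(X | Y) / support(X)."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


if __name__ == "__main__":
    # Invented basket data, for illustration only.
    db = [frozenset(t) for t in (
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    )]
    x, y = frozenset({"diapers"}), frozenset({"beer"})
    print(f"support(diapers, beer) = {support(x | y, db):.2f}")         # 0.60
    print(f"confidence(diapers => beer) = {confidence(x, y, db):.2f}")  # 0.75
```

With, say, a minimum support of 0.5 and a minimum confidence of 0.7, the rule diapers => beer would be reported, because 0.60 >= 0.5 and 0.75 >= 0.7.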
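Candidate generation is where Apriori reduces the number of itemsets to be counted. Below is a minimal sketch of the standard join-and-prune step (often called apriori-gen). The function name `apriori_gen` and the use of Python `frozenset`s for itemsets are our own choices, not taken from the thesis.

```python
from itertools import combinations
from typing import FrozenSet, Set

Itemset = FrozenSet[str]


def apriori_gen(frequent_k: Set[Itemset], k: int) -> Set[Itemset]:
    """Generate candidate (k+1)-itemsets from the frequent k-itemsets."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    candidates: Set[Itemset] = set()
    # Join step: combine two k-itemsets that share their first k-1 items.
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:k - 1] != b[:k - 1]:
                break  # list is sorted, so no later itemset shares this prefix
            candidates.add(frozenset(a + b))
    # Prune step: every k-subset of a frequent (k+1)-itemset must be frequent.
    return {
        c for c in candidates
        if all(frozenset(sub) in frequent_k for sub in combinations(sorted(c), k))
    }
```

For example, with frequent 3-itemsets {abc, abd, acd, ace, bcd}, the join produces abcd and acde, and the prune step discards acde because its subset ade is not frequent, leaving only abcd as a candidate.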
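The hash tree mentioned above stores the candidate set C so that each transaction is compared only against the candidates in leaves it can actually reach. The sketch below assumes candidates are stored as sorted tuples, that an interior node at depth d hashes on the d-th item, and that an overflowing leaf splits until all k items have been hashed. The class names and the `fanout` and `leaf_size` parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Candidate = Tuple[str, ...]  # a candidate k-itemset stored as a sorted tuple


class _Node:
    def __init__(self, depth: int) -> None:
        self.depth = depth                                   # items hashed to reach this node
        self.children: Optional[Dict[int, "_Node"]] = None   # None marks a leaf
        self.bucket: List[Candidate] = []                    # candidates held by a leaf


class HashTree:
    """Hash tree over candidate k-itemsets, used to count their support."""

    def __init__(self, k: int, fanout: int = 3, leaf_size: int = 2) -> None:
        self.k, self.fanout, self.leaf_size = k, fanout, leaf_size
        self.root = _Node(0)
        self.counts: Dict[Candidate, int] = {}

    def _slot(self, item: str) -> int:
        # Python's str hash is salted per process but stable within one run.
        return hash(item) % self.fanout

    def _child(self, node: _Node, item: str) -> _Node:
        assert node.children is not None
        slot = self._slot(item)
        if slot not in node.children:
            node.children[slot] = _Node(node.depth + 1)
        return node.children[slot]

    def insert(self, candidate: Candidate) -> None:
        self.counts[candidate] = 0
        node = self.root
        while node.children is not None:  # descend through interior nodes
            node = self._child(node, candidate[node.depth])
        node.bucket.append(candidate)
        # Split an overflowing leaf unless all k items have already been hashed.
        if len(node.bucket) > self.leaf_size and node.depth < self.k:
            old, node.bucket, node.children = node.bucket, [], {}
            for c in old:
                self._child(node, c[node.depth]).bucket.append(c)

    def count(self, transaction: FrozenSet[str]) -> None:
        """Add 1 to every candidate contained in `transaction`."""
        matched: Set[Candidate] = set()  # a leaf may be reached by several paths
        self._visit(self.root, tuple(sorted(transaction)), 0, transaction, matched)
        for c in matched:
            self.counts[c] += 1

    def _visit(self, node: _Node, items: Tuple[str, ...], start: int,
               tset: FrozenSet[str], matched: Set[Candidate]) -> None:
        if node.children is None:
            matched.update(c for c in node.bucket if tset.issuperset(c))
            return
        needed = self.k - node.depth  # items still to be hashed on this path
        for i in range(start, len(items) - needed + 1):
            child = node.children.get(self._slot(items[i]))
            if child is not None:
                self._visit(child, items, i + 1, tset, matched)
```

The `matched` set matters: because several transaction items can hash to the same slot, one leaf may be visited more than once for the same transaction, and without deduplication a candidate would be counted twice.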
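Contribution 3 mentions compressing the transaction database to reduce scanning. The abstract does not give the exact scheme, so the sketch below substitutes the well-known transaction-reduction heuristic: a transaction that contains no frequent k-itemset cannot contain any frequent (k+1)-itemset, so it is dropped before the next pass. It then derives rules that meet a minimum confidence. The function names `apriori_reduced` and `rules` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def _gen(level: Set[Itemset], k: int) -> Set[Itemset]:
    """Join-and-prune candidate generation: frequent k-itemsets -> candidate (k+1)-itemsets."""
    ordered = sorted(tuple(sorted(s)) for s in level)
    joined = {frozenset(a + b) for i, a in enumerate(ordered)
              for b in ordered[i + 1:] if a[:k - 1] == b[:k - 1]}
    return {c for c in joined
            if all(frozenset(s) in level for s in combinations(sorted(c), k))}


def apriori_reduced(transactions: List[Itemset], min_support: float) -> Dict[Itemset, int]:
    """Frequent itemsets with absolute counts; shrinks the scanned data after each pass."""
    min_count = min_support * len(transactions)  # thresholds refer to the full database
    counts: Dict[Itemset, int] = {}
    for t in transactions:  # pass 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(level)
    db, k = list(transactions), 1
    while level:
        # Transaction reduction: a transaction containing no frequent k-itemset
        # cannot contain a frequent (k+1)-itemset, so later passes skip it.
        db = [t for t in db if any(s <= t for s in level)]
        candidates = _gen(set(level), k)
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


def rules(frequent: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """All rules X => Y with X, Y non-empty, X | Y frequent, and confidence >= min_conf."""
    out = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                x = frozenset(ante)
                conf = n / frequent[x]  # every subset of a frequent itemset is frequent
                if conf >= min_conf:
                    out.append((x, itemset - x, conf))
    return out


if __name__ == "__main__":
    db = [frozenset(t) for t in (
        {"bread", "milk"}, {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"}, {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"})]
    for x, y, conf in rules(apriori_reduced(db, 0.6), 0.7):
        print(sorted(x), "=>", sorted(y), f"{conf:.2f}")
```

This heuristic does not lower the number of passes; it shrinks the data each later pass has to scan, which is where the saving comes from on large, sparse databases.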
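For multiple-level association rules, algorithms such as ML-T2 work on an encoded transaction table in which each item code records the item's position in a concept hierarchy. The sketch below shows only that encoding idea for single items: a three-digit code such as "211" generalizes to "2**" at level 1, and after level 1 the table is filtered to items whose level-1 ancestor is frequent, so lower levels scan a smaller table. The code scheme, thresholds, and function names are illustrative assumptions; the full algorithm also mines frequent itemsets at every level.

```python
from collections import Counter
from typing import List, Set

# Hypothetical encoding: each digit is a node index at one hierarchy level,
# e.g. "211" = category 2 -> subcategory 1 -> brand 1.
Table = List[List[str]]


def generalize(code: str, level: int) -> str:
    """Ancestor of an item code at `level`, e.g. generalize("211", 1) == "2**"."""
    return code[:level] + "*" * (len(code) - level)


def frequent_items(table: Table, level: int, min_count: int) -> Set[str]:
    """Generalized items at `level` that occur in at least `min_count` transactions."""
    counts: Counter = Counter()
    for t in table:
        counts.update({generalize(c, level) for c in t})  # each ancestor once per transaction
    return {g for g, n in counts.items() if n >= min_count}


def filter_table(table: Table, level1: Set[str]) -> Table:
    """Keep items whose level-1 ancestor is frequent; drop transactions left empty."""
    filtered = []
    for t in table:
        kept = [c for c in t if generalize(c, 1) in level1]
        if kept:
            filtered.append(kept)
    return filtered


if __name__ == "__main__":
    t1 = [["111", "121", "211"], ["111", "211", "222"], ["112", "122", "221"],
          ["111", "121"], ["111", "122", "211", "221"], ["313", "211"]]
    l1 = frequent_items(t1, 1, min_count=4)  # stricter threshold at the top level
    t2 = filter_table(t1, l1)                # filtered table reused for lower levels
    l2 = frequent_items(t2, 2, min_count=3)  # reduced threshold at level 2
    print(sorted(l1), sorted(l2))            # ['1**', '2**'] ['11*', '12*', '21*', '22*']
```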
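FUP updates previously mined frequent itemsets when new transactions arrive, instead of re-mining the whole database. The sketch below is a simplified version that assumes `old_frequent` holds exactly the frequent itemsets of the old database, with absolute counts, at the same minimum support. Previously frequent itemsets need only a scan of the increment; a new candidate is counted in the old database only if it is frequent within the increment, because an itemset that was infrequent in the old data and is also infrequent in the increment cannot be frequent overall. Later refinements such as FUP* are not shown, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def _count(itemsets: Iterable[Itemset], db: List[Itemset]) -> Dict[Itemset, int]:
    """One scan of `db`, counting how many transactions contain each itemset."""
    counts = {s: 0 for s in itemsets}
    for t in db:
        for s in counts:
            if s <= t:
                counts[s] += 1
    return counts


def _gen(level: Set[Itemset], k: int) -> Set[Itemset]:
    """Apriori join-and-prune: frequent k-itemsets -> candidate (k+1)-itemsets."""
    ordered = sorted(tuple(sorted(s)) for s in level)
    joined = {frozenset(a + b) for i, a in enumerate(ordered)
              for b in ordered[i + 1:] if a[:k - 1] == b[:k - 1]}
    return {c for c in joined
            if all(frozenset(s) in level for s in combinations(sorted(c), k))}


def fup(old_db: List[Itemset], old_frequent: Dict[Itemset, int],
        inc_db: List[Itemset], min_support: float) -> Dict[Itemset, int]:
    """Frequent itemsets of old_db + inc_db, reusing the result mined from old_db."""
    min_total = min_support * (len(old_db) + len(inc_db))
    min_inc = min_support * len(inc_db)
    result: Dict[Itemset, int] = {}
    level: Set[Itemset] = set()
    k = 1
    while k == 1 or level:
        old_k = {s for s in old_frequent if len(s) == k}
        new_level: Dict[Itemset, int] = {}
        # Previously frequent itemsets: only the increment has to be scanned.
        inc_old = _count(old_k, inc_db)
        for s in old_k:
            if old_frequent[s] + inc_old[s] >= min_total:
                new_level[s] = old_frequent[s] + inc_old[s]
        # New candidates were infrequent in old_db, so they can only become
        # frequent overall if they are frequent within the increment.
        if k == 1:
            candidates = {frozenset([i]) for t in inc_db for i in t} - old_k
        else:
            candidates = _gen(level, k - 1) - old_k
        inc_new = _count(candidates, inc_db)
        survivors = [s for s in candidates if inc_new[s] >= min_inc]
        if survivors:
            old_new = _count(survivors, old_db)  # old_db is scanned only for survivors
            for s in survivors:
                if old_new[s] + inc_new[s] >= min_total:
                    new_level[s] = old_new[s] + inc_new[s]
        level = set(new_level)
        result.update(new_level)
        k += 1
    return result
```

For instance, with old transactions {ab, ac, abc, bc}, an increment {ab, bcd, abd}, and a minimum support of 0.5, item d is frequent within the increment and so triggers one scan of the old data, but it still falls short overall; ac and bc drop out as losers, and ab survives.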
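The thesis stores mining models in PMML. As an illustration, the sketch below serializes itemsets and rules into a PMML `AssociationModel` using Python's standard `xml.etree.ElementTree`. The element and attribute names follow our reading of the PMML 4.4 association-rules schema, but the output has not been validated against the official XSD, and the two DataDictionary fields (`transaction`, `item`) and the function name `rules_to_pmml` are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet, List, Tuple

PMML_NS = "http://www.dmg.org/PMML-4_4"
Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]  # antecedent, consequent, support, confidence


def _tag(name: str) -> str:
    return f"{{{PMML_NS}}}{name}"


def rules_to_pmml(itemsets: Dict[Itemset, float], rules: List[Rule], n_transactions: int,
                  min_support: float, min_confidence: float) -> str:
    """Serialize itemsets (with relative support) and rules as a PMML AssociationModel."""
    ET.register_namespace("", PMML_NS)  # write PMML as the default namespace
    root = ET.Element(_tag("PMML"), version="4.4")
    ET.SubElement(root, _tag("Header"), description="Association rules (sketch)")
    dictionary = ET.SubElement(root, _tag("DataDictionary"), numberOfFields="2")
    for field in ("transaction", "item"):
        ET.SubElement(dictionary, _tag("DataField"), name=field,
                      optype="categorical", dataType="string")
    items = sorted({i for s in itemsets for i in s})
    item_id = {v: str(n) for n, v in enumerate(items, 1)}
    ordered = sorted(itemsets, key=lambda s: (len(s), sorted(s)))
    set_id = {s: str(n) for n, s in enumerate(ordered, 1)}
    model = ET.SubElement(
        root, _tag("AssociationModel"), functionName="associationRules",
        numberOfTransactions=str(n_transactions), minimumSupport=str(min_support),
        minimumConfidence=str(min_confidence), numberOfItems=str(len(items)),
        numberOfItemsets=str(len(ordered)), numberOfRules=str(len(rules)))
    schema = ET.SubElement(model, _tag("MiningSchema"))
    ET.SubElement(schema, _tag("MiningField"), name="transaction", usageType="group")
    ET.SubElement(schema, _tag("MiningField"), name="item", usageType="active")
    for v in items:
        ET.SubElement(model, _tag("Item"), id=item_id[v], value=v)
    for s in ordered:
        node = ET.SubElement(model, _tag("Itemset"), id=set_id[s],
                             numberOfItems=str(len(s)), support=str(itemsets[s]))
        for v in sorted(s):
            ET.SubElement(node, _tag("ItemRef"), itemRef=item_id[v])
    for ante, cons, sup, conf in rules:
        # Antecedent and consequent must themselves appear in `itemsets`.
        ET.SubElement(model, _tag("AssociationRule"), support=str(sup),
                      confidence=str(conf), antecedent=set_id[ante],
                      consequent=set_id[cons])
    return ET.tostring(root, encoding="unicode")
```

Storing rules this way lets another PMML-aware tool reload them without rerunning the mining step; rules reference itemsets by id, which is why every antecedent and consequent must also be listed among the itemsets.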
Keywords/Search Tags: Data Mining, Association rules, Multiple-level association rules