
The Research On Association Rules Mining And Algorithm

Posted on: 2008-06-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Lu
Full Text: PDF
GTID: 1118360212997680
Subject: Computer application technology
Abstract:
This dissertation focuses on fuzzy quantitative rule mining and constraint-based rule mining, including item-constrained rule mining, optimization of the association rule solution space, parallelization of association rule mining, and quantitative association rule mining. The relevant statistical methods are studied and discussed in depth. A series of definitions, theorems, and new algorithms are proposed to address both theoretical and practical problems.

Association rule mining is one of the most active research areas in data mining. Association rules were first proposed to discover relationships between different commodities in a transaction database. These rules describe customers' purchasing patterns during the sale of goods and can guide businesses in arranging inventory, stock, and shelf layout systematically. More generally, association rule mining can discover interesting relationships among items or attributes in a database, relationships that are previously unknown or hidden. In other words, they cannot be obtained through logical database operations or statistical methods alone. This indicates that they do not derive from inherent attributes of the data itself, but from the co-occurrence of data items. Discovering association rules can support marketing, decision making, business management, website design, and so on.

Since the association rule mining problem was first posed, many mining algorithms have been proposed. Almost all of them revolve around one core problem: generating frequent itemsets quickly and efficiently. The Apriori algorithm remains the prototype for many newer algorithms, many of which are variants of Apriori.
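As an illustration of the frequent-itemset problem described above, the following is a minimal sketch of level-wise Apriori in Python. It is not the dissertation's implementation: the function names `apriori` and `confidence`, the use of an absolute support count as the threshold, and the in-memory list of transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One pass over the data to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent from frequent-itemset counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

Each level requires a full pass over the transactions, which is the scan cost that many Apriori variants try to reduce. In a five-basket toy data set where bread appears in four baskets and bread and milk appear together in three, `confidence(freq, {"bread"}, {"milk"})` evaluates to 0.75.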
Because data attributes in a database differ, the appropriate mining methods differ as well. Exploring rule types will extend the field to categorical association rules, quantitative association rules, association rules over multiple concept levels, and other types, and proposing corresponding mining algorithms will be an important line of research.

The core objective of this dissertation is to improve association rules and the efficiency of mining them. The research focuses on algorithm design in the association rule mining process. Because data mining tasks call for different algorithms for different user requirements and application domains, no single algorithm suits every mining task, and algorithms for mining different rule types and patterns also differ greatly in design. This dissertation discusses the design of rule mining algorithms from several perspectives. The goal is to design efficient, novel algorithms that satisfy different application requirements.

On the basis of collecting and organizing a large body of related material and clarifying the relevant theories, methods, algorithms, and architectures, the following topics were selected for focused study:

1) Association rule mining theory and system framework. Building on an analysis of existing association rule mining system architectures, the processes and functional components of a mining system are studied systematically.
This research covers the main functional components that an association rule mining system should have, viewed from the data mining process, and how they relate to one another.

2) The requirements that different source data types place on the functional components of an association rule mining system; the requirements that different application goals place on those components; and the mechanisms for implementing the main functional components.

3) Association rule mining algorithm design. The main factors affecting mining efficiency are the I/O cost of database scans, memory requirements, and CPU time. Existing association rule mining algorithms still need innovation or improvement in these respects. We therefore concentrated on association rule mining algorithms, studying and implementing them in order to validate our new mining theory and to accumulate experience in developing data mining systems through a series of algorithm designs and implementations.

Data mining has already established a body of mining theory with its own characteristics, but in terms of application scope, effectiveness, and compatibility, new mining theory still needs to be explored. This dissertation therefore studies the theory and models of association rule mining and establishes, among others, an association rule mining model based on constrained frequent itemset operations.

Because algorithms and concepts in association rule mining must be adapted to each application domain, exploring models and algorithms for different application domains is currently a very active topic.
This dissertation addresses the efficiency and scalability of association rule mining, aiming to reduce computational complexity and increase the running speed of the algorithms. The work builds on earlier research and was carried out with the support of several funds. Research on association rule mining technology started relatively late in China, and this work builds on the research of scholars and experts abroad. The main content is as follows.

Chapter 2 summarizes and classifies constrained association rule mining, including its definitions, theorems, and algorithms. The discussion centers on an item-constrained Eclat algorithm for solving the association rule mining problem. Eclat uses a vertical layout of the database, applies lattice theory to partition the candidate itemset space into smaller sub-lattices, and relies on the relationships between adjacent sub-lattices to prune itemsets. Because the algorithm can process each sub-lattice independently in memory, it reduces I/O cost. Because it also reduces the search for frequent itemsets to a search for maximal frequent itemsets, its efficiency improves.

Chapter 3 uses fuzzy set theory to address a shortcoming of existing association rule mining methods, namely that they do not take into account the quantitative information associated with items. It covers the definition of fuzzy quantitative constrained association rules and the corresponding algorithm. Fuzzy queries are combined with the concept of rule templates, yielding a definition of mining with fuzzy quantitative constrained association rules, together with the formulas and a complete mining method.
Mining with fuzzy quantitative constraints can therefore be regarded as an extension of association rule mining. The greatest merit of fuzzy quantitative association rules is that their semantics are very close to the way humans express themselves, which makes them easy to understand. Corresponding experiments were designed, and the complete experimental data are given.

Chapter 4 discusses optimization of the association rule solution space and gives the definition of unexpected association rules and an algorithm for mining them. Two kinds of unexpected association rules are defined: rules that are unexpected with respect to the rule template, and unexpected rules whose consequent differs from that of the rule template. The latter are the main result ultimately presented to the user, namely rules that could not be foreseen in advance. A method is proposed that uses the chi-square test to remove itemsets lacking correlation, and rules of the second kind are ranked by information gain, on the principle that a rule with greater information gain is of greater interest.

Chapter 5 discusses the parallelization of association rule mining. It reviews parallel algorithms, introduces several common ones, and analyzes and evaluates their performance. On this basis, a new parallel algorithm, NA, is proposed.
NA optimizes the computation of local large itemsets and needs only two synchronizations, so it outperforms other parallel algorithms in both computation and number of synchronizations.

Chapter 6 gives the definitions and algorithmic framework for mining quantitative association rules over multiple concept levels. This kind of mining uses statistical hypothesis testing to determine how interesting a rule is. Because each rule is presented to the user together with a comparison item, it is easier for the user to understand, and no minimum confidence threshold needs to be specified for the mining. The dissertation designs a definition and an algorithm for quantitative association rules that use a revised interpolation value as the criterion for rule interestingness. The algorithm discovers both association rules and negatively correlated rules, while sparing the user the trouble of setting a minimum confidence threshold by hand. In particular, it often finds important rules that other algorithms overlook.

The final chapter summarizes the dissertation and looks ahead to open problems in association rule mining.
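The vertical database layout that the Chapter 2 summary attributes to Eclat can be sketched as follows. This is a plain, unconstrained Eclat written for illustration only; it omits the item constraints, the lattice partitioning, and the maximal-itemset step that the dissertation describes. The function name `eclat` and the absolute support count are assumptions of the example.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support count} using tidset intersection.

    min_support is an absolute count (an assumption of this sketch).
    """
    # Vertical layout: map each item to the ids of transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, items):
        # items: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(items):
            itemset = prefix + (item,)
            result[frozenset(itemset)] = len(tids)
            # Intersect with later items to build the next, smaller class.
            suffix = []
            for other, other_tids in items[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return result
```

Support is obtained from the size of a tidset intersection, so after the vertical layout is built no further scan of the original transactions is needed, and each prefix class can be processed on its own in memory.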
Keywords: data mining, association rules, item constraint, fuzzy constraint, parallel algorithm, multiple concept levels