Knowledge Inference Of The Characteristic Association Based On Random Distribution Theory

Posted on: 2014-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Yang
GTID: 2268330401458880
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Association analysis is a data mining technique that discovers potentially valuable relationships between data items in large volumes of data. Traditional association analysis methods find rules mainly by scanning the transaction database many times, which leads to excessive computation and poor performance once the data volume becomes large. To address this inefficiency, this thesis combines probability theory with the prior information of the transaction database and proposes a new method, the random distribution association analysis algorithm, to improve the performance of traditional association analysis. The main research contents are as follows:

(1) We apply the central limit theorem to construct a normal distribution from the prior information of the transaction database and thereby determine the corresponding random distribution. Each item set is assigned a probability calculated from this random distribution, and an item set database, which differs from the traditional transaction database, is built on the basis of that distribution.

(2) The random distribution association analysis algorithm improves the computational efficiency of association analysis. The new algorithm discovers rules mainly by analyzing and processing the item set database. Because the item set database is much smaller than the transaction database, the new algorithm significantly reduces the number of calculations and the time consumed. Frequent sets and association rules are discovered for different data amounts. By comparing the new algorithm with the traditional algorithm, we obtain the relationship between data amount and the accuracy of the new algorithm.

(3) Considering that data are updated in practical applications, we propose an incremental method for random distribution association analysis and apply it under two conditions: updated transaction data and a varied threshold. Comparison with the traditional algorithm shows that the new incremental method has high accuracy and good efficiency.

We use MATLAB to implement the random distribution association analysis algorithm and the incremental method. Based on the analysis of the algorithm and the comparison on sample data, we conclude that random distribution association analysis is advantageous for incremental association analysis and sequential association analysis problems.
Keywords/Search Tags: association analysis, random distribution, frequent set, association rules, MATLAB
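
The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the two general ideas it contrasts, not the thesis's method. `support_counts` is the traditional approach, counting item sets by scanning every transaction. `prob_frequent` uses the standard normal (central limit theorem) approximation to a binomial support count to estimate how likely an item set is to reach a minimum count without scanning. All function names are hypothetical. The independence assumption in `itemset_prior`, and the use of per-item frequencies as the "prior information", are assumptions made here for illustration.

```python
import math
from itertools import combinations


def support_counts(transactions, size):
    """Traditional approach: scan all transactions and count each item set of the given size."""
    counts = {}
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def itemset_prior(item_probs, itemset):
    """Prior probability that a transaction contains the item set.

    Assumption (not from the thesis): items occur independently,
    so the probability is the product of the per-item probabilities.
    """
    return math.prod(item_probs[item] for item in itemset)


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_frequent(p, n, min_count):
    """Approximate P(support count >= min_count) for n transactions.

    The support count is modeled as Binomial(n, p) and approximated by
    Normal(n*p, n*p*(1-p)), with a continuity correction of 0.5.
    """
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    if std == 0.0:
        # Degenerate case: the count is exactly n*p with certainty.
        return 1.0 if mean >= min_count else 0.0
    return 1.0 - normal_cdf(min_count - 0.5, mean, std)


if __name__ == "__main__":
    transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
    print(support_counts(transactions, 2))
    p_ab = itemset_prior({"a": 0.75, "b": 0.75, "c": 0.5}, ("a", "b"))
    print(prob_frequent(p_ab, n=1000, min_count=500))
```

The scan costs work proportional to the number of transactions, while the approximation depends only on the prior probability and the transaction count. This matches the trade-off between efficiency and accuracy that the abstract describes only in general terms.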