Research And Application On The Technologies In Mining Association Rules

Posted on: 2011-06-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y S He
Full Text: PDF
GTID: 1118330338495705
Subject: Computer application technology
Abstract/Summary:
Association rule mining is an important method in data mining. The frequency of I/O operations strongly affects the efficiency of a mining task, so the main way to improve efficiency is to reduce the number of scans of the dataset D. In addition, the number of candidate itemsets whose support must be calculated should be reduced until it approximates the number of frequent itemsets, because fewer candidate itemsets save computing time and storage space.

With the development of networks and the economy, distributed systems have become increasingly popular. However, the deficiencies of distributed association rule mining, in terms of negotiation and competition between nodes, utilization of information, and efficiency of network communication, have become more obvious and seriously affect its application. An association tree is built according to the characteristics of association rules, and the research strategy and parameter selection for the association tree are also discussed in this dissertation.

The main contents and innovations of this dissertation are as follows:

(1) A transaction-based compression algorithm using matrix multiplication, clustering of itemsets by user interest, an improved key method for itemsets, and an improved Apriori algorithm are proposed. The deficiencies of classical association rule mining algorithms can be addressed by the following schemes: reducing the number of transactions in the database, clustering itemsets according to user interest, improving the key method for itemsets, and an improved Apriori algorithm based on multiplication of the compressed transaction matrix. Implementations of the improved algorithms are given. A comparison of the results shows that the improved algorithms effectively reduce the number of database scans and the frequency of I/O operations.
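As an illustration of the matrix-based idea in item (1), the following is a minimal sketch and not the dissertation's actual algorithm. It assumes the database is scanned once into a 0/1 transaction matrix, that the support count of an itemset is the sum over rows of the product of its column entries, and that the matrix is compressed between levels by dropping columns of items that can no longer be frequent and rows too short to contain a larger itemset. All function names are hypothetical.

```python
from itertools import combinations


def build_matrix(transactions):
    """Scan the database once and return (items, rows) as a 0/1 matrix."""
    items = sorted({item for t in transactions for item in t})
    rows = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, rows


def support_count(rows, cols):
    """Sum over rows of the product of the entries in the given columns."""
    total = 0
    for row in rows:
        prod = 1
        for j in cols:
            prod *= row[j]
        total += prod
    return total


def compress(items, rows, alive, min_ones):
    """Keep only columns of items in `alive` and rows with >= min_ones ones."""
    keep = [j for j, item in enumerate(items) if item in alive]
    new_items = [items[j] for j in keep]
    new_rows = [[row[j] for j in keep] for row in rows]
    return new_items, [r for r in new_rows if sum(r) >= min_ones]


def frequent_itemsets(transactions, min_count):
    """Level-wise mining on the in-memory matrix (assumed sketch)."""
    items, rows = build_matrix(transactions)
    frequent = {}
    k = 1
    candidates = [(item,) for item in items]
    while candidates:
        index = {item: j for j, item in enumerate(items)}
        level = {}
        for cand in candidates:
            count = support_count(rows, [index[i] for i in cand])
            if count >= min_count:
                level[cand] = count
        if not level:
            break
        frequent.update(level)
        # Items outside every frequent k-itemset cannot appear in a
        # frequent (k+1)-itemset, and a row with fewer than k+1 ones
        # cannot contain a (k+1)-itemset, so both can be dropped.
        alive = sorted({i for cand in level for i in cand})
        items, rows = compress(items, rows, set(alive), k + 1)
        # Keep only (k+1)-candidates whose k-subsets are all frequent.
        candidates = [c for c in combinations(alive, k + 1)
                      if all(s in level for s in combinations(c, k))]
        k += 1
    return frequent
```

Because every level works on the in-memory matrix, the database itself is read only once, and the pruning step keeps the candidate set close to the set of itemsets that turn out to be frequent. Enumerating candidates with `combinations` is a simplification for brevity; a real implementation would use the Apriori join step.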
Moreover, with these algorithms the number of candidate itemsets whose support must be calculated approximates the number of frequent itemsets. As a result, the time and space needed to handle candidate itemsets are reduced effectively, and the efficiency of association rule mining improves accordingly. The main problem of association rule mining is thereby solved.

(2) A series of new distributed association rule mining algorithms is proposed. The main problem with algorithms such as CD (Count Distribution) and FDM (Fast Distributed Mining of association rules) is that they do not perform well when there are too many partitions. Treated as a decision problem, a distributed association rule mining algorithm can coordinate the transmission of support values and support counts among the different partitions of the database. A global support threshold function H and a local support threshold function P are defined. Optimizing both functions is an important open issue: when the data is asymmetric, data recovery can be carried out effectively, and some communication barriers can also be overcome through this optimization. The Distributed Dual Decision mining algorithm reduces communication by collecting only part of the data from the large collection set. As a result, the communication complexity of DARM, which is linear in n and |C|, can be reduced to some degree by the proposed algorithms. The new algorithms remain effective even for asymmetric data or unbalanced data partitions. We give experimental results on the behavior of these algorithms and describe how to implement them in different environments.

(3) A text-based association rule mining decision table induction algorithm that can construct a mixed classification model is proposed. Classification is an important problem in artificial intelligence research. The purpose of any classification algorithm is to build a classification model from given training data.
With this classification model, new samples can be classified and the available data can be better understood. Because of its trade-off among accuracy, complexity, and training cost, association rule tree classification has become an attractive and powerful tool. For isolated points and the main candidate factors, the association rule tree uses a common approach that effectively reduces noise in the input data, whereas TDIDT (top-down induction of decision trees) algorithms have difficulty dealing with isolated points and the main candidate factors. The inference engine of a traditional decision table finds only one rule at a time, whereas the association rule tree can find several rules at once. Moreover, because the association rule tree has no I/O bottleneck, induction is faster than with the decision table and the rule inference engine. The association rule tree constructs the classification model in an effective and measurable way, so the resulting classifier is smaller than a decision tree built with the standard TDIDT method.

(4) A method that automatically selects the optimization set according to heuristic rules is proposed. Obtaining the possible rules in the training data set is the first step of the association rule tree mining algorithm. If the pre-specified confidence threshold is not accurate, the desired objectives are difficult to achieve. If the support threshold for the actual data set is set too high, it is difficult to generate a classification model, and association rules and the decision table will not be produced either. In addition, a high confidence threshold is not equivalent to high classification accuracy, and it may significantly increase the training time of the algorithm. Therefore, a method for selecting the minimum confidence threshold is introduced.
Based on this classification model, the parameters can be adapted automatically, and the speed and accuracy of the association rule tree classifier can also be improved.

(5) Monitoring points are installed on an unstable body in a typical landslide area, and landslide deformation data are captured through regular monitoring. The raw data are cleaned and converted for the purpose at hand, and association rule methods are then applied to them. The useful information obtained serves the purpose of landslide monitoring. The final results are concisely expressed and consistent with human decision-making. They are also reliable and conform to the natural conditions of the landslide area.
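The cleaning and conversion step in item (5) can be pictured with a small sketch. The abstract does not say how the monitoring data are converted, so everything here is an assumption for illustration: numeric readings are binned into three labels using two cut points per field, missing readings are skipped, and a rule is scored with the standard support and confidence measures. The function names and the `cuts` parameter are hypothetical.

```python
def discretize(value, low, high):
    """Map a numeric reading to a coarse label using two cut points."""
    if value < low:
        return "low"
    if value < high:
        return "medium"
    return "high"


def to_transactions(records, cuts):
    """Convert numeric records into sets of 'field=label' items.

    `records` is a list of dicts (field name -> reading or None);
    `cuts` maps each field name to its (low, high) cut points.
    """
    transactions = []
    for record in records:
        items = set()
        for field, value in record.items():
            if value is None:  # cleaning: skip missing readings
                continue
            low, high = cuts[field]
            items.add(f"{field}={discretize(value, low, high)}")
        transactions.append(items)
    return transactions


def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions) if transactions else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence
```

Once readings are expressed as items, any frequent-itemset miner can be run over the resulting transactions. The cut points would have to come from domain knowledge about the monitored site.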
Keywords/Search Tags: data mining, association rules, compressed matrix, communication efficiency, association rules tree, landslide monitoring