
Research And Application For Association Rules Mining Based On Distributed Computing

Posted on: 2021-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Guo
Full Text: PDF
GTID: 2428330614459261
Subject: Software engineering
Abstract/Summary:
Data mining is the process of using algorithms to extract information that supports decision making from large volumes of data. Association rule mining is an effective data mining method that can readily extract useful knowledge from data. However, traditional association rule mining takes a great deal of time when processing millions of records or more, and it readily produces redundant association rules. In addition, when dealing with incremental data, traditional approaches repeatedly re-mine historical data and produce rules with low accuracy. To address these problems, this thesis studies how to speed up mining on big data and how to eliminate redundant association rules using distributed computing, and it also investigates an incremental association rule mining model. Finally, the model is applied to online retail to provide useful information for decision makers. The main work of this thesis is as follows:

1. An improved association rule mining model based on distributed computing is constructed. Existing methods mine big data slowly and produce redundant rules. To address this, the data is divided into multiple partitions using distributed computing. Next, a frequent itemset mining approach that handles itemsets of different lengths is used to mine local frequent itemsets from each partition. Then, using depth-first search and starting from the maximal frequent itemsets, the more valuable itemsets are selected from the local frequent itemsets, and redundant itemsets are discarded according to their relative quality. Finally, only the remaining itemsets are mined, and association rules are generated from them rather than by a confidence threshold. This avoids wasting computing resources on redundant frequent itemsets, and because the confidence parameter is not used, the time lost to improper parameter tuning is reduced. The algorithm is evaluated on real and synthetic datasets, and the experiments verify that it both reduces the time needed to mine frequent itemsets and discards redundant association rules.

2. An association rule mining model for incremental data is constructed. Existing methods cannot mine incremental data efficiently, and the rules they mine are not accurate. This thesis therefore proposes an optimized incremental association rule mining model based on distributed computing, which aims to preserve mining accuracy while improving mining speed. When new data arrives, the model does not re-mine historical data; it uses only the intermediate data stored from previous mining results. A bitmap retrieval method is also adopted to further increase mining speed. The algorithm is compared with other algorithms on multiple datasets, and the results show that it runs faster while the accuracy of the association rules is preserved.

3. A solution for online retail based on association rule mining is presented. To address mining speed and redundant information in online retail, this thesis applies the MR-IARM algorithm, which is based on distributed computing, to online retail data. The product data is analyzed effectively to provide accurate information for decision makers.
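The partition step in item 1 resembles the classic two-pass partition (SON) approach. The sketch below is an assumption about how such a pipeline could look, not the thesis's algorithm: each partition is mined locally with Apriori at a support threshold scaled to the partition size, the local results are unioned into a candidate set, and a second pass counts the candidates over all data. The names `apriori` and `son_mine` are hypothetical, Apriori stands in for the thesis's unnamed frequent itemset miner, and the partitions are processed in a plain loop where a real deployment would run them as parallel tasks (for example MapReduce or Spark jobs).

```python
from itertools import combinations
from math import ceil


def apriori(transactions, min_count):
    """Hypothetical local miner: return all itemsets with count >= min_count."""
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= min_count}
    result = set(frequent)
    k = 2
    while frequent:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c for c, n in counts.items() if n >= min_count}
        result |= frequent
        k += 1
    return result


def son_mine(transactions, min_support, num_parts):
    """Two-pass partition mining: local candidates per partition, then a global count."""
    transactions = [frozenset(t) for t in transactions]
    size = ceil(len(transactions) / num_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Pass 1 ("map"): mine each partition with the threshold scaled to its size.
    candidates = set()
    for part in parts:
        local_min = ceil(min_support * len(part))
        candidates |= apriori(part, local_min)
    # Pass 2 ("reduce"): count every candidate over the full data set.
    total_min = ceil(min_support * len(transactions))
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {c: n for c, n in counts.items() if n >= total_min}
```

The two passes are safe because an itemset that falls below the scaled threshold in every partition cannot reach the global threshold, so no frequent itemset is missed; the second pass then removes the local false positives.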
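Item 1 begins its redundancy filtering from the maximal frequent itemsets, that is, frequent itemsets with no frequent proper superset. The sketch below only illustrates that definition; the thesis's depth-first traversal and its "relative quality" measure are not specified in the abstract and are not reproduced. `maximal_itemsets` is a hypothetical helper.

```python
def maximal_itemsets(frequent):
    """Hypothetical helper: keep only itemsets that have no frequent proper superset."""
    frequent = [frozenset(s) for s in frequent]
    # Visit larger itemsets first, so every maximal superset is kept before its subsets.
    ordered = sorted(frequent, key=len, reverse=True)
    maximal = []
    for s in ordered:
        # s is maximal only if no already-kept (larger) itemset strictly contains it.
        if not any(s < m for m in maximal):
            maximal.append(s)
    return maximal
```

Itemsets of equal length can never be proper supersets of each other, so processing by descending length is enough to make the single pass correct.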
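Item 2 mentions reusing stored intermediate data and a bitmap retrieval method without giving details. One plausible reading, sketched here under that assumption, persists one bitmap per item over transaction ids as the intermediate state: a new batch only sets bits for the new transactions, and the support count of an itemset is the number of set bits in the AND of its item bitmaps. `BitmapIndex` and its methods are hypothetical names; Python integers serve as arbitrary-length bit sets.

```python
class BitmapIndex:
    """Hypothetical index: bit i of an item's bitmap is set if transaction i contains it."""

    def __init__(self):
        self.bitmaps = {}  # item -> int used as a bit set
        self.n = 0         # number of transactions indexed so far

    def add_batch(self, transactions):
        """Append new transactions without rescanning the ones already indexed."""
        for t in transactions:
            for item in set(t):
                self.bitmaps[item] = self.bitmaps.get(item, 0) | (1 << self.n)
            self.n += 1

    def support_count(self, itemset):
        """Count transactions containing every item: AND the bitmaps, then count set bits."""
        items = list(itemset)
        if not items:
            return self.n  # the empty itemset is contained in every transaction
        acc = self.bitmaps.get(items[0], 0)
        for item in items[1:]:
            acc &= self.bitmaps.get(item, 0)
        return bin(acc).count("1")
```

Because existing bitmaps are only extended and never rebuilt, historical transactions are not rescanned when new data arrives; a distributed variant could shard the bitmaps by item or by transaction range.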
Keywords/Search Tags: data mining, association rule mining, distributed computing, incremental mining