Research On Association Rule Mining Algorithm Under Big Data Background

Posted on: 2019-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liang
Full Text: PDF
GTID: 2428330578956632
Subject: Computer technology
Abstract/Summary:
In recent years, the continuous development of computer processing and data analysis technology has brought revolutionary changes to many industries. Society has entered a new digital age, and the concept of "big data" has emerged in response. In many areas, including society, the economy, culture, science and technology, more and more digital information is recorded and used for analysis, and the volume of data is growing explosively. Great value for scientific research, commerce and other fields lies hidden behind this rapid growth, so extracting the value of the data itself has become increasingly important. The essence of data mining is to extract useful value from the relationships within the data and to obtain the desired results through different technical methods. The data produced across industries in the big data era is immeasurable in volume and immense in potential value, and this practical need makes data mining all the more necessary. Association rule mining is an important technique in data mining. It aims to discover the relationships between the items in transactions and has been widely used in many fields, especially in business, where it reveals customers' purchasing interests to operators so that they can offer suitable, effective products and make a profit. Taking big data as its background, this thesis therefore carries out theoretical research on the following aspects, based on the characteristics of current data and the problems that exist.

(1) Diversity of data. Using a single minimum support threshold during mining leads to low mining efficiency and produces redundant rules. This thesis proposes a big data association rule algorithm based on multiple minimum supports. When mining frequent itemsets, a separate support threshold is set for each item, and the minimum frequent-item support threshold is used as the screening criterion to delete redundant nodes. Mining further down stops automatically and redundant candidate itemsets are deleted, so all frequent itemsets can be obtained quickly and directly, because the database does not need to be scanned repeatedly during the mining process. The experimental results show that setting a separate support threshold for each item improves mining efficiency and saves computation time.

(2) Massive volume of data. This thesis proposes a parallel association rule algorithm based on Spark, built on three improvement strategies. First, an improved FP-tree splits single and double paths; distribution and mining proceed at the same time, and Cartesian product operations are used to obtain frequent itemsets, which reduces the number of iterations. Second, during grouping, a balanced grouping idea based on a greedy strategy distributes the items of the frequent item set evenly across groups, which effectively solves the load imbalance problem in the grouping process. Finally, following the idea of parallel mining, the data set is partitioned horizontally and conditional pattern trees are constructed to mine frequent itemsets. The experimental results show that the improved algorithm has higher mining efficiency and better scalability, and is suitable for data mining and analysis in a big data setting.

(3) Efficiency and quality of association rule mining. This thesis presents an association rule analysis method based on community structure. The method departs from traditional association mining algorithms and combines association rules with complex networks. By constructing a complex network from the topological structure between association rules, the association rule mining problem is transformed into a community discovery problem on complex networks. First, the association rule structure is transformed into a complex network, yielding a new association complex network. Then the Hausdorff distance based on the probability density function is introduced into the combinatorial optimization algorithm to obtain an improved community discovery algorithm. Finally, the effectiveness of the method is verified by experimental analysis.
Keywords/Search Tags: Big data, Multiple minimum support, Parallel algorithm, Association network
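
The per-item threshold idea in aspect (1) of the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it enumerates candidates by brute force rather than pruning a tree, and it assumes that an itemset's threshold is the smallest threshold among its items, which is only one reading of the screening criterion the abstract describes. The function name `frequent_itemsets`, the `mis` dictionary, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, mis, max_size=3):
    """Return {itemset: support} for itemsets that meet their own threshold.

    transactions: list of sets of items.
    mis: dict mapping each item to its own minimum support (0..1).
    Assumption: an itemset's threshold is the smallest per-item
    threshold among its members.
    """
    n = len(transactions)
    items = sorted(mis)  # only items with a declared threshold are considered
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            members = set(candidate)
            # Fraction of transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if members <= t) / n
            if support >= min(mis[i] for i in candidate):
                result[candidate] = support
    return result


# Hypothetical data: "caviar" is rare, so it gets a lower threshold.
transactions = [
    {"bread", "milk"},
    {"bread", "caviar"},
    {"bread", "milk"},
    {"milk"},
]
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.25}

for itemset, support in frequent_itemsets(transactions, mis).items():
    print(itemset, support)
```

With a single threshold of 0.5 for every item, the rare item `caviar` and the pair (`bread`, `caviar`) would both be discarded, since each appears in only one of the four transactions. The per-item thresholds keep them without lowering the bar for the common items.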