
Research On Association Mining Optimization Based On Spark Distributed And Application Of Comprehensive Decision

Posted on: 2020-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Huang
GTID: 2428330590463873
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the information society, big data continues to grow steadily. The sheer scale and variety of this data mean that it contains potentially valuable knowledge, yet the field still suffers from the problem of being “rich in data but poor in knowledge”. Although scholars have in recent years proposed research on knowledge discovery, these approaches still cannot meet increasingly complex requirements. How to transform huge data resources into valuable information, how to improve the efficiency of knowledge discovery, and how to broaden the application of big data analysis technology have therefore become pressing issues. To address these problems, this thesis uses the Spark computing engine to optimize an association rule mining algorithm, integrates the improved strategy into a distributed computing architecture, and applies the result to a practical problem. The main research content of the thesis has the following four parts:

First, the theory of association rules is studied in depth, and the optimization approach is determined from the deficiencies of traditional algorithms. Prime theory is introduced: the transaction set is digitized by Prime mapping to improve the compression ratio. The traditional HeadTable mode is abandoned, which avoids the time cost of repeated sorting and of recursively constructing frequent pattern bases. A new rule tree, the PNFP-Tree, is constructed, and frequent items are mined in GCD (greatest common divisor) mode. To control the size of the tree, vector pruning and matrix compression are proposed to improve overall mining efficiency.

Second, a distributed weighted equalization grouping optimization strategy is proposed. Unlike the parallel computing model, it divides the task using a calculation model based on the node weight estimation probability, the length of the candidate sequence set, and the size of the residual compressed tree. The resulting subtrees are independent, and each performs GCD pattern mining as a subtask. This effectively solves the problem of unbalanced computation across nodes and reduces the Shuffle overhead between nodes without affecting the final mining result set.

Third, because Spark is better suited to iterative computing than Hadoop's MapReduce, a distributed PNFPM algorithm based on Spark is proposed. To handle the continued arrival of massive data sets under real conditions, a DDS (Dynamic Streaming Data) mode is used to perform periodic block mining. The experimental results show that the PNFPM algorithm outperforms the traditional algorithm.

Finally, to broaden the application of big data analysis and to verify the practicability and compatibility of the PNFPM algorithm, the Spark-based PNFPM algorithm is designed, implemented, and applied to the gridded event analysis and decision function module of comprehensive management work. The PNFPM algorithm is integrated with a multi-criteria decision-making method to obtain early-warning results for high-level decision making. The experimental results show that the Spark-based PNFPM algorithm is feasible, efficient, and scalable, and that it can be applied not only to business mining but also in the government field.
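
The Prime mapping and GCD ideas mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's PNFP-Tree or PNFPM implementation, whose details the abstract does not give; it only shows the general prime-encoding technique under simple assumptions. Each distinct item is assigned a prime, a transaction becomes the product of its items' primes, itemset containment becomes a divisibility test, and the GCD of two encoded transactions decodes to the items they share. All function names and the sample transactions are hypothetical.

```python
from functools import reduce
from math import gcd


def first_primes(n):
    """Return the first n primes by trial division (adequate for small item sets)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def build_prime_map(transactions):
    """Assign one prime to each distinct item (assumption: items in sorted order)."""
    items = sorted({item for t in transactions for item in t})
    return dict(zip(items, first_primes(len(items))))


def encode(itemset, prime_map):
    """Encode an itemset as the product of its items' primes."""
    return reduce(lambda acc, item: acc * prime_map[item], set(itemset), 1)


def decode(code, prime_map):
    """Recover the items whose primes divide the code."""
    return {item for item, p in prime_map.items() if code % p == 0}


def support(itemset, encoded, prime_map):
    """Count transactions containing the itemset via a divisibility test."""
    target = encode(itemset, prime_map)
    return sum(1 for code in encoded if code % target == 0)


# Hypothetical sample data, not taken from the thesis.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
]
prime_map = build_prime_map(transactions)
encoded = [encode(t, prime_map) for t in transactions]

# The GCD of two encoded transactions decodes to their common items.
common = decode(gcd(encoded[1], encoded[2]), prime_map)
pair_support = support({"diaper", "beer"}, encoded, prime_map)
```

Python integers have arbitrary precision, so the products do not overflow here. In a fixed-width setting the product grows quickly with transaction length, which is one reason a real system would need additional compression or pruning such as the vector pruning and matrix compression the abstract mentions.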
Keywords/Search Tags: Big Data, Apache Spark, Association Rule Mining, Optimization Strategy, Gridding Events