Improvement And Application Of Parallel Apriori Algorithm

Posted on: 2020-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: W T Wang
Full Text: PDF
GTID: 2428330578955879
Subject: Computer technology
Abstract/Summary:
In recent years, as people, machines, and things have become highly integrated, the scale of data has exploded and data patterns have grown increasingly complex. The world has fully entered the age of Internet data. This vast amount of data contains much valuable and meaningful information, and data mining technology was developed to extract it. Association rules are an important research direction in data mining. Representative association rule algorithms include Apriori, PrePost, and FP-Growth. Among them, Apriori is one of the most classic and fundamental. Proposed by R. Agrawal and R. Srikant in 1994, Apriori is a classical algorithm for mining frequent itemsets for Boolean association rules; it finds frequent itemsets by level-wise iteration. As data volumes keep growing, however, the flaws of this classic algorithm are becoming more and more obvious.

To address the fact that Apriori must traverse the database multiple times to generate frequent itemsets, this thesis proposes WF_Apriori (Weight Function Apriori), an improved algorithm based on a Boolean matrix and weights. The algorithm adds a weight column to the matrix so that duplicate transactions are merged and the stored matrix is compressed, which saves time when scanning the transaction set. It makes full use of intersections between rows and avoids the self-join of (k-1)-itemsets that Apriori uses to generate k-itemsets, so that the required frequent k-itemsets can be obtained at one time. The improved algorithm is then parallelized with MapReduce: the large matrix is divided into small matrices that are processed in parallel, which reduces the running time and makes the algorithm more efficient and more practical. The experimental results show that the improved algorithm greatly shortens processing time in a big data environment, improves mining efficiency, and achieves the expected goal.

After demonstrating the effectiveness of the parallelized WF_Apriori algorithm, this thesis applies it to landslide prevention. Landslides occur frequently along the China-Pakistan Economic Corridor, an important part of the Belt and Road Initiative, and threaten road construction and material transportation. This thesis therefore selects the Gaizi Valley area of the China-Pakistan Economic Corridor as the experimental area and systematically studies the rules among the hazard factors there. Based on previous research on disaster points, the Gaizi Valley area is divided into two parts: a study area and a verification area. Remote sensing images of the study area were corrected with ArcGIS, ENVI, and other software, and six hazard factors (elevation, slope, aspect, profile curvature, soil type, and lithology) were extracted from the corrected study area and analyzed. The algorithm then mines association rules to identify the relationship between the hazard factors and landslides. Finally, based on the confusion matrix, the Kappa coefficient is used to verify the mined rules in the verification area. The verification demonstrates their applicability, and the rules can support regional landslide prevention.
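
The weighted Boolean matrix idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's WF_Apriori implementation: the function names (`build_weighted_matrix`, `support`, `frequent_itemsets`) and the sample transactions are hypothetical, candidates are simply enumerated over the columns of frequent single items, and the MapReduce partitioning step is omitted. It shows only how merging duplicate transactions into weighted rows lets support be counted without rescanning the raw transaction set.

```python
from collections import Counter
from itertools import combinations


def build_weighted_matrix(transactions, items):
    """Merge duplicate transactions into unique Boolean rows plus a weight.

    Each row is a tuple of booleans (one per item); its weight is the number
    of original transactions that produced exactly that row.
    """
    counts = Counter(
        tuple(item in t for item in items) for t in map(set, transactions)
    )
    rows = list(counts.keys())
    weights = [counts[row] for row in rows]
    return rows, weights


def support(rows, weights, cols):
    """Weighted support: sum the weights of rows that are True in every column."""
    return sum(w for row, w in zip(rows, weights) if all(row[c] for c in cols))


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support.

    Assumption: candidates are enumerated directly over the columns of
    frequent single items; this is a simplification for illustration only.
    """
    items = sorted({i for t in transactions for i in t})
    rows, weights = build_weighted_matrix(transactions, items)
    # Drop columns whose single item is already infrequent.
    kept = [c for c in range(len(items))
            if support(rows, weights, (c,)) >= min_support]
    result = {}
    for k in range(1, len(kept) + 1):
        level = {}
        for cols in combinations(kept, k):
            s = support(rows, weights, cols)
            if s >= min_support:
                level[tuple(items[c] for c in cols)] = s
        if not level:  # no frequent k-itemsets, so no larger ones exist
            break
        result.update(level)
    return result


# Hypothetical sample data: five transactions, two of them identical.
sample = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
print(frequent_itemsets(sample, min_support=3))
```

In this sample the two identical transactions collapse into one row with weight 2, so the matrix has four rows instead of five while the support counts stay the same.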
Keywords/Search Tags: Apriori algorithm, Boolean matrix, Parallelization, Geological hazards, Risk factors, Kappa coefficient