Font Size: a A A

Research On Pattern Mining Based On Ant Colony Algorithm

Posted on:2021-01-22Degree:MasterType:Thesis
Country:ChinaCandidate:J K NanFull Text:PDF
GTID:2428330611480618Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Pattern mining is the most important field in data mining,and high utility itemsets mining is one of the research hotspots in pattern mining.Algorithms based on evolutionary computation are attracting increasing attention in recent years,as they have the advantage of avoiding the combinatorial explosion of the HUI search space.Scholars have proposed an high utility itemsets mining algorithm based on evolutionary calculations such as Genetic Algorithm(GA),Particle Swarm Optimization(PSO),Bat Algorithm(BA).These usually generate new candidates by iteratively changing the encoding vectors.Different from GA,PSO,and BA,we propose a high utility itemsets mining algorithm(HUIM-ACO)based on ant colony algorithm.These methods usually generate new candidate itmesets by in a constructive way.In the HUIM-ACO algorithm,we use the search path to express the candidate itemsets.Initially,Each search path is initialized proportionally,and incremented by one item per step.At the same time,we propose a pheromone matrix,which is used to store the pheromone values of two different items,allowing the local and global values to be updated,and an ant-path quad is designed for the efficient enumeration of high utility itemsets(HUIs).Compared with GA and PSO-based high utility itemsets mining algorithm,the HUIM-ACO algorithm can mine more high utility itemsets in a shorter time.In order to further improve the performance of the HUIM-ACO algorithm,based on the HUIM-ACO algorithm,a MapReduce-based high utility pattern mining algorithm(HUIM-ACO-MR)is proposed,which is based on the distributed computing MapReduce framework.It can well solve the performance bottleneck of the stand-alone algorithm when the amount of data is too large.In order to ensure the data consistency and reduce the data network transmission overhead,a minimum utility value model for computing blocks is proposed.The potential high utility itemsets in each data block are filtered by the minimum utility value of the block,and the true high utility itemsets are finally calculated from the potential high utility itemsets.HUIM-ACO-MR algorithm affects the performance of the algorithm due to repeated reading and writing of disks during operation.Spark has excellent memory computing capabilities.Therefore,a high utility itemset mining algorithm based on Spark(HUIM-ACO-S)is designed and implemented.HUIM-ACO-S algorithm is the same as HUIM-ACO-MR algorithm,using the block minimum utility value strategy,the HUIM-ACO algorithm is distributed to each RDD for mining,and finally the true high utility itemsets is filtered from the mining result sets.
Keywords/Search Tags:high utility itemsets mining, ant colony algorithm, block minimum utility, pheromone matrix, bitmap
PDF Full Text Request
Related items