
Research On Apriori Algorithm Optimization Based On Binary Code And Incremental Update

Posted on: 2022-10-12; Degree: Master; Type: Thesis
Country: China; Candidate: Z M Luo
GTID: 2518306731953439; Subject: Computer technology
Abstract/Summary:
With the growing demand for data processing across many fields of society, traditional association rule mining algorithms are used more and more. Apriori is a widely used representative algorithm, but it suffers from long running times, repeated database scans during its iterations, and repeated recomputation over large amounts of unchanged data whenever new data are added. To address these shortcomings, this thesis improves and optimizes the algorithm. The main work is as follows:

(1) An incremental-update CBEF-Apriori algorithm based on binary coding is proposed. The algorithm first converts the frequent 1-itemsets of the incremented data and the transaction database into binary codes, so that computing itemset support becomes a bitwise operation between the codes of the itemsets and the codes of the transaction database. It then converts the frequent itemsets previously mined from the old database into binary codes and reuses this itemset information in each iteration over the new database to mine the new frequent itemsets. Compared with the classic Apriori algorithm and the CBE-Apriori algorithm, the improved algorithm mines the correct frequent itemsets without reducing the number of items. On smaller data sets, across different increments and support thresholds, its performance improves by an average of 290% over classic Apriori and 125% over CBE-Apriori; on larger data sets, the average improvements are 201.48% over classic Apriori and 235.25% over CBE-Apriori. The experiments also show that the algorithm reduces the number of candidate itemsets, improving efficiency while better meeting practical needs.

(2) The CBEF-Apriori algorithm is optimized with multiple processes. Although CBEF-Apriori is already more efficient, it still takes a long time on large-scale data sets in a single-threaded environment. Analysis shows that every iteration makes heavy use of two functions: generating the candidate itemsets and determining the frequent itemsets among the candidates. Both functions are therefore parallelized across multiple processes. Experiments show that, because of the overhead of multi-process resource scheduling, efficiency barely improves, and even drops, for the 10% increment, whose running time is short. For the longer-running 40%, 60%, and 80% increments, the average efficiency improvements are 183.4%, 347.28%, and 529.59%, respectively. This shows that the multi-process algorithm outperforms the serial algorithm on larger data sets and larger increments.

(3) Practical analysis of the improved algorithm and its application to a real diabetes data set. Following previous research, data attributes are selected, discretized, and mapped to a corresponding item-label table. The data set is divided into different incremental database scales, and comparative experiments are run under different support thresholds. The results show that the multi-process CBEF-Apriori algorithm mines the correct rules on real data and improves running efficiency by an average of 142.43% over the single-threaded version, which demonstrates the algorithm's practicality on real data sets.
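The two ideas in part (1) of the abstract can be illustrated with a small Python sketch. It is not the CBEF-Apriori implementation. It assumes that "binary coding" means a vertical bitmap: each item becomes an integer whose bit t is set when transaction t contains the item. Under that assumption, the support of an itemset is a bitwise AND followed by a count of 1 bits. The incremental step relies on the standard observation used by incremental methods such as FUP. An itemset that is frequent in old + new data must be frequent in the old data or in the increment, so only the increment's own frequent itemsets ever need a pass over the old codes. The names `encode`, `count`, `mine` and `incremental_update` are hypothetical.

```python
from itertools import combinations


def encode(transactions):
    """Binary coding (assumed): bit t of bitmaps[item] is 1 iff transaction t contains item."""
    bitmaps = {}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def count(itemset, bitmaps):
    """Support count of a non-empty itemset: AND the item codes, then count the 1 bits."""
    bits = -1  # all ones in Python's arbitrary-precision integers
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bin(bits).count("1")


def mine(bitmaps, min_count):
    """Level-wise (Apriori) mining in which every support test is a bit operation."""
    level = {frozenset([i]) for i, b in bitmaps.items() if bin(b).count("1") >= min_count}
    frequent = set(level)
    k = 2
    while level:
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if count(c, bitmaps) >= min_count}
        frequent |= level
        k += 1
    return frequent


def incremental_update(old_bitmaps, old_size, old_counts, new_transactions, min_support):
    """Recompute frequent itemsets after appending new_transactions to the old database.

    old_counts (hypothetical stored state) maps every itemset that was frequent in
    the old database, at the same min_support, to its support count there.
    """
    new_bitmaps = encode(new_transactions)
    threshold = min_support * (old_size + len(new_transactions))
    result = {}
    # Old frequent itemsets: reuse the stored counts and scan only the increment.
    for itemset, old_count in old_counts.items():
        total = old_count + count(itemset, new_bitmaps)
        if total >= threshold:
            result[itemset] = total
    # Frequent in old + new implies frequent in old or in new, so the only
    # remaining candidates are itemsets that are frequent within the increment.
    for itemset in mine(new_bitmaps, min_support * len(new_transactions)):
        if itemset not in old_counts:
            total = count(itemset, old_bitmaps) + count(itemset, new_bitmaps)
            if total >= threshold:
                result[itemset] = total
    return result
```

Take an old database `[{"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]` with stored counts a: 3, b: 2, c: 2 at `min_support=0.5`, and add an increment of two `{"b", "c"}` transactions. `incremental_update` then returns a: 3, b: 4, c: 4 and {b, c}: 3. Only {b, c} needed an AND over the old bitmaps.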
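Part (3) discretizes the selected attributes and maps each value to an entry of an item-label table, which turns each record into a transaction of item ids. The sketch below shows that preprocessing step. The attributes (`glucose`, `bmi`, `age`), cut points and label names are hypothetical illustrations, not the ones chosen in the thesis. Each interval is half-open on the right, so a value equal to a cut point falls into the higher interval.

```python
from bisect import bisect_right

# Hypothetical attributes, cut points and interval labels, for illustration only;
# the thesis selects its own attributes and intervals based on prior research.
BINS = {
    "glucose": ([100, 126], ["low", "mid", "high"]),
    "bmi": ([25, 30], ["normal", "over", "obese"]),
    "age": ([40, 60], ["young", "middle", "old"]),
}


def build_label_table(bins):
    """Item-label table: assign an integer item id to every 'attribute=interval' label."""
    table = {}
    for attr, (_, labels) in bins.items():
        for label in labels:
            table[f"{attr}={label}"] = len(table)
    return table


def to_transaction(record, bins, table):
    """Discretize one record and map it to the set of item ids it contains."""
    items = set()
    for attr, (cuts, labels) in bins.items():
        if attr in record:
            # bisect_right returns the index of the interval containing the value.
            label = labels[bisect_right(cuts, record[attr])]
            items.add(table[f"{attr}={label}"])
    return items
```

With the table built from `BINS`, the record `{"glucose": 130, "bmi": 27.5, "age": 35}` becomes the transaction `{2, 4, 6}` (glucose=high, bmi=over, age=young). These transactions can then be split into an old part and increments and fed to any of the mining sketches above.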
Keywords/Search Tags: Data Mining, Association Rules, Incremental Update, Binary Code, Multi-process, Medical Data