| Frequent itemset mining is a fundamental problem in data mining. The task is to find all frequent item combinations in a massive data set as quickly as possible while guaranteeing accuracy. There are two classic frequent itemset mining algorithms: Apriori and FP-growth. Apriori must scan the transaction data set every time support is calculated, which leads to a huge time cost. The tree structure of FP-growth is not easy to split, which makes it difficult to process the data in batches; in addition, its mining process is recursive, which is not conducive to parallelization. This thesis proposes a linear-table frequent itemset mining algorithm based on bit combination, called BCLT (Bit Combination Linear Table). The algorithm first preprocesses the original data set by counting, sorting and clipping, then constructs a linear table from the processed data set, and finally mines frequent itemsets from the linear table. During mining, elements must be compared one by one from bottom to top, and elements with low frequency have a low degree of sharing, so the algorithm is not very effective in mining speed. To address these shortcomings of BCLT, this thesis proposes BCLT-O (Optimization Algorithm for Bit Combination Linear Table). BCLT-O combines two frequent itemset mining ideas: the bitwise AND operation based on bit combination, and the linear table based on bit combination. In BCLT-O, binary data is integrated into the original BCLT linear structure, and horizontal data storage is converted into
vertical data storage. These improvements to the data storage method and data structure effectively improve mining efficiency. In the experiments, BCLT-O mines much faster than both the unoptimized BCLT algorithm and the bitwise AND algorithm based solely on bit combination. Finally, the algorithm is applied to mining frequent itemsets in soybean promoter data, with the aim of filtering out frequently occurring regulatory elements and their combinations in the promoters of soybean genes. During mining, frequent 1-itemset culling, pruning, and space optimization are applied, and satisfactory results are obtained in both time and space. The experimental comparison also shows that BCLT-O greatly outperforms the optimized bit combination algorithm. Compared with FP-growth, BCLT-O still has a certain gap in mining speed, but it has unique advantages. First, because BCLT-O eliminates recursion, it is amenable to parallelization. Second, its linear table structure can be split with a high degree of freedom, so when the original data set is too large it is easy to mine in batches. |
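
The idea of vertical storage with bitwise AND support counting mentioned above can be illustrated with a minimal sketch. This is not the BCLT or BCLT-O algorithm itself: the linear table is not reproduced, and the function names (`build_vertical_bitmaps`, `support`, `frequent_itemsets`) and the toy transactions are assumptions made purely for illustration. Each item is mapped to an integer whose bit i is set when transaction i contains the item, so the support of an itemset is the number of set bits in the AND of its item bitmaps. The enumeration is a naive, non-recursive level-by-level loop.

```python
from itertools import combinations


def build_vertical_bitmaps(transactions):
    """Convert horizontal data (items per transaction) into vertical bitmaps.

    Bit i of bitmaps[item] is set when transaction i contains the item.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(itemset, bitmaps):
    """Count the transactions that contain every item, via bitwise AND."""
    items = iter(itemset)
    acc = bitmaps.get(next(items), 0)
    for item in items:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support):
    """Naive iterative enumeration (illustrative only, not BCLT-O)."""
    bitmaps = build_vertical_bitmaps(transactions)
    # Cull infrequent 1-itemsets before forming larger combinations.
    frequent_items = sorted(
        item for item, bits in bitmaps.items()
        if bin(bits).count("1") >= min_support
    )
    result = {}
    for size in range(1, len(frequent_items) + 1):
        found = False
        for combo in combinations(frequent_items, size):
            count = support(combo, bitmaps)
            if count >= min_support:
                result[combo] = count
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


# Hypothetical toy data, not taken from the thesis.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
result = frequent_itemsets(transactions, min_support=3)
```

Because the loop has no recursion and each bitmap is an independent integer, the transactions could in principle be split into batches and the per-batch counts summed, which is the kind of property the abstract attributes to the linear table structure.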