Information technology has changed people’s way of life; while facilitating communication between people, it has also produced massive amounts of data. Sectors such as finance and shopping malls, for example, generate large-scale data every day, and randomness introduced during storage and transmission makes these data uncertain. How to mine useful information from such large and complex uncertain data and apply it in business has become a major concern for both customers and scholars. Because of this uncertainty, traditional algorithms designed for mining certain data are no longer suitable, so mining algorithms tailored to uncertain databases must be designed. Depending on the definition used, uncertain frequent itemsets fall into two types: frequent itemsets based on expected support and frequent itemsets based on probabilistic support. The former compares an itemset's expected support with a minimum threshold, while the latter requires the probability that the itemset's support reaches a minimum threshold to exceed a given probability. This thesis studies frequent itemset mining algorithms for uncertain data. Building on existing probabilistic frequent itemset mining algorithms and expected frequent itemset mining algorithms, it proposes two effective improvements that enrich the available data processing methods and improve mining efficiency. The main contributions are as follows:

(1) This thesis analyzes existing probabilistic frequent itemset mining algorithms. To address the heavy memory consumption of current mining algorithms, it improves and optimizes them and proposes a new algorithm, HUFP-Growth. By introducing a judging mechanism, the algorithm determines in advance whether two itemsets need to be joined, which saves considerable space and memory. It also reduces the time spent joining itemsets, improving mining efficiency. Experiments show that the algorithm achieves good space and time efficiency.

(2) Existing expectation-based algorithms for mining frequent itemsets from uncertain data require multiple scans of the original database, which is time-consuming and lowers mining efficiency. To address this problem, this thesis proposes a new algorithm, TUFIM-Matrix, which introduces a pre-pruning strategy and a bitmap. The algorithm needs to scan the database only twice to mine the expectation-based uncertain frequent itemsets. Experiments show that the algorithm improves both running time and accuracy.
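To make the two notions of uncertain frequent itemsets concrete, the following is a minimal Python sketch of the standard definitions, not of HUFP-Growth or TUFIM-Matrix. It assumes the common attribute-uncertainty model in which each transaction lists items with existence probabilities and all items and transactions are independent; the toy database `db` and the function names `expected_support` and `frequentness_probability` are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical model: each transaction maps item -> existence probability.
# Assumption: items and transactions are mutually independent.
Transaction = dict[str, float]


def itemset_prob(t: Transaction, itemset: set[str]) -> float:
    """Probability that every item of `itemset` appears in transaction `t`."""
    return prod(t.get(item, 0.0) for item in itemset)


def expected_support(db: list[Transaction], itemset: set[str]) -> float:
    """Expected support: sum of the itemset's appearance probabilities."""
    return sum(itemset_prob(t, itemset) for t in db)


def frequentness_probability(db: list[Transaction], itemset: set[str],
                             min_sup: int) -> float:
    """P(support >= min_sup), computed exactly by dynamic programming over
    the Poisson-binomial distribution of the itemset's support."""
    # dist[k] = probability that exactly k transactions seen so far
    # contain the itemset
    dist = [1.0]
    for t in db:
        p = itemset_prob(t, itemset)
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # itemset absent from this transaction
            new[k + 1] += q * p     # itemset present in this transaction
        dist = new
    return sum(dist[min_sup:])


# Hypothetical toy uncertain database with three transactions.
db = [
    {"a": 0.8, "b": 0.5},
    {"a": 0.6, "c": 0.9},
    {"a": 0.9, "b": 0.7},
]

print(expected_support(db, {"a", "b"}))             # 0.8*0.5 + 0 + 0.9*0.7
print(frequentness_probability(db, {"a", "b"}, 2))  # only t1 and t3 can contain it
```

Under the expected-support definition, {a, b} is frequent if its expected support (1.03 here) meets the minimum expected support; under the probabilistic definition, it is frequent if the probability that its support is at least `min_sup` (0.252 for `min_sup = 2`) exceeds a chosen probability threshold. The exact dynamic program takes time quadratic in the number of transactions per itemset, which is one reason practical mining algorithms rely on pruning to avoid evaluating many candidates.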