Mining Frequent Weighted Itemsets Based On WDiffNodeset And WNegNodeset Structure

Posted on: 2020-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: X X Fang
GTID: 2428330602986838
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, weighted association rule mining has been one of the active topics in data mining. It addresses a limitation of traditional association rule mining, which considers only how frequently an item occurs and not how important it is. In practical applications, items often differ in importance depending on their value or meaning, so a mining method that accounts for both the frequency and the importance of items is more meaningful. Traditional weighted association rule mining, however, does not satisfy the downward-closure property. Later, weighted association rule mining algorithms that consider transaction weights were proposed. In these algorithms, the weighted support of an itemset reflects the differing importance of transactions, and the downward-closure property holds naturally. This approach has therefore become the mainstream direction in the field, but its mining efficiency still has shortcomings: (1) database scanning: some algorithms require multiple scans of the database; (2) join and pruning strategies: some algorithms generate a large number of candidate itemsets per join, which reduces mining efficiency. This thesis aims to address these shortcomings of weighted frequent itemset mining algorithms. The main work is as follows:

(1) This thesis describes the research status of data mining, association rule mining, and weighted association rule mining, analyzes the related algorithms of recent years, and summarizes their advantages and disadvantages.

(2) This thesis develops a frequent weighted itemset mining algorithm based on the WDiffNodeset structure (DiffNFWI). NFWI, the frequent weighted itemset mining algorithm based on WN-list, performs a large number of intersection operations on dense datasets, so it mines frequent weighted itemsets (FWIs) inefficiently. DiffNFWI is proposed to solve this problem by introducing the idea of diffsets. First, it uses a novel data structure, WDiffNodeset, and finds FWIs using a set-enumeration tree with a hybrid search strategy, which avoids a large number of intersection operations and achieves high efficiency; second, it uses the diffset strategy to compute the weighted support of itemsets quickly, making WDiffNodeset more suitable for mining FWIs; finally, simulation results show that DiffNFWI has higher mining efficiency than NFWI.

(3) This thesis proposes a frequent weighted itemset mining algorithm based on the WNegNodeset structure (NegNFWI). Despite the advantages of WDiffNodeset, computing the difference between two WDiffNodesets takes a long time on some databases, which motivates NegNFWI. First, the algorithm uses another effective data structure, WNegNodeset. Like WN-list and WDiffNodeset, WNegNodeset is based on a prefix tree. The difference is that WNegNodeset employs a novel encoding model for nodes in the BitMap Weighted-tree (BMW-tree), based on the bitmap representation of sets, and uses bitwise operators to extract the WNegNodesets of itemsets quickly, avoiding a large number of intersection operations; second, the time complexity of the algorithm is reduced to a function of x and n, where x is the length of a WNegNodeset and n is the number of weighted frequent 1-itemsets; third, the algorithm uses the diffset strategy to compute the weighted support of itemsets quickly; finally, simulation experiments show that the proposed algorithm is efficient and feasible. The experimental results show that NegNFWI outperforms both NFWI and DiffNFWI in time efficiency, and that DiffNFWI still performs well compared with NFWI.

In short, this thesis combines the theory of frequent itemset mining with that of weighted frequent itemset mining, adopts two data structures, and proposes two improved algorithms. Experimental results show that DiffNFWI and NegNFWI perform better than the original algorithms on different datasets.
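
The abstract does not spell out how weighted support is defined or how a diffset yields it, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used transaction-weight model (the weight of a transaction is the mean weight of its items, and the weighted support of an itemset is the summed weight of the transactions containing it divided by the summed weight of all transactions). It works on plain transaction-id sets rather than the thesis's WDiffNodeset or WNegNodeset structures, and every name and data value in it is hypothetical.

```python
from fractions import Fraction

# Hypothetical toy data: item weights and a four-transaction database.
ITEM_WEIGHTS = {
    "a": Fraction(6, 10),
    "b": Fraction(1, 10),
    "c": Fraction(3, 10),
    "d": Fraction(9, 10),
}
TRANSACTIONS = {
    1: {"a", "b", "d"},
    2: {"a", "c"},
    3: {"a", "b", "c", "d"},
    4: {"b", "c"},
}


def transaction_weights(transactions, item_weights):
    """Assumed model: a transaction's weight is the mean weight of its items."""
    return {
        tid: sum(item_weights[i] for i in items) / len(items)
        for tid, items in transactions.items()
    }


def tidset(itemset, transactions):
    """Ids of the transactions that contain every item of `itemset`."""
    return {tid for tid, items in transactions.items() if itemset <= items}


def weighted_support(tids, tw):
    """Summed weight of the given transactions over the total weight."""
    return sum(tw[t] for t in tids) / sum(tw.values())


def weighted_support_via_diffset(tids_x, tids_y, tw):
    """Weighted support of X ∪ Y without intersecting the two tidsets.

    The diffset holds the transactions that contain X but not Y, so
    subtracting its weight from the weight of X leaves the weight of X ∪ Y.
    """
    diffset = tids_x - tids_y
    return weighted_support(tids_x, tw) - weighted_support(diffset, tw)


tw = transaction_weights(TRANSACTIONS, ITEM_WEIGHTS)
tids_a = tidset({"a"}, TRANSACTIONS)
tids_b = tidset({"b"}, TRANSACTIONS)

direct = weighted_support(tidset({"a", "b"}, TRANSACTIONS), tw)
via_diffset = weighted_support_via_diffset(tids_a, tids_b, tw)
assert direct == via_diffset  # both routes give the same weighted support
```

Exact fractions are used so the two routes can be compared with `==`. On dense data the diffset is typically much smaller than the tidsets it is derived from, which is the motivation the abstract gives for the diffset strategy.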
Keywords/Search Tags:weighted frequent itemsets mining, frequent itemsets mining, the weighted support, diffsets strategy, hybrid search strategy, bitmap weighted-tree