Research On Average High Utility Itemsets Mining Algorithm

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: M Jiang
Full Text: PDF
GTID: 2248330398950004
Subject: Computer application technology
Abstract/Summary:
Association rule mining, which provides users with rules derived from the internal correlations of data, is one of the most active areas of data mining. Users can apply these rules to decision making, prediction, and other tasks in fields such as commerce, scientific research, and bioinformatics. Traditional association rule mining aims to discover frequent itemsets, considering only the occurrence frequencies of items; rules are then generated from the frequent itemsets. However, the importance of individual items is not taken into account, so some infrequent but useful itemsets may go undiscovered. To resolve this problem, utility-based association rule mining was proposed. Utility is defined to measure the importance of individual items and can reflect the preferences and interests of users.

Traditionally, the utility of an itemset increases as the length of the itemset increases. To eliminate the effect of length, average utility was proposed, defined as the total utility of an itemset divided by the length of the itemset. Existing methods for mining average high utility itemsets need to scan the dataset many times and generate a large number of candidate itemsets, which costs much time and space. This thesis aims to improve the efficiency of mining average high utility itemsets. The main contributions are as follows.

The merits and drawbacks of current average high utility itemset mining algorithms are analyzed, and a new algorithm called HAUI-Mine is proposed. HAUI-Mine scans the dataset twice to construct the HAUI-Tree and generates no candidate itemsets. During the mining process, conditional pattern trees are built recursively to generate the average high utility itemsets. Experimental results show that HAUI-Mine clearly outperforms HAUP-Mine on dense datasets or at lower thresholds.

An algorithm called ITR-Mine is proposed for mining average high utility itemsets from data streams. Algorithms that mine itemsets from traditional transactional datasets cannot be applied to data streams directly because of the characteristics of data streams. ITR-Mine uses a sliding window to form a method that scans the data only once and generates no candidate itemsets. When the window slides, the ITR-Tree can be updated easily and efficiently without being rebuilt from scratch.
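
The average-utility measure described in the abstract can be illustrated with a small sketch. This is not HAUI-Mine or ITR-Mine, only a brute-force computation of the measure itself. It assumes, as is common in utility mining but not stated in the abstract, that the utility of an item in a transaction is its quantity multiplied by a unit profit. The function names, the transactions, and the profit table are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: a transaction maps each item to its quantity.
Transaction = Dict[str, int]


def itemset_utility(itemset: Iterable[str],
                    transactions: List[Transaction],
                    profit: Dict[str, int]) -> int:
    """Total utility of an itemset over all transactions that contain it.

    Assumption: item utility in a transaction = quantity * unit profit.
    """
    items = set(itemset)
    total = 0
    for t in transactions:
        # Only transactions containing every item of the itemset contribute.
        if all(item in t for item in items):
            total += sum(t[item] * profit[item] for item in items)
    return total


def average_utility(itemset: Iterable[str],
                    transactions: List[Transaction],
                    profit: Dict[str, int]) -> float:
    """Total utility of the itemset divided by the itemset length."""
    items = set(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    return itemset_utility(items, transactions, profit) / len(items)


# Made-up example data for illustration only.
transactions = [
    {"A": 1, "B": 2},
    {"A": 2, "C": 1},
    {"A": 1, "B": 1, "C": 3},
]
profit = {"A": 3, "B": 10, "C": 1}

for candidate in (["A"], ["A", "B"], ["A", "B", "C"]):
    print(candidate,
          itemset_utility(candidate, transactions, profit),
          average_utility(candidate, transactions, profit))
```

An itemset would count as an average high utility itemset when its average utility reaches a user-specified threshold. Enumerating and checking every candidate this way is the costly approach that the tree-based algorithms in the thesis are designed to avoid.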
Keywords/Search Tags: Data Mining, Association Rules, Average High Utility Itemsets