Research on Novel Methods in Utility Pattern Mining

Posted on: 2020-07-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Naji Qaid Abdullah Al-husaini
GTID: 1368330578481650
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technologies such as the Internet, the Internet of Things, and cloud computing, information technology has been merging with traditional applications in politics, the economy, the military, scientific research, and other fields, producing massive amounts of data beyond any previous era. At the same time, smart mobile devices, sensors, e-commerce sites, and social networks around the world create data of all types at all times. Simply put, we are drowning in data but starving for knowledge. Faced with this volume of data, how to analyze it in a timely and effective manner and extract patterns closely related to people's living habits is a matter that governments and institutions in the information age need to address promptly. For example, the China Securities Regulatory Commission (CSRC) judges whether there is insider trading or banker control by analyzing the prices and quantities offered by the buyers and sellers of a stock; the Alipay network technology company learns the consumption habits of different user groups by analyzing the consumption records of Alipay users on its network platform and formulates corresponding marketing strategies; and the Ministry of Transport analyzes traffic flow information for road networks at different time intervals and develops policies to reduce traffic congestion in cities.

Data mining refers to the process of finding important, previously unknown, and potentially useful patterns in a database. Association rule mining (ARM) is one of the central tasks of data mining. However, traditional methods that rely on support values to extract patterns fail to meet the needs of users who wish to extract patterns based on utility values. Utility pattern mining, the subject of our study, has emerged to meet this need. Numerous algorithms have been proposed in this field. However, current algorithms still suffer from performance problems such as runtime, the number of extracted patterns, and the handling of updates to the mining results. In addition, there is still a need to find new types of patterns that meet users' needs in some applications. This thesis therefore addresses four problems in the field of utility itemset mining. The innovative research results are as follows:

1. Repeated database scans and candidate generation are among the major challenges of previous high utility itemset mining (HUIM) algorithms. In this regard, the UP-Growth algorithm stands out as one of the best algorithms for overcoming this challenge. However, it needs to scan the database twice to construct the UP-Tree. To update the mining result when new data is added to the existing data, UP-Growth needs to scan both the new and the existing databases twice. Therefore, this study proposes an algorithm that needs to scan the database only once, named the Single-Scan Utility Pattern Growth algorithm (SSUP-Growth). If the previous mining result needs to be updated with new data, the proposed algorithm only needs to scan the new data once. The proposed algorithm relies on a compact tree structure and takes advantage of the distribution of items in the database: many transactions contain the same items, so they are represented by a single branch in the tree, and processing one branch is therefore equivalent to processing many items in the database. The proposed algorithm inserts all items in the database into the primary SSUP-Tree and then derives the corresponding UP-Tree from the primary SSUP-Tree. Therefore, when existing mining results need to be updated, it only needs to scan the new data once, add the new transactions to the primary SSUP-Tree, and then obtain the updated UP-Tree. Empirical evaluation shows that SSUP-Growth's runtime is significantly improved compared with similar algorithms.

2. Recently, numerous high utility itemset mining methods with multiple minimum utility thresholds have been proposed to account for differences in the importance of items and their characteristics. In such methods, each item is given its own minimum utility value. However, they still suffer from the "rule missing" and "rule explosion" problems, since the minimum item utility of each item is set equal to a percentage of its external utility. In this regard, we propose a new notion, named Utility Differential (UD), for effectively specifying the minimum item utility in order to enhance HUIM with multiple minimum utility thresholds. The proposed notion ensures a fixed difference between the actual utility and the corresponding minimum item utility value for each item. We also propose a two-phase algorithm, HUI-MMU-UD, based on the Utility Differential notion. We carry out a careful experimental assessment of the proposed notion, incorporating it into the state-of-the-art algorithm HUI-MMU, on dense and sparse datasets to illustrate its effectiveness. Compared with current methods, the experimental results show that the proposed notion extracts high utility itemsets (HUIs) effectively.

3. Most existing empirical studies have focused on HUIs. Nevertheless, in many practical situations, low utility itemsets (LUIs) are highly significant and useful (e.g., in security systems, low utility itemsets represent the system vulnerabilities that need monitoring). Hence, we propose a new association rule mining (ARM) framework named low utility itemset mining (LUIM), which extracts LUIs. To enhance the performance of LUIM, we investigated the g TWDC (Transaction Weighted Downward Closure) property to improve utility mining generator inclusion. The proposed notion determines the generators based on the Transaction Weighted Utility (TWU) of itemsets, which is more efficient for utility mining. Moreover, we offer two efficient algorithms: LUG-Miner (low utility generator miner) and LUIMA (low utility itemset mining algorithm). LUG-Miner extracts low utility generators (LUGs), drawing on useful properties of the TWU model and incorporating a level-wise algorithm. LUIMA, on the other hand, uses LUGs to detect all low utility itemsets. The experimental results on both dense and sparse datasets show that the proposed framework and algorithms operate efficiently.

4. A number of studies and methods have been proposed to obtain more useful patterns that meet the requirements of decision makers by considering both frequency and utility thresholds, for example, frequent high utility patterns and high utility rare patterns. However, decision makers also need to know the patterns that yield minimal benefit yet appear frequently in order to understand the reasons for low utility rates, as their frequency is a factor in increasing utility value. For this reason, we propose a new framework for extracting a new pattern type that is important in many cases, named the frequent low utility itemset (FLUI). To extract the proposed patterns, we propose a new algorithm called FLUI-Growth (frequent low utility itemset mining algorithm), an extension of the well-known state-of-the-art UP-Growth algorithm, which uses a compact tree structure to extract the desired itemsets. The new framework and algorithm have been evaluated on many benchmark databases. The results show the effectiveness and practicability of the new framework and algorithm and their applicability in real life.
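
The pattern types named in the abstract can be illustrated with a minimal brute-force sketch. This is not SSUP-Growth, HUI-MMU-UD, LUG-Miner, LUIMA, or FLUI-Growth; it only enumerates every itemset of a toy database and applies the standard utility-mining definitions (itemset utility, TWU, support). The data, the names TRANSACTIONS, PROFIT, and classify, and the threshold semantics (an LUI has utility at or below max_util; an FLUI is an LUI whose support is at least min_sup) are assumptions made for illustration and may differ from the definitions used in the thesis.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
]
# Hypothetical external utilities (for example, unit profit per item).
PROFIT = {"a": 5, "b": 1, "c": 2}


def contains(tx, itemset):
    """True if the transaction contains every item of the itemset."""
    return all(item in tx for item in itemset)


def transaction_utility(tx):
    """Total utility of one transaction: sum of quantity * unit profit."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def utility(itemset):
    """Utility of an itemset: its utility summed over the transactions containing it."""
    return sum(
        sum(tx[item] * PROFIT[item] for item in itemset)
        for tx in TRANSACTIONS
        if contains(tx, itemset)
    )


def twu(itemset):
    """Transaction Weighted Utility: an upper bound on the itemset's utility."""
    return sum(transaction_utility(tx) for tx in TRANSACTIONS if contains(tx, itemset))


def support(itemset):
    """Number of transactions that contain the itemset."""
    return sum(1 for tx in TRANSACTIONS if contains(tx, itemset))


def classify(min_util, max_util, min_sup):
    """Brute-force enumeration; thresholds are assumed semantics, see lead-in."""
    items = sorted(PROFIT)
    result = {"HUI": [], "LUI": [], "FLUI": []}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            if support(itemset) == 0:
                continue  # itemset never occurs in the database
            u = utility(itemset)
            if u >= min_util:
                result["HUI"].append(itemset)
            if u <= max_util:
                result["LUI"].append(itemset)
                if support(itemset) >= min_sup:
                    result["FLUI"].append(itemset)
    return result


if __name__ == "__main__":
    print(classify(min_util=20, max_util=8, min_sup=2))
```

With these toy values, {a} has utility 20 while its superset {a, c} has utility 23, so utility itself is not downward closed. TWU is downward closed and never underestimates utility (the TWU of {a, c} is 24), which is why it is commonly used for pruning and why the abstract refers to the Transaction Weighted Downward Closure property.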
Keywords/Search Tags: Utility Itemset Mining, High Utility Itemset, Low Utility Itemset Mining, Multiple Minimum Utility Threshold, Frequent Low Utility Itemset Mining, Utility Differential (UD)