Research And Application Of Incremental Mining Algorithm For Item Set Based On Three Decisions

Posted on: 2016-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhang
GTID: 2208330470952886
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of e-commerce, real-time mining of frequent itemsets and high utility itemsets from dynamic big data has become an important research topic. Frequent itemset mining identifies merchandise groups that sell together frequently, while high utility itemset mining identifies merchandise groups that yield higher profits for merchants. Many classic algorithms have been proposed for both problems in recent years. However, their performance on incremental mining tasks over dynamic big data is unsatisfactory.

In this thesis, a new, unified framework is proposed for incrementally mining both frequent itemsets and high utility itemsets (hereinafter collectively referred to as itemsets). The framework consists of three algorithms: online updating, offline mining, and a synchronization mechanism.

The online updating algorithm applies three-way decision theory to the incremental updating of itemsets, so that merchants can obtain the latest itemset results in real time. The algorithm divides the whole set of itemsets into three regions: a positive region, a negative region, and a boundary region. When an incremental update arrives, itemsets in the positive region are directly accepted and output, itemsets in the negative region are directly rejected and discarded, and only itemsets in the boundary region need to be updated further against the incremental data. However, once enough incremental data has accumulated, the randomness of the data means that some of the discarded itemsets become frequent, so the online results contain some errors.

The offline mining algorithm mines itemsets from the whole data set at the current moment and divides them into a positive region and a boundary region; itemsets in the negative region are neither counted nor stored.
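The three-way split used by the online updating algorithm can be sketched in Python. This is a minimal illustration for frequent itemsets only, not the thesis's implementation: it assumes relative-support thresholds `alpha` (upper) and `beta` (lower), a plain list-of-sets transaction representation, and hypothetical names (`RegionState`, `classify`, `online_update`). Itemsets at or above `alpha` form the positive region, those between `beta` and `alpha` form the boundary region, and the rest fall into the negative region and are not stored; on each increment only boundary itemsets are recounted.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


@dataclass
class RegionState:
    """Hypothetical state of the online updater (names are illustrative)."""
    alpha: float  # relative support >= alpha -> positive region (accept and output)
    beta: float   # relative support < beta -> negative region (reject, not stored)
    n: int = 0    # number of transactions seen so far
    positive: Set[Itemset] = field(default_factory=set)
    boundary: Dict[Itemset, int] = field(default_factory=dict)  # itemset -> absolute count


def count(itemset: Itemset, transactions: List[Set[str]]) -> int:
    """Absolute support: number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def classify(counts: Dict[Itemset, int], n: int, alpha: float, beta: float) -> RegionState:
    """Three-way split of itemsets mined from an n-transaction database."""
    state = RegionState(alpha=alpha, beta=beta, n=n)
    for itemset, c in counts.items():
        if c / n >= alpha:
            state.positive.add(itemset)
        elif c / n >= beta:
            state.boundary[itemset] = c
        # Otherwise the itemset is in the negative region and is dropped.
    return state


def online_update(state: RegionState, increment: List[Set[str]]) -> Set[Itemset]:
    """Recount only boundary itemsets on the new transactions and return the output set."""
    state.n += len(increment)
    for itemset in list(state.boundary):
        c = state.boundary[itemset] + count(itemset, increment)
        if c / state.n >= state.alpha:
            del state.boundary[itemset]
            state.positive.add(itemset)   # promoted: now accepted and output
        elif c / state.n < state.beta:
            del state.boundary[itemset]   # demoted to the negative region
        else:
            state.boundary[itemset] = c   # still undecided
    # Positive itemsets are accepted without recounting; negative ones are never revisited.
    return set(state.positive)


# Usage: mine 1- and 2-itemsets from a tiny database, then apply one increment.
db = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]
items = sorted(set().union(*db))
candidates = [frozenset(c) for k in (1, 2) for c in combinations(items, k)]
state = classify({s: count(s, db) for s in candidates}, len(db), alpha=0.6, beta=0.3)
result = online_update(state, [{"a", "b"}, {"a", "b", "c"}])
print(sorted(sorted(s) for s in result))  # [['a'], ['a', 'b'], ['b']]
```

In the usage lines, `{a, c}` starts in the negative region, so the increment is never counted for it; if enough such transactions arrived it could become frequent without being reported, which is the source of error described above.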
This offline mining step is flexible and pluggable: users can choose any well-established itemset mining algorithm according to their own needs. Because the data is massive, offline mining can only provide delayed results, but these results are accurate.

The synchronization mechanism algorithm combines online updating with offline mining to obtain itemset results that are both sufficiently accurate and timely. It uses several parameters to regulate the division of labor and cooperation between online updating and offline mining. If the amount of newly added data is less than a threshold n, online updating is enabled and offline mining stays on standby. If the amount of incremental data exceeds n, offline mining is enabled while online updating continues to run, and the online results are then replaced by the offline results. In addition, the synchronization mechanism algorithm controls the incremental updating error by adjusting the size of the boundary region: the larger the boundary region, the smaller the error.

To keep the error of online updating within a user-acceptable range, a parameter learning method is designed and implemented for the synchronization mechanism algorithm. For frequent itemsets, a probability model is established and a query function is derived from it. The error is estimated theoretically, and the algorithm is validated on three public datasets; the results show that its error does not exceed 0.01%. The algorithm is three orders of magnitude faster than a batch mining algorithm and two orders of magnitude faster than an incremental mining algorithm. For high utility itemsets, appropriate parameter settings are obtained directly through an empirical learning approach, from which a lookup table is derived.
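The synchronization rule can be sketched as a small coordinator. This is an illustration under stated assumptions, not the thesis's algorithm: the class `Synchronizer`, its callbacks `online_update`, `offline_mine`, and `reseed_online`, and the use of a single background thread for offline mining are all hypothetical. It follows the rule described above: online updating runs on every increment; once more than `n` transactions have arrived since the last offline run, offline mining starts on a snapshot of the full database while online updating keeps running, and the exact offline result then replaces the online one. Boundary-region tuning and parameter learning are not modeled.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Optional


class Synchronizer:
    """Hypothetical coordinator for the online/offline split (names are illustrative)."""

    def __init__(self, n: int,
                 online_update: Callable[[List[Any]], Any],
                 offline_mine: Callable[[List[Any]], Any],
                 reseed_online: Callable[[Any], None]) -> None:
        self.n = n                          # threshold on new transactions before offline mining
        self.online_update = online_update  # fast, approximate (e.g. a three-way boundary update)
        self.offline_mine = offline_mine    # slow, exact (any existing batch miner)
        self.reseed_online = reseed_online  # rebuild the online regions from an exact result
        self.database: List[Any] = []
        self.since_offline = 0              # transactions added since the last offline launch
        self.snapshot_len = 0               # database size when the running offline job started
        self.pending: Optional[Future] = None
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.result: Any = None

    def add(self, increment: List[Any]) -> Any:
        self.database.extend(increment)
        self.since_offline += len(increment)
        # Online updating always runs, so a timely result is always available.
        self.result = self.online_update(increment)
        # More than n new transactions: start offline mining on a snapshot, in the background.
        if self.since_offline > self.n and self.pending is None:
            self.snapshot_len = len(self.database)
            self.pending = self.executor.submit(self.offline_mine, self.database[:])
            self.since_offline = 0
        # If the offline job has already finished, its exact result takes over.
        if self.pending is not None and self.pending.done():
            self._apply(self.pending.result())
        return self.result

    def wait(self) -> Any:
        """Block until any running offline job finishes and apply its result."""
        if self.pending is not None:
            self._apply(self.pending.result())
        return self.result

    def _apply(self, exact: Any) -> None:
        self.pending = None
        self.reseed_online(exact)           # exact regions replace the drifted online ones
        tail = self.database[self.snapshot_len:]
        # Replay transactions that arrived while the offline job was running.
        self.result = self.online_update(tail) if tail else exact

    def close(self) -> None:
        self.executor.shutdown(wait=True)
```

Because offline mining works on a snapshot, transactions that arrive while it runs are replayed through `online_update` after reseeding, so the replacement step does not drop them.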
The experiments for this algorithm are carried out on synthetic datasets. Similarly, the results show that it is two to three orders of magnitude faster than existing algorithms and that its error does not exceed 0.01%.

Based on this framework, a rapid prototype of an e-commerce decision support system is designed and implemented. The prototype provides e-commerce merchants with decision support information on both high utility itemsets and frequent itemsets, and it also adds merchandise discount and on-shelf/off-shelf strategies to build a more practical and comprehensive decision-making knowledge system.
Keywords/Search Tags: E-commerce, frequent itemsets, high utility itemsets, incremental updating, parameter learning, three-way decision