
Research On Segmentation And High-Efficiency Itemsets For Data Flow

Posted on: 2016-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2208330470452900
Subject: Software engineering
Abstract/Summary:
Mining high utility itemsets has emerged as one of the most significant research issues in data mining. Compared with traditional frequent itemset mining, it can meet more user requirements. However, existing high utility itemset mining treats the profit of each item as constant and independent, ignoring commodity price fluctuations, discount strategies, seasonal changes, and other factors. In this thesis, we extend the data model of existing high utility itemset mining and propose a new mining pattern, namely "high utility itemsets mining with discount strategies". For this new pattern, we adapt existing batch and incremental algorithms and propose new ones. The main research content is as follows:

(1) Proposed the "high utility itemsets mining with discount strategies" model. The model is the first to introduce goods discount strategies. It comprehensively considers goods cost, merchandise sales, changes in discount strategy across periods, and the continuity and liquidity of the data flow, so it can mine the high utility itemsets that users are actually interested in.

(2) Revised the traditional Two-Phase algorithm and applied it to high utility itemsets mining with discount strategies. First, the original data are fused and converted into a single table. Then all high utility itemsets are mined with the Two-Phase algorithm. In the first phase, candidate itemsets are generated and tested level by level to find all high transaction-weighted utilization itemsets; in the second phase, the database is rescanned to obtain the actual utility of each high transaction-weighted utilization itemset and decide whether it is a high utility itemset.

(3) Proposed a batch algorithm based on a transaction index, called TIB. The algorithm uses a new structure, the Index-List, to store the utility information and transaction index information of itemsets. As a result, it does not need to scan the original database repeatedly; simple calculations on the index lists of itemsets are enough to obtain the high utility itemsets, which significantly reduces the search time. By the transaction-weighted downward closure property, the algorithm generates index lists only for high transaction-weighted utilization itemsets, which reduces memory usage.

(4) Revised FUP, a traditional incremental algorithm for frequent itemset mining, and applied it to high utility itemsets mining with discount strategies. The algorithm incorporates the idea of the Two-Phase algorithm and divides the mining process into two steps. First, it retains the information already discovered in the original database and then, based on the incremental database, iteratively finds all high transaction-weighted utilization itemsets in the updated database level by level; second, it scans the database to obtain all high utility itemsets from the high transaction-weighted utilization itemsets.

(5) Proposed an improved FUP algorithm based on the transaction index, called IFUP. The algorithm combines the ideas of Two-Phase, TIB, and FUP, and uses the index-list structure to store information. During mining, it does not need to scan the original database and the incremental database repeatedly; scanning the index lists is enough to obtain the high utility itemsets.

Extensive experiments show that the proposed model is feasible and reliable, and that adjusting the discount strategies changes the resulting high utility itemsets. In addition, across different dataset distributions and parameter settings, the TIB algorithm is 20-50 times faster than the Two-Phase algorithm; the FUP algorithm is about 20 times faster than the TIB algorithm; and the IFUP algorithm is about 10 times faster than the FUP algorithm...
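The Two-Phase procedure with discount strategies described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the data model, so everything here is an assumption made for illustration: the `PRICE`, `COST`, and `DISCOUNT` tables, the per-period discount rate, and the utility formula quantity × (discounted price − cost). The sketch also assumes all item utilities are non-negative, which the transaction-weighted pruning relies on.

```python
# Hypothetical per-item price and cost, and per-(period, item) discount rates.
PRICE = {"a": 10.0, "b": 4.0, "c": 6.0}
COST = {"a": 4.0, "b": 1.0, "c": 5.0}
DISCOUNT = {("p2", "a"): 0.5}  # assumption: item "a" is half price in period "p2"

# Each transaction: (period, {item: quantity sold}).
DB = [
    ("p1", {"a": 1, "b": 2}),
    ("p1", {"b": 3, "c": 1}),
    ("p2", {"a": 2, "b": 1, "c": 2}),
    ("p2", {"a": 1, "c": 1}),
]


def fuse(db, discount):
    """Data fusion: build one table whose rows map item -> utility."""
    table = []
    for period, items in db:
        row = {}
        for item, qty in items.items():
            rate = discount.get((period, item), 0.0)
            # Assumed utility: quantity * (discounted price - cost).
            row[item] = qty * (PRICE[item] * (1.0 - rate) - COST[item])
        table.append(row)
    return table


def two_phase(table, min_util):
    """Return {itemset: utility} for itemsets with utility >= min_util."""
    tu = [sum(row.values()) for row in table]  # transaction utilities

    def twu(itemset):
        # Transaction-weighted utilization: sum of TU over supporting rows.
        return sum(u for row, u in zip(table, tu) if itemset.issubset(row))

    def utility(itemset):
        return sum(sum(row[i] for i in itemset)
                   for row in table if itemset.issubset(row))

    # Phase 1: level-wise generation, keeping only high-TWU candidates.
    items = sorted({i for row in table for i in row})
    level = [c for c in (frozenset([i]) for i in items) if twu(c) >= min_util]
    candidates = []
    while level:
        candidates.extend(level)
        k = len(level[0]) + 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in joined if twu(c) >= min_util]

    # Phase 2: rescan the table to compute the exact utility of each candidate.
    result = {}
    for c in candidates:
        u = utility(c)
        if u >= min_util:
            result[c] = u
    return result


with_discount = two_phase(fuse(DB, DISCOUNT), min_util=12)
without_discount = two_phase(fuse(DB, {}), min_util=12)
```

On this toy data, `{a}` is a high utility itemset when no discount is applied but drops out once the hypothetical half-price discount is in effect, which mirrors the abstract's observation that adjusting discount strategies changes the mined itemsets.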
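The index-list idea behind TIB can be sketched in the same spirit. The abstract only says that the Index-List stores utility and transaction index information for each itemset, so the layout below (a dictionary from transaction index to the itemset's utility in that transaction) and the join rule are assumptions, not the thesis's actual definitions. The function name `tib_sketch` and the `TABLE` data are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical fused table: row position = transaction index,
# each row maps item -> utility (already adjusted for discounts).
TABLE = [
    {"a": 6, "b": 6},
    {"b": 9, "c": 1},
    {"a": 2, "b": 3, "c": 2},
    {"a": 1, "c": 1},
]


def build_item_lists(table):
    """Single scan: transaction utilities plus one index list per item."""
    tu = [sum(row.values()) for row in table]
    lists = {}
    for tid, row in enumerate(table):
        for item, util in row.items():
            lists.setdefault(frozenset([item]), {})[tid] = util
    return tu, lists


def join(list_x, list_y, list_prefix):
    """Index list of X | Y from the lists of X, Y and their shared prefix."""
    out = {}
    for tid in list_x.keys() & list_y.keys():
        # The shared prefix is counted in both X and Y, so subtract it once.
        out[tid] = list_x[tid] + list_y[tid] - list_prefix.get(tid, 0)
    return out


def tib_sketch(table, min_util):
    """Mine high utility itemsets using index lists only (no rescans)."""
    tu, lists = build_item_lists(table)

    def twu(index_list):
        return sum(tu[tid] for tid in index_list)

    # Keep index lists only for high-TWU itemsets (downward closure).
    level = {x: il for x, il in lists.items() if twu(il) >= min_util}
    kept = {frozenset(): {}}  # empty prefix used when joining single items
    kept.update(level)
    while level:
        nxt = {}
        for x, y in combinations(list(level), 2):
            prefix = x & y
            if len(prefix) != len(x) - 1:
                continue  # only join itemsets that differ in one item
            z = x | y
            if z in nxt:
                continue
            il = join(level[x], level[y], kept[prefix])
            if twu(il) >= min_util:
                nxt[z] = il
        kept.update(nxt)
        level = nxt

    # Exact utility is the sum of the values stored in each index list.
    result = {}
    for x, il in kept.items():
        u = sum(il.values())
        if x and u >= min_util:
            result[x] = u
    return result


high_utility = tib_sketch(TABLE, min_util=12)
```

After the single scan in `build_item_lists`, every later step works on index lists alone, which is the property the abstract credits for the reduced search time. Because a shared prefix is a subset of a high transaction-weighted utilization itemset, its list is always present in `kept` when `join` needs it.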
Keywords/Search Tags: Data mining, Utility mining, High utility itemsets, Discount strategies, Data stream