
Mining High-Utility Itemsets Under Various Data Types, Constraints And Applications

Posted on: 2017-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: W S Gan
Full Text: PDF
GTID: 2348330503487057
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technologies, data science has become increasingly relevant to numerous domains. High-utility itemset mining (HUIM) considers both the profit and the quantity of each item in a set of transactions, and thus reveals highly profitable knowledge that supports business decision-making and enterprise management; it has become an important research issue in recent years. Existing HUIM algorithms mainly concentrate on precise data, whereas real-world applications involve different data types. Moreover, the mining constraints are not always the same, so many existing algorithms cannot derive the required information effectively. Therefore, this dissertation studies HUIM from three aspects: data type (data level), constraint conditions (model level), and real-world applications (application level). The main research contents and contributions of this dissertation are as follows:

Firstly, for HUIM on various data types, this dissertation proposes a novel framework and two algorithms for mining high-utility itemsets (HUIs) from uncertain data. In contrast to previous works, which mine HUIs from precise data, the inherent relationship between utility and uncertainty is first analyzed, and a potential high-utility itemset mining framework for tuple-uncertain data is presented. The upper-bound-based PHUI-UP algorithm and the probability-utility-list-based PHUI-List algorithm are then developed; PHUI-List outperforms PHUI-UP. Together they provide a new direction for HUIM and extend its scope.

Secondly, for HUIM under various constraints, a high-utility itemset mining framework under multiple minimum utility thresholds (HUIM-MMU) is proposed. Whereas all previous works mine HUIs with a uniform minimum utility threshold, the HUI-MMU algorithm and two improved algorithms, HUI-MMUTID and HUI-MMUTE, are presented to mine HUIs with multiple minimum utility thresholds. In addition, the developed sorted downward closure property and the least minimum utility value guarantee the correctness and completeness of the discovered results.

Thirdly, three frameworks are presented to handle practical applications with dynamic databases, covering record insertion, record deletion, and record modification respectively. The utility-list-based HUI-list-INS and HUI-list-DEL algorithms, and the PRE-HUI-MOD algorithm with the pre-large maintenance strategy, are developed to efficiently maintain and update the discovered HUIs without rescanning the entire database each time. For all three situations in dynamic databases, experiments showed that the designed models significantly outperform the state-of-the-art batch-mode algorithms and the previous maintenance algorithms in terms of runtime, memory usage, and derived patterns.

Overall, for practical applications, this dissertation combines basic theoretical exploration with in-depth experimental verification. It puts forward a series of new theories and techniques, and expands the research scope and theoretical depth of HUIM at three levels: the data level, the model constraint level, and the application level.
Keywords/Search Tags: data mining, high-utility itemset, uncertain data, constraint condition, dynamic database
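
To make the notion of utility in the abstract concrete, the following is a minimal Python sketch of the standard textbook definition: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and its utility in the database is the sum over all transactions that contain it. The toy transactions, the profit table, the threshold, and both function names are assumptions made for illustration; this brute-force enumeration is not any of the algorithms proposed in the dissertation.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def mine_high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute force: enumerate every itemset and keep those meeting the threshold."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    print(mine_high_utility_itemsets(transactions, unit_profit, 15))
```

In this toy data the itemset {b} has utility 6 while its superset {a, b} has utility 16, so with a threshold of 15 the superset qualifies although the subset does not. Utility is therefore not anti-monotone, which is why practical algorithms rely on upper bounds or adapted closure properties rather than pruning on utility directly.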