Font Size: a A A

Research And Application Of High Utility Pattern Mining Algorithm

Posted on:2018-05-16Degree:MasterType:Thesis
Country:ChinaCandidate:X Z LuoFull Text:PDF
GTID:2348330518982362Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of technology, more and more data are produced. It has been the most concerned issue in the academic and industry that how to dig out valuable or interesting information from the complex data. Frequent pattern mining, with the ability to dig out the internal relationship of items, has been widely used in gene analysis, text classification, tumor diagnosis, image processing and other fields. However, the items in the frequent pattern algorithm have only two states in the transaction: presence or absence. High utility pattern algorithms were proposed to solve the problem that frequent pattern algorithm only concerns whether the item exists in the transaction and ignores the utility of the item itself. However, there are still a lot of shortcomings in the current high utility pattern algorithms, the main problems are: (1) those algorithms are not scalable; (2)those algorithms lack the effective compression of the transactions information; (3) those algorithms have low timeliness. To solve these problems, this paper proposes a high utility pattern mining algorithm based on static database and an incremental high utility pattern mining algorithm based on dynamic growth database. Experiments show that the two algorithms proposed in this paper are good. At the same time, this paper presents the practical application of the high utility pattern mining algorithm in micro-blog friend recommendation. The main work of this paper are:1. Proposed a high utility algorithm based on projection in the static database, called as HUPMP. HUPMP is one-phase algorithm, and without candidate generation in the process of high utility pattern mining.2. Proposed an incremental high utility pattern mining algorithm to solve the dynamic growth of the database problem, called as IHUP. IHUP is proposed on the basis of HUPMP, and making full use of the scalability of the HUPMP.3. Proposed two valid structures HUP-Array and HUP-Result to store transactions information and high utility patterns respectively. The HUP-Array structure is used to compress the transactions, and the scalability of the algorithm is greatly improved.HUP-Result structure can quickly search and update the result set, so that the algorithm has a higher response speed in the business scenarios of dynamic incremental database.4. Four strategies were developed to improve the time performance and reduce the memory consumption. Items are sorted by the decending order of support to greatly merge and effectively compress the transactions to save memory. The strategy of the sum value of the prefixe items can reduce the time complexity of HUP-Array in comparison from O(M×N) to O(1). The tight-TWU strategy can quickly reduce the low utility pattern and improve the performance of the algorithm. The strategy of dealing only with the items in the new transactions can be used to deal with the requirement of the high response to the incremental high utility pattern mining algorithm.5. The validity of HUPMP and IHUP is verified by lot experiments. Simulation experiments with several excellent algorithms in different databases show that HUPMP and IHUP proposed in this paper have good performance in static database and dynamic growth database respectively.6. In this paper, the practical application of high utility pattern mining algorithm in micro-blog friend recommendation is given. Through the application of high utility pattern mining in two different scenarios of static database and dynamic growth database,the flexibility and scalability of the high utility pattern mining algorithm in the actual recommendation system are illustrated.
Keywords/Search Tags:High Utility, Frequent Pattern, Incremental High Utility, High Utility Pattern, Data Mining
PDF Full Text Request
Related items