
Research On High Utility Pattern Mining Method For Big Data

Posted on: 2017-03-17   Degree: Master   Type: Thesis
Country: China   Candidate: Z H Zhang   Full Text: PDF
GTID: 2308330482990773   Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology and the growing emphasis that every sector places on data, data are becoming more comprehensive while their volume grows rapidly. Industry also demands that newly generated data be mined and analyzed in a timely manner, which makes high utility pattern mining techniques increasingly important. Because big data is massive, real-time, and dynamic, mining algorithms must achieve higher time and space efficiency. Although pattern mining technology has made some progress, the efficiency of mining algorithms remains one of the main research topics in the field of data mining.

This thesis studies high utility pattern mining methods. Based on the characteristics of big data and the problems that mining algorithms typically face on large data sets, it presents a big-data-oriented high utility pattern mining algorithm. The algorithm uses a sliding window over the data stream so that mining concentrates on the current data, and it provides a data structure and a table structure to maintain the data in the current window. These structures can be used to mine the high utility itemsets of the current window without compromising the integrity of the data needed for the next window.

High utility itemset mining addresses the limitations of frequent itemset mining by introducing measures of interestingness that reflect the significance of an itemset beyond its frequency of occurrence. Among such algorithms, level-wise candidate generation-and-test approaches suffer from an immense candidate pool and require several database scans. Meanwhile, methods based on pattern growth tend to consume large amounts of memory to store conditional trees. We propose an efficient algorithm, called Index High Utility Itemsets Mine (IHUI-Mine), for mining high utility itemsets.
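To make the contrast between frequency and utility concrete, the following minimal Python sketch computes both measures on a toy database. The transactions, the unit profits, and the function names (`support`, `utility`, `high_utility_itemsets`) are illustrative assumptions and do not come from the thesis. The enumeration is deliberately brute force; it is not IHUI-Mine.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit per item (the "external utility").
PROFIT = {"a": 5, "b": 1, "c": 10}


def support(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset, transactions=TRANSACTIONS, profit=PROFIT):
    """Sum of quantity * profit over all transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(min_util, transactions=TRANSACTIONS, profit=PROFIT):
    """Brute-force enumeration of every itemset with utility >= min_util."""
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo, transactions, profit)
            if u >= min_util:
                result[combo] = u
    return result
```

In this toy data the items `a` and `b` both appear in three transactions, yet their utilities are 20 and 4. With a threshold of 35, the itemset `("a", "c")` qualifies (utility 35) although its subset `("a",)` does not, which shows that utility is not anti-monotone and why frequent-itemset pruning cannot be reused directly.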
The subsume index, which has been employed to mine frequent itemsets, is extended in IHUI-Mine to the discovery of high utility itemsets. In addition to the enumeration and search strategies inherited from the subsume index, we introduce a new property to specifically accelerate the computation of transaction-weighted utilization for high utility itemsets. Furthermore, given that bitmaps are used for database representation, the real utility of candidates can be verified from the recorded transactions rather than by resorting to the entire database. The computational complexity of IHUI-Mine is analyzed, and tests conducted on publicly available synthetic and real datasets further demonstrate that the proposed algorithm outperforms existing state-of-the-art algorithms.
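The two ideas named above, pruning by transaction-weighted utilization (TWU) and verifying real utility from bitmap-recorded transactions, can be sketched as follows. This is a simplified illustration under stated assumptions, not the IHUI-Mine algorithm: the subsume index and the new acceleration property are not implemented, and the toy data and all names (`TU`, `BITMAPS`, `covering_tids`, `twu`, `exact_utility`, `mine`) are hypothetical.

```python
# Hypothetical toy data: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit per item.
PROFIT = {"a": 5, "b": 1, "c": 10}

# Transaction utility: total profit of each whole transaction.
TU = [sum(q * PROFIT[i] for i, q in t.items()) for t in TRANSACTIONS]


def item_bitmaps(transactions):
    """One integer bitmap per item; bit `tid` is set if the item occurs there."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


BITMAPS = item_bitmaps(TRANSACTIONS)


def covering_tids(itemset):
    """Transaction ids containing the whole itemset, found by AND-ing bitmaps."""
    mask = (1 << len(TRANSACTIONS)) - 1
    for item in itemset:
        mask &= BITMAPS.get(item, 0)
    return [tid for tid in range(len(TRANSACTIONS)) if (mask >> tid) & 1]


def twu(itemset):
    """Upper bound on utility: sum of TU over the covering transactions."""
    return sum(TU[tid] for tid in covering_tids(itemset))


def exact_utility(itemset):
    """Real utility, computed only from the transactions the bitmap records."""
    return sum(
        TRANSACTIONS[tid][i] * PROFIT[i]
        for tid in covering_tids(itemset)
        for i in itemset
    )


def mine(min_util):
    """Depth-first search that prunes with TWU, then verifies real utility."""
    items = sorted(BITMAPS)
    found = {}

    def extend(prefix, start):
        for idx in range(start, len(items)):
            candidate = prefix + (items[idx],)
            if twu(candidate) < min_util:
                continue  # TWU is anti-monotone: no superset can qualify
            u = exact_utility(candidate)
            if u >= min_util:
                found[candidate] = u
            extend(candidate, idx + 1)

    extend((), 0)
    return found
```

Here `mine(35)` never evaluates `("a", "b", "c")`, because `("a", "b")` has a TWU of 23 and is cut off first. The real utility of each surviving candidate is read from the few transactions selected by the bitmap intersection instead of from a scan of the whole database.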
Keywords/Search Tags: Big Data, Hadoop, MapReduce Framework, Frequent Pattern Mining, High Utility Itemset