
Constrained Frequent Pattern Mining and Task Scheduling under MapReduce

Posted on: 2016-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: X W Yan
Full Text: PDF
GTID: 2278330470964103
Subject: Computer Science and Technology
Abstract/Summary:
Generating frequent patterns is a key step in mining association rules. A constrained frequent pattern is a frequent pattern generated under constraints defined by the user; such patterns are highly targeted and practical, and can be mined efficiently. As data volume grows, constructing the constrained frequent pattern tree consumes too much memory and incurs high I/O cost, which makes the approach difficult to apply to massive, high-dimensional data sets. This thesis studies a constrained frequent pattern mining algorithm and a task scheduling algorithm under the MapReduce programming model. The main contributions are as follows:

(1) A parallel constrained frequent pattern mining algorithm, PACFP, is presented under the MapReduce programming model. First, the key steps of the algorithm (mapping the transactions in the data to frequent item support counts, constructing the constrained frequent pattern tree (CFP-Tree), generating the constrained frequent patterns, and aggregating the frequent patterns) are implemented by three pairs of Map and Reduce functions. Second, data records are distributed using a grouping strategy based on frequent item support, so that the load-balancing problem that arises while generating constrained frequent patterns is effectively solved. Finally, experiments on celestial spectrum data validate the availability, scalability and expandability of the algorithm.

(2) A redirect task scheduling algorithm is presented, based on a thorough study of task scheduling strategies. After estimating task execution cost and transfer cost, it repeatedly redirects some of the tasks waiting on the most heavily loaded node to the most lightly loaded node. As a result, job completion time is shortened, system resource consumption is reduced, and the degree of parallelism is improved.
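The redirect idea in (2) can be illustrated with a small Python sketch. It is not the thesis's implementation: the function names, the single fixed `transfer_cost`, the choice to move the last waiting task, and the stopping rule (move only while the receiving node would still finish before the overloaded one) are all assumptions made for illustration.

```python
def node_load(queue):
    """Total estimated execution cost of the tasks waiting on one node."""
    return sum(cost for _, cost in queue)


def redirect_tasks(queues, transfer_cost, max_rounds=100):
    """Repeatedly move one waiting task from the heaviest node to the lightest.

    queues: dict mapping node name -> list of (task_name, estimated_cost).
    transfer_cost: assumed fixed cost added to a task when it is moved.
    Returns the list of moves as (task_name, from_node, to_node).
    The queues are modified in place.
    """
    moves = []
    for _ in range(max_rounds):  # hard bound so the loop always terminates
        heavy = max(queues, key=lambda n: node_load(queues[n]))
        light = min(queues, key=lambda n: node_load(queues[n]))
        if heavy == light or not queues[heavy]:
            break
        name, cost = queues[heavy][-1]  # assumption: take the last waiting task
        # Estimated load of the light node if it accepted the task.
        new_light = node_load(queues[light]) + cost + transfer_cost
        if new_light >= node_load(queues[heavy]):
            break  # moving would not improve on the heaviest node's finish time
        queues[heavy].pop()
        queues[light].append((name, cost + transfer_cost))
        moves.append((name, heavy, light))
    return moves


if __name__ == "__main__":
    queues = {"a": [("t1", 4), ("t2", 4), ("t3", 4)], "b": []}
    print(redirect_tasks(queues, transfer_cost=1))
    print({n: node_load(q) for n, q in queues.items()})
```

In this toy run, one task is moved from node `a` to node `b`; a second move is rejected because the estimated execution plus transfer cost would leave `b` more loaded than `a`.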
Finally, the effectiveness, scalability and stability of the redirect task scheduling algorithm are validated experimentally, using the parallel constrained frequent pattern mining algorithm PACFP as the test program.
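For PACFP itself, the abstract mentions a first step that maps transactions to frequent item support counts and a grouping strategy based on that support. The following single-process Python sketch shows one plausible reading of those two steps. It is an assumption, not the thesis's method: the function names are hypothetical, the Map and Reduce phases are simulated in memory, and the grouping uses a greedy "heaviest item to the currently lightest group" rule.

```python
from collections import Counter
import heapq


def count_support(transactions, min_support):
    """Simulated first Map/Reduce pass: support count per frequent item."""
    counts = Counter()
    for transaction in transactions:
        # Map: emit (item, 1) once per distinct item in the transaction.
        # Reduce: sum the ones per item (done here by the Counter).
        counts.update(set(transaction))
    return {item: c for item, c in counts.items() if c >= min_support}


def group_items_by_support(support, num_groups):
    """Assign frequent items to groups so that total support is balanced.

    Greedy rule (an assumption): visit items by descending support and put
    each one into the group with the smallest accumulated support so far.
    """
    heap = [(0, g) for g in range(num_groups)]  # (accumulated support, group id)
    heapq.heapify(heap)
    assignment = {}
    for item, s in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        total, group = heapq.heappop(heap)
        assignment[item] = group
        heapq.heappush(heap, (total + s, group))
    return assignment


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
    support = count_support(transactions, min_support=2)
    print(support)
    print(group_items_by_support(support, num_groups=2))
```

Each group would then be handled by one reducer that builds its own tree from the transactions relevant to its items; balancing accumulated support across groups is a rough proxy for balancing that work.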
Keywords/Search Tags:Association Rule, Constrained Frequent Pattern, MapReduce, Support of Frequent Item, Load Balance, Cost Estimation, Task Scheduling