
Research Of Closed Frequent Item Sets Mining On Distributed Environment

Posted on: 2015-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2298330431981027
Subject: Computer application technology
Abstract/Summary:
With the arrival of the big data era, people hold more and more data, and how to manage and use it has become an urgent problem. One defining attribute of big data is its sheer volume: sometimes the volume is so large that even a single data center cannot hold all of it. Distributed technology is therefore one of the best ways to handle big data, and many distributed solutions have been proposed, such as grid computing, cluster computing, and cloud computing. Distributed storage solves the problem of how to manage big data, but useful information must still be extracted from it. Data mining is the science of extracting useful knowledge from large data sets, so it is also called knowledge discovery. Early data mining research focused on processing data on a single platform. Today, however, data volumes are growing exponentially and much data is stored in different places, so many researchers are developing new data mining algorithms that run efficiently in a distributed environment. Closed frequent item set mining is an important branch of data mining and an essential step in many data mining algorithms. Because the mining process needs a large amount of computing resources, many researchers study how to apply the computing power of distributed systems to it. This thesis studies distributed closed frequent item set mining and proposes several new algorithms that run efficiently in a distributed environment. The frequent pattern tree was first used in frequent item set mining algorithms, where it stores the relations between item sets. This thesis uses a new data structure called the vertical frequent pattern tree, which is derived from the frequent pattern tree by partitioning it vertically.
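For readers unfamiliar with the term, the following is a minimal single-machine sketch of what "closed frequent item set" means: an item set is frequent if it appears in at least a minimum number of transactions, and closed if no proper superset has the same support. It is a brute-force illustration only, not the thesis's algorithms; the function names, the sample transactions, and the `min_support` value are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset found in at least
    min_support transactions (brute-force, level by level)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            # No frequent itemset of this size means none of a larger size.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }


# Hypothetical toy data set of four transactions.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"},
)]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

In this toy data, `{b}` is frequent but not closed, because `{a, b}` occurs in exactly the same transactions. The closed sets therefore summarize all frequent sets without losing support information, which is why they are a common mining target.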
Chapter 3 studies DVFP, a distributed closed frequent item set mining algorithm based on the vertical frequent pattern tree. DVFP mines closed frequent item sets using both data parallelism and task parallelism, and it uses a new data sequencing method to reduce communication cost.
Mining data incrementally handles dynamic data sets more flexibly. Chapter 4 proposes an incremental closed frequent item set mining algorithm that traverses a data structure derived from the prefix tree, called the shadow prefix tree. The algorithm can locate the corresponding node quickly by following links, without storing much duplicate information. It also avoids subset testing, so it is faster than some other algorithms.
Heterogeneous computing makes full use of both the CPU and the GPU to achieve high-speed parallel computing. Chapter 5 proposes a parallel closed frequent item set mining algorithm based on an improved vertical data structure. The vertical data structure is an important technique in closed frequent item set mining: it finds closed frequent item sets through "and" or "or" operations between different item sets. However, the conventional vertical structure wastes memory and cannot handle large data sets. Chapter 5 proposes a new vertical structure to solve this problem; it compresses storage and uses memory much more efficiently. Based on this structure, a new distributed closed frequent item set mining algorithm that runs in a heterogeneous environment is developed. With GPU acceleration, the algorithm can process large data sets more quickly.
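The "and" operation on a vertical data layout can be illustrated with a small sketch. Here each item is mapped to a bitmask over transaction ids, so the support of an item set is the number of set bits in the AND of its items' masks. This is a plain, uncompressed CPU version written for illustration; it is not the improved compressed structure or the GPU implementation from the thesis. The use of Python integers as bitmasks and the names `build_vertical`, `support`, and `closure` are assumptions of this example.

```python
def build_vertical(transactions):
    """Map each item to an int bitmask; bit i is set when
    transaction i contains the item."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def tidset(bitmaps, itemset, n_transactions):
    """AND the bitmasks of all items: the transactions containing the itemset."""
    mask = (1 << n_transactions) - 1  # start with every transaction
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return mask


def support(bitmaps, itemset, n_transactions):
    """Support is the number of set bits in the itemset's bitmask."""
    return bin(tidset(bitmaps, itemset, n_transactions)).count("1")


def closure(bitmaps, itemset, n_transactions):
    """All items present in every transaction that contains the itemset.
    An itemset is closed exactly when it equals its own closure."""
    mask = tidset(bitmaps, itemset, n_transactions)
    return {item for item, bits in bitmaps.items() if bits & mask == mask}


# Hypothetical toy data set of four transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
bitmaps = build_vertical(transactions)
n = len(transactions)
```

With this layout, checking whether `{b}` is closed takes one AND per item rather than a scan of the transactions: `closure(bitmaps, {"b"}, n)` yields `{a, b}`, so `{b}` is not closed. The cost is one bit per item per transaction, which is the memory overhead that a compressed vertical structure aims to reduce.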
Keywords/Search Tags:Distributed computing, closed frequent item sets mining, mass data processing, GPU computing