A Multi-dimensional Temporal Data Mining Model Based On The Hadoop Platform

Posted on:2017-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Zhang

Full Text:PDF

GTID:2308330482480663

Subject:Computer technology

Abstract/Summary:

With the continued development of information and Internet technology, the global volume of data is growing explosively. Large data sets are not only temporal; as data from social life and production become increasingly varied, they are also clearly multi-dimensional. How to capture the relationships between real-world entities, between the different attributes of one entity, and between the attributes of different entities, and thereby mine valid, novel, and potentially useful patterns and rules, is a question of considerable research significance.

Data mining is a set of methods and techniques for extracting underlying regularities from large, noisy data. Traditional storage systems and data mining models cannot cope with large-scale multi-dimensional temporal data. Cloud computing, and the Hadoop platform in particular, offers a solution for mining large data sets thanks to its strong scalability, cost-effectiveness, and good fault tolerance. This thesis studies large-scale multi-dimensional temporal data mining on the Hadoop cloud computing platform.

First, the SDTE model is constructed from the perspective of temporal data. Time-related concepts are summarized, and the multi-dimensional nature of temporal data is discussed with a focus on its time features. Drawing on both the real-world and database-system perspectives, and combining research on mining numerical sequences, transactional sequences, and event sequences of temporal data, a unified, standardized SDTE model is proposed and established.

Second, large-scale data mining techniques are combined with the Hadoop platform to design the architecture of a large-scale multi-dimensional temporal data mining model. The model uses the distributed file system (HDFS) for data storage and fault tolerance, and the MapReduce programming model for parallel computation. From top to bottom, it is divided into a dynamic interaction layer, an application layer, a data mining layer, and a distributed platform layer. A Hive-HBase integration model is proposed for operating on HDFS, and a general-purpose parallel programming model is constructed.

Third, the FP-Growth algorithm is improved on the basis of this model. Building on a study of multi-dimensional association rules in the context of FP-Growth, the FPCpb-Growth algorithm is proposed and then parallelized.

Finally, an experimental environment is built to analyze data, verifying the feasibility of the Hadoop-based data mining model and the efficiency of the FPCpb-Growth algorithm.
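
To make the idea of parallel frequent-pattern mining over multi-dimensional temporal data more concrete, the following is a minimal sketch of the support-counting pass that FP-Growth-style algorithms typically begin with, written in a map/reduce style. It is not the FPCpb-Growth algorithm, whose details the abstract does not give. The sample records, the `(dimension, value)` item encoding, and the function names `map_phase`, `reduce_phase`, and `frequent_items` are all assumptions made for illustration, and the code runs in a single process rather than on Hadoop.

```python
from collections import Counter
from functools import reduce

# Hypothetical records: each transaction is a set of (dimension, value) items,
# with the time dimension encoded like any other attribute.
transactions = [
    {("time", "2016-Q1"), ("region", "north"), ("item", "milk")},
    {("time", "2016-Q1"), ("region", "north"), ("item", "bread")},
    {("time", "2016-Q2"), ("region", "south"), ("item", "milk")},
    {("time", "2016-Q1"), ("region", "south"), ("item", "milk")},
]


def map_phase(split):
    """Mapper: count item occurrences within one data split."""
    counts = Counter()
    for transaction in split:
        counts.update(transaction)
    return counts


def reduce_phase(left, right):
    """Reducer: merge the partial counts produced by two mappers."""
    return left + right


def frequent_items(data, num_splits, min_support):
    """Split the data, count per split, merge, and keep items meeting min_support."""
    splits = [data[i::num_splits] for i in range(num_splits)]
    partials = [map_phase(split) for split in splits]
    totals = reduce(reduce_phase, partials, Counter())
    return {item: n for item, n in totals.items() if n >= min_support}


result = frequent_items(transactions, num_splits=2, min_support=3)
```

With the sample data above, only `("time", "2016-Q1")` and `("item", "milk")` reach the support threshold of 3. On a real cluster, each split would be an HDFS block processed by its own mapper, and the surviving items would then be used to build the pattern trees.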

Keywords/Search Tags:

Multi-dimension of Temporal Data Mining, HDFS, FPCpb-Growth Algorithm, Parallel Programming

Related items

1	Algorithm Design And Implementation Of Multi-core Parallel Association Rule Mining Environment
2	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
3	Research On Parallel Data Mining Algorithm Based On Hadoop
4	Research Of Textual Periodicity Data Mining In Temporal Data
5	Research And Design Of Data Mining System For Tcm Disease Based On Cloud Computing Environment
6	Multi-dimensional Multi-layer Data Mining Algorithm Mpfp Design And Its Application
7	Research Of Visualization Of Multi-dimension Data
8	Parallel Frequent Itemset Mining Optimization Algorithm Based On Spark
9	Study On Multi-dimension Of Temporal Association Rules Mining
10	Research Of Spatio-temporal Data Mining Algorithm Based On FP-tree