
Multi-dimension Of Temporal Data Mining Model Based On Hadoop Platform

Posted on: 2017-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhang
Full Text: PDF
GTID: 2308330482480663
Subject: Computer technology
Abstract/Summary:
With the continued development of information and Internet technology, the volume of data worldwide is growing explosively. Large data sets are not only temporal in nature; as the data produced by social life and industrial production become increasingly varied, they also carry clearly multi-dimensional attributes. Mining valid, novel, and potentially useful patterns and rules that reflect the connections between things, between the different attributes of one thing, and between the attributes of different things in the real world is therefore of significant research value.

Data mining is a body of methods and techniques for extracting internal regularities from large, noisy data sets. Traditional storage systems and data mining models cannot cope with large-scale multi-dimensional temporal data. Cloud computing technology, and the Hadoop platform in particular, offers a solution for mining large data sets thanks to its strong scalability, cost-effectiveness, and good fault tolerance. This thesis studies large-scale multi-dimensional temporal data mining on the Hadoop cloud computing platform.

First, the SDTE model is constructed from the perspective of temporal data. The concept of time correlation is summarized and, centering on temporal features, the multi-dimensional nature of temporal data is discussed. Drawing on both the real-world and the database-system perspectives, and combining existing research on mining numerical sequences, transactional sequences, and event sequences of temporal data, a unified, standardized SDTE model is proposed and established.

Second, large-scale data mining technology is combined with the Hadoop platform to design the architecture of a large-scale multi-dimensional temporal data mining model. The model uses the distributed file system for data storage and fault tolerance, and the MapReduce programming model for parallel computing. From top to bottom, it is divided into a dynamic interaction layer, an application layer, a data mining layer, and a distributed platform layer. A Hive-HBase integration model is proposed for operating on HDFS, and a universal parallel programming model is constructed.

Third, the FP-Growth algorithm is improved on the basis of this model and evaluated experimentally. Building on a study of multi-dimensional association rules in the FP-Growth algorithm, the FPCpb-Growth algorithm is proposed and then parallelized.

Finally, an experimental environment is built to analyze the data, verifying the feasibility of the Hadoop data mining model and the efficiency of the FPCpb-Growth algorithm.
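The abstract does not describe the internals of FPCpb-Growth, so the following is only a minimal sketch of the generic idea behind parallelizing FP-Growth-style mining: each data split counts item supports independently (a map step), and the partial counts are then merged and filtered by a minimum support (a reduce step). The function names, the in-memory `partitions` list standing in for HDFS splits, and the sample transactions are all illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter
from functools import reduce


def map_partition(transactions):
    """Map step (assumed): count each item once per transaction in one split."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # set() so duplicates in a transaction count once
    return counts


def reduce_counts(left, right):
    """Reduce step (assumed): merge the partial counts of two splits."""
    return left + right


def frequent_items(partitions, min_support):
    """Return items whose total support across all splits meets min_support."""
    partials = [map_partition(p) for p in partitions]  # independent per split
    totals = reduce(reduce_counts, partials, Counter())
    return {item: n for item, n in totals.items() if n >= min_support}


# Hypothetical data: two splits, each holding a few transactions.
partitions = [
    [["a", "b"], ["b", "c"]],
    [["a", "b", "c"], ["b"]],
]
print(frequent_items(partitions, min_support=3))
```

In a real Hadoop job the per-split counting would run on separate nodes; this support-counting pass is typically only the first stage before conditional pattern trees are built.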
Keywords/Search Tags:Multi-dimension of Temporal Data Mining, HDFS, FPCpb-Growth Algorithm, Parallel Programming