The Research And Application Of Cloud Storage System Based On Hadoop

Posted on: 2015-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: G J Chen
GTID: 2308330473453354
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rise of cloud computing and software as a service (SaaS), cloud storage has become a research hotspot in the field of information storage. Many existing cloud storage systems are built on distributed file systems, and HDFS is the one most widely accepted by industry, because the MapReduce programming model can be used to mine data stored on HDFS and better uncover its potential value. However, the current HDFS has problems such as the NameNode single point of failure and the number of backup nodes, which affect the high availability of a cloud storage system. Moreover, the parallel implementation of association rule mining algorithms has room for further improvement and optimization. To address these problems, this thesis completes the following tasks:

(1) By analyzing the relevant NameNode source code in HDFS and studying its workflow and working mechanism, it proposes an HDFS high availability solution based on Heartbeat and AvatarNode, which features hot standby and automatic switching.

(2) By analyzing the relevant source code of the primary and standby AvatarNode and studying their workflow and working mechanism, it proposes a SecondaryAvatarNode solution that adds a lightweight backup node to further improve the high availability of the cloud storage system.

(3) Based on the CLOSET+ algorithm, it optimizes the PFP algorithm for mining closed frequent itemsets. The optimizations are: improving the grouping method so that the number of transactions allocated to each group is as close to the average as possible, to achieve load balancing among parallel tasks; using a different projected FP-Tree structure in each recursive mining step depending on whether the data set is sparse or dense, that is, top-down or bottom-up projection, to speed up closed frequent itemset mining; and proposing a sliding-window method that filters the local closed frequent itemsets to obtain the complete set of closed frequent itemsets.

(4) By building a Hadoop cluster as an experimental platform, it verifies the implementation and effectiveness of the high availability solution based on Heartbeat and AvatarNode, as well as of the SecondaryAvatarNode solution. It also analyzes the efficiency of the improved PFP algorithm for mining closed frequent itemsets after the CLOSET+ based optimization, to verify that it scales well on the Hadoop framework.
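
The load-balanced grouping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual grouping method, which the abstract does not specify; it is a generic greedy heuristic (heaviest item first, assigned to the currently lightest group). The function name `balanced_groups` and the use of a per-item load estimate (for example, an item's support count) are assumptions made only for illustration.

```python
import heapq


def balanced_groups(item_loads, num_groups):
    """Assign items to groups so that group loads stay close to the average.

    item_loads: dict mapping item -> estimated load (hypothetical estimate,
                e.g. the item's support count).
    Returns (groups, totals): the items in each group and each group's load.
    """
    # Min-heap of (current load, group index); the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Place the heaviest items first; ties are broken by item name.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    totals = [sum(item_loads[i] for i in grp) for grp in groups]
    return groups, totals


if __name__ == "__main__":
    # Illustrative loads only; these are not data from the thesis.
    loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    print(balanced_groups(loads, 2))
```

A greedy assignment like this does not guarantee an optimal split, but it keeps the heaviest group's load within one item's load of the lightest, which is the kind of balance the abstract aims for.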
Keywords/Search Tags: HDFS, high availability, MapReduce, closed frequent itemset