
Research And Implementation Of Switch-level SSD Caching Supported Data Parallel Optimization Strategy

Posted on: 2016-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: C L Zhou
Full Text: PDF
GTID: 2308330479479804
Subject: Computer application technology
Abstract/Summary:
With the rapid development of society, increasingly large and varied data sets must be analyzed to meet different customer demands, which requires a flexible and efficient platform for big data processing. As an open-source parallel processing platform for big data, Hadoop is widely used in many fields. However, Hadoop's processing mode and metadata storage limit the efficiency of its data-parallel processing: parallel computation causes periodic bursts of network traffic that lead to congestion, and because jobs depend on the master node, frequent requests place a heavy burden on it. Further factors, such as the handling of small files and the high memory usage of the master node, also reduce Hadoop's data-parallel processing efficiency.

In this study, we aim to optimize Hadoop's data-parallel processing efficiency by caching data at the switching nodes. To realize switch-level caching, we propose and implement the concept of the Switch-level SSD: a switching device with intelligent data caching, built by extending the cache space of an OpenFlow switch with an SSD. An OpenFlow controller instructs the Switch-level SSD when to forward messages and when to cache data. Hadoop owes its flexibility and efficiency in processing large amounts of data to the MapReduce parallel computing framework and the HDFS distributed file system, so we propose a Switch-level SSD caching supported data parallel optimization strategy that optimizes both of them. For MapReduce, the strategy caches MapReduce computation results, which reduces the number of tasks run on the cluster, shortens request response time, and eases the burden on the master node. For HDFS, the strategy addresses the small-file problem by caching all small files on the SSD, which speeds up small-file retrieval and reduces the memory footprint of the master node.

Experiments show that, working with the Switch-level SSD architecture, the strategy does reduce the memory usage of the Hadoop master node and optimizes both the MapReduce framework and the HDFS file system, thereby improving the efficiency of Hadoop's data-parallel processing. Moreover, when acting as ordinary switching equipment, the Switch-level SSD behaves like a traditional switch and does not interfere with the Hadoop cluster's parallel data processing.
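The abstract does not give implementation details, so the following is only a minimal sketch of the caching idea it describes: on a read request, a cache hit is answered from the switch-attached SSD, while a miss is forwarded to HDFS and the fetched object (a small file or a MapReduce result) is cached for later reads. The names SwitchLevelSSDCache and handle_read, and the LRU eviction policy, are illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch (not the thesis implementation): models how a
# controller-side lookup could decide whether a Hadoop read is served
# from the Switch-level SSD cache or forwarded to the HDFS cluster.

from collections import OrderedDict


class SwitchLevelSSDCache:
    """LRU cache standing in for the SSD attached to the OpenFlow switch."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> cached payload (bytes)

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh LRU position on a hit
            return self.entries[key]
        return None

    def put(self, key, payload):
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        while self.used + len(payload) > self.capacity and self.entries:
            _, evicted = self.entries.popitem(last=False)  # evict LRU entry
            self.used -= len(evicted)
        self.entries[key] = payload
        self.used += len(payload)


def handle_read(cache, key, fetch_from_hdfs):
    """A hit is answered by the switch; a miss is forwarded to HDFS and
    the result (small file or MapReduce output) is cached for reuse."""
    data = cache.get(key)
    if data is not None:
        return data, "served-from-switch-ssd"
    data = fetch_from_hdfs(key)  # normal HDFS read path
    cache.put(key, data)
    return data, "served-from-hdfs"


if __name__ == "__main__":
    cache = SwitchLevelSSDCache(capacity_bytes=64 * 1024)
    hdfs = {"/user/logs/part-0001": b"small file contents"}
    _, path = handle_read(cache, "/user/logs/part-0001", hdfs.__getitem__)
    print(path)  # first read: served-from-hdfs
    _, path = handle_read(cache, "/user/logs/part-0001", hdfs.__getitem__)
    print(path)  # repeat read: served-from-switch-ssd, master node not touched
```

Repeat reads of the same small file or job result are answered at the switch, which is the mechanism the abstract credits with shortening response time and relieving the master node.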
Keywords/Search Tags: Switch-level SSD, cache, Hadoop, data parallel