Research On Hadoop Performance Optimization Based On Docker Technology

Posted on: 2019-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Feng
Full Text: PDF
GTID: 2428330566999341
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer information technology in the 21st century, the amount of data produced has been increasing exponentially. As an open-source, efficient platform for cloud computing, Hadoop has attracted attention for its ability to process data quickly, but it still has considerable room for improvement in data processing performance. Analyzing Hadoop's performance bottlenecks and security from different angles and proposing corresponding optimization schemes is therefore of great significance in the era of big data. This thesis analyzes the limitations of deploying Hadoop clusters on traditional servers and in Docker containers, and compares the advantages and disadvantages of traditional virtualization technology and Docker technology in data processing performance. Based on this comparison, it proposes a Hadoop performance optimization method based on Docker technology. The method builds a Hadoop cluster with Docker containers, making full use of the existing hardware resources. It tunes Hadoop by analyzing memory configuration parameters and sets the optimal level of concurrency through YARN cluster management. The thesis also studies the security of Hadoop data in heterogeneous environments and proposes a new data distribution scheme that uses secret sharing to improve the security of data storage in heterogeneous Hadoop systems. To evaluate these proposals, a Hadoop test platform based on Docker containers was set up. The task execution times of the improved and unimproved solutions are compared, and CPU usage is used as an indicator of the optimization scheme's effect on the system. The test results show that, compared with the default Hadoop configuration parameters, the proposed scheme reduces CPU usage from 96.7% to 53% and improves the data security of Hadoop in heterogeneous environments.
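The abstract does not give the thesis's actual parameter values or formulas. As a rough illustration of how memory configuration bounds concurrency under YARN, the following Python sketch estimates how many task containers one NodeManager can run at once. It is not the author's method. The class and function names are hypothetical, the numbers are illustrative only, and the calculation assumes that requests are rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb` and that both memory and vcores are enforced by the scheduler.

```python
from dataclasses import dataclass


@dataclass
class NodeResources:
    """Resources one NodeManager offers to YARN (hypothetical helper type)."""
    memory_mb: int  # corresponds to yarn.nodemanager.resource.memory-mb
    vcores: int     # corresponds to yarn.nodemanager.resource.cpu-vcores


@dataclass
class TaskRequest:
    """Resources requested per task container (hypothetical helper type)."""
    memory_mb: int  # e.g. mapreduce.map.memory.mb
    vcores: int     # e.g. mapreduce.map.cpu.vcores


def round_up(value: int, step: int) -> int:
    """Round value up to the next multiple of step."""
    return ((value + step - 1) // step) * step


def max_concurrent_containers(node: NodeResources,
                              task: TaskRequest,
                              min_alloc_mb: int = 1024) -> int:
    """Estimate how many containers fit on one node at the same time.

    Assumption: the scheduler rounds each memory request up to a multiple
    of the minimum allocation and enforces both memory and vcores.
    """
    granted_mb = round_up(task.memory_mb, min_alloc_mb)
    by_memory = node.memory_mb // granted_mb
    by_vcores = node.vcores // task.vcores
    return min(by_memory, by_vcores)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the thesis.
    node = NodeResources(memory_mb=8192, vcores=4)
    task = TaskRequest(memory_mb=1536, vcores=1)
    print(max_concurrent_containers(node, task))  # 4
```

In this example a 1536 MB request is granted 2048 MB, so memory and vcores both cap the node at four concurrent containers. Lowering the per-task memory raises the memory bound but not the vcore bound, which is why the two settings are usually tuned together.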
Keywords/Search Tags: Hadoop, MapReduce, Docker, memory configuration, performance optimization