
Design And Implementation Of Spark Platform For Big Data Streaming Computing Based On Kubernetes

Posted on: 2018-09-27    Degree: Master    Type: Thesis
Country: China    Candidate: W K Du    Full Text: PDF
GTID: 2348330536979933    Subject: Computer technology

Abstract/Summary:
Today's cloud platforms rely on traditional virtual machine (VM) technology to manage hardware resources and scale elastically, but VMs carry considerable overhead in start/stop speed, resource utilization, operational monitoring, and performance. Big data computing frameworks deployed on a cloud platform are a typical application scenario, and with the rapid growth of data volumes, the traditional cloud platform architecture and processing methods cannot effectively adapt to big data processing environments. With the advent of lightweight container technology, Docker gives developers the ability to rapidly build, deploy, and migrate distributed applications, greatly simplifying the deployment process and reducing server costs. Kubernetes is an open-source system for automating the deployment and management of large-scale Docker container applications. It provides resource scheduling, automatic deployment, service discovery, and elastic scaling for containerized applications, and it supports big data distributed computing frameworks such as MapReduce. Docker still has deficiencies in security, storage, and other aspects, and container-based cloud platforms are still in a stage of rapid development.

This paper focuses on deploying the Spark distributed computing framework in Docker containers, with Docker as the lower-level bearing platform and Kubernetes as the container management and scheduling system. The containerized big data platform can greatly improve resource utilization and computational parallelism, reduce operation and maintenance costs, and automatically scale the Spark computing nodes according to real-time load. For the deployment of Spark clusters on Kubernetes, the main research of this paper is as follows (illustrative sketches of these steps follow the list):

(1) Enabling communication between Docker containers on different hosts. Docker by itself does not provide cross-host container networking; flannel is used to build an overlay network so that containers on different hosts can communicate.

(2) Designing and implementing a Spark cluster on the Kubernetes system. This paper analyzes the communication mechanism of the Spark cluster, builds a Spark image with a Dockerfile, and designs and implements a Spark cluster on Kubernetes that can be rapidly deployed and extended.

(3) Automatically scaling the number of Spark worker nodes according to load. Docker container resource monitoring is used to collect the observed resource utilization of the containers on each node.

(4) Deploying and testing the platform. Experiments show that building the Spark framework on Docker containers improves resource utilization and simplifies the operation and maintenance process, verifying the feasibility and effectiveness of the system.
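To make step (1) concrete, the following is a minimal sketch of how a flannel overlay network could be configured. Flannel reads its network configuration from etcd; the etcd v2 endpoint, the subnet 10.1.0.0/16, and the VXLAN backend shown here are illustrative assumptions, not values taken from the thesis.

import json
import requests

# Assumed etcd v2 endpoint on the local host; adjust for your cluster.
ETCD = "http://127.0.0.1:2379"
# Conventional key that flannel watches for its network configuration.
KEY = "/v2/keys/coreos.com/network/config"

# Illustrative values: the overall container address space, the per-host
# subnet size, and the backend used to encapsulate cross-host traffic.
flannel_config = {
    "Network": "10.1.0.0/16",      # address space carved into per-host subnets
    "SubnetLen": 24,               # each host receives a /24 for its containers
    "Backend": {"Type": "vxlan"},  # tunnel cross-host container traffic
}

# etcd v2 stores a value via an HTTP PUT with a form-encoded "value" field.
resp = requests.put(ETCD + KEY, data={"value": json.dumps(flannel_config)})
resp.raise_for_status()
print("flannel network config written:", resp.json()["node"]["value"])

Once this key is set, the flanneld daemon on each host allocates a subnet for that host and routes container traffic over the overlay, which is what allows the Spark master and worker containers on different machines to reach each other.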
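For step (2), one way to stand up the Spark master on Kubernetes with the official kubernetes Python client is sketched below. The image tag spark:2.1, the label values, and the namespace are hypothetical placeholders; the thesis builds its own Spark image from a Dockerfile.

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run inside a pod

# Hypothetical image and labels; the thesis builds its own Spark image.
master = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "spark-master"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "spark-master"}},
        "template": {
            "metadata": {"labels": {"app": "spark-master"}},
            "spec": {"containers": [{
                "name": "spark-master",
                "image": "spark:2.1",                # placeholder image tag
                "ports": [{"containerPort": 7077}],  # Spark master RPC port
            }]},
        },
    },
}

# A Service gives workers a stable DNS name ("spark-master") to register with,
# which is how the cluster's communication mechanism survives pod restarts.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "spark-master"},
    "spec": {
        "selector": {"app": "spark-master"},
        "ports": [{"port": 7077, "targetPort": 7077}],
    },
}

client.AppsV1Api().create_namespaced_deployment("default", master)
client.CoreV1Api().create_namespaced_service("default", service)

Workers would be a second Deployment whose containers point at spark://spark-master:7077; scaling the cluster then reduces to changing that Deployment's replica count.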
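For step (3), a minimal load-driven scaling loop could look like the sketch below. The helper worker_cpu_utilization() is hypothetical and stands in for the per-node container resource collector described in the abstract; the thresholds, bounds, and polling interval are illustrative assumptions.

import time
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Illustrative thresholds and bounds; not values from the thesis.
SCALE_UP_AT, SCALE_DOWN_AT = 0.80, 0.30
MIN_WORKERS, MAX_WORKERS = 2, 10

def worker_cpu_utilization():
    """Hypothetical helper: returns mean CPU utilization (0.0-1.0) of the
    Spark worker containers, e.g. gathered from a per-node metrics agent."""
    raise NotImplementedError

def scale_workers(replicas):
    # Patch only the replica count of the spark-worker Deployment.
    apps.patch_namespaced_deployment_scale(
        "spark-worker", "default", {"spec": {"replicas": replicas}})

while True:
    current = apps.read_namespaced_deployment_scale(
        "spark-worker", "default").spec.replicas
    load = worker_cpu_utilization()
    if load > SCALE_UP_AT and current < MAX_WORKERS:
        scale_workers(current + 1)
    elif load < SCALE_DOWN_AT and current > MIN_WORKERS:
        scale_workers(current - 1)
    time.sleep(30)  # polling interval, illustrative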
Keywords/Search Tags: cloud computing, Docker, Kubernetes, Spark, Autoscaling