
Design And Implementation Of Spark Platform For Big Data Streaming Computing Based On Kubernetes

Posted on: 2018-09-27    Degree: Master    Type: Thesis
Country: China    Candidate: W K Du    Full Text: PDF
GTID: 2348330536979933    Subject: Computer technology

Abstract/Summary:
Today's cloud platforms rely on traditional virtual machine (VM) technology to manage hardware resources and scale elastically, but VMs carry considerable overhead in start/stop speed, resource utilization, operational monitoring, and performance. Big data computing frameworks deployed on a cloud platform are a typical application scenario, and with the rapid growth of data volumes, the traditional cloud platform architecture and processing methods cannot effectively adapt to big data processing environments. With the advent of lightweight container technology, Docker gives developers the ability to rapidly build, deploy, and migrate distributed applications, greatly simplifying the deployment process and reducing server costs. Kubernetes is an open-source system for automating the deployment and management of large-scale Docker container applications. It provides resource scheduling, automatic deployment, service discovery, and elastic scaling for containerized applications, and it supports big data distributed computing frameworks such as MapReduce. Docker still has deficiencies in security, storage, and other aspects, and container-based cloud platforms are still in a stage of rapid development.

This paper focuses on deploying the Spark distributed computing framework in Docker containers, with Docker as the lower-level bearing platform and Kubernetes as the container management and scheduling system. The containerized big data platform can greatly improve resource utilization and computational parallelism, reduce operation and maintenance costs, and automatically scale the Spark computing nodes according to real-time load. For the deployment of Spark clusters on Kubernetes, the main research of this paper is as follows (illustrative sketches of these steps follow the list):

(1) Enabling communication between Docker containers on different hosts. Docker by itself does not provide cross-host container networking; flannel is used to build an overlay network so that containers on different hosts can communicate.

(2) Designing and implementing a Spark cluster on the Kubernetes system. This paper analyzes the communication mechanism of the Spark cluster, builds a Spark image with a Dockerfile, and designs and implements a Spark cluster on Kubernetes that can be rapidly deployed and extended.

(3) Automatically scaling the number of Spark worker nodes according to load. Docker container resource monitoring is used to collect the observed resource utilization of the containers on each node.

(4) Deploying and testing the platform. Experiments show that building the Spark framework on Docker containers improves resource utilization and simplifies the operation and maintenance process, verifying the feasibility and effectiveness of the system.
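To make step (1) concrete, the following is a minimal sketch of how a flannel overlay network could be configured. Flannel reads its network configuration from etcd; the etcd v2 endpoint, the subnet 10.1.0.0/16, and the VXLAN backend shown here are illustrative assumptions, not values taken from the thesis.

import json
import requests

# Assumed etcd v2 endpoint on the local host; adjust for your cluster.
ETCD = "http://127.0.0.1:2379"
# Conventional key that flannel watches for its network configuration.
KEY = "/v2/keys/coreos.com/network/config"

# Illustrative values: the overall container address space, the per-host
# subnet size, and the backend used to encapsulate cross-host traffic.
flannel_config = {
    "Network": "10.1.0.0/16",      # address space carved into per-host subnets
    "SubnetLen": 24,               # each host receives a /24 for its containers
    "Backend": {"Type": "vxlan"},  # tunnel cross-host container traffic
}

# etcd v2 stores a value via an HTTP PUT with a form-encoded "value" field.
resp = requests.put(ETCD + KEY, data={"value": json.dumps(flannel_config)})
resp.raise_for_status()
print("flannel network config written:", resp.json()["node"]["value"])

Once this key is set, the flanneld daemon on each host allocates a subnet for that host and routes container traffic over the overlay, which is what allows the Spark master and worker containers on different machines to reach each other.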
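For step (2), one way to stand up the Spark master on Kubernetes with the official kubernetes Python client is sketched below. The image tag spark:2.1, the label values, and the namespace are hypothetical placeholders; the thesis builds its own Spark image from a Dockerfile.

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run inside a pod

# Hypothetical image and labels; the thesis builds its own Spark image.
master = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "spark-master"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "spark-master"}},
        "template": {
            "metadata": {"labels": {"app": "spark-master"}},
            "spec": {"containers": [{
                "name": "spark-master",
                "image": "spark:2.1",                # placeholder image tag
                "ports": [{"containerPort": 7077}],  # Spark master RPC port
            }]},
        },
    },
}

# A Service gives workers a stable DNS name ("spark-master") to register with,
# which is how the cluster's communication mechanism survives pod restarts.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "spark-master"},
    "spec": {
        "selector": {"app": "spark-master"},
        "ports": [{"port": 7077, "targetPort": 7077}],
    },
}

client.AppsV1Api().create_namespaced_deployment("default", master)
client.CoreV1Api().create_namespaced_service("default", service)

Workers would be a second Deployment whose containers point at spark://spark-master:7077; scaling the cluster then reduces to changing that Deployment's replica count.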
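For step (3), a minimal load-driven scaling loop could look like the sketch below. The helper worker_cpu_utilization() is hypothetical and stands in for the per-node container resource collector described in the abstract; the thresholds, bounds, and polling interval are illustrative assumptions.

import time
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Illustrative thresholds and bounds; not values from the thesis.
SCALE_UP_AT, SCALE_DOWN_AT = 0.80, 0.30
MIN_WORKERS, MAX_WORKERS = 2, 10

def worker_cpu_utilization():
    """Hypothetical helper: returns mean CPU utilization (0.0-1.0) of the
    Spark worker containers, e.g. gathered from a per-node metrics agent."""
    raise NotImplementedError

def scale_workers(replicas):
    # Patch only the replica count of the spark-worker Deployment.
    apps.patch_namespaced_deployment_scale(
        "spark-worker", "default", {"spec": {"replicas": replicas}})

while True:
    current = apps.read_namespaced_deployment_scale(
        "spark-worker", "default").spec.replicas
    load = worker_cpu_utilization()
    if load > SCALE_UP_AT and current < MAX_WORKERS:
        scale_workers(current + 1)
    elif load < SCALE_DOWN_AT and current > MIN_WORKERS:
        scale_workers(current - 1)
    time.sleep(30)  # polling interval, illustrative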
Keywords/Search Tags: cloud computing, Docker, Kubernetes, Spark, Autoscaling