
Design And Implementation Of Master HA In The Project Of Spark On Ego

Posted on: 2018-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: X C Dong
Full Text: PDF
GTID: 2348330521951525
Subject: Engineering
Abstract/Summary:
With the advent of the big data era, the internet and physical devices generate large amounts of data every day. At the same time, users have raised a new demand: data analysis tools must be able to extract valuable information from massive data within an acceptable time. Hadoop, Spark, and other big data computing frameworks emerged in response, and Spark largely meets users' performance requirements. Many companies have joined the open source Spark community in the hope of gaining influence in the big data field. IBM is one of them: it has promoted Spark as its most important project for the next decade and announced that 3,500 developers would be assigned to Spark-related projects. The Spark on EGO team is one of these teams. We have deeply customized Spark by integrating the resource scheduling framework EGO, so that Spark has better performance and functionality in resource scheduling. We have also developed additional features, such as hierarchy, to provide users with better service.

The EGO-based Spark master HA feature is a submodule of the Spark on EGO project. Its overall goal is to make the master node highly reliable: when the master process stops, it can recover from the failure state to the normal state and continue to provide services for the cluster. The Spark on EGO project is itself a customized version of open source Spark, and many of its features are integrated directly from open source Spark. High availability of the master node is already implemented in open source Spark. We therefore mainly follow the open source implementation and adapt it to the differences between Spark on EGO and open source Spark, such as the different class structure of the master and the different task scheduling and resource allocation processes, to achieve our own EGO-based Spark master high availability.

The workflow of the EGO-based Spark master high-availability feature is as follows. When the master node goes down, the startup steps depend on the fault recovery policy configured by the user. In zookeeper mode, a new leader master is elected from the pre-configured backup master nodes and then enters the fault recovery process. In fileSystem mode, the user must manually start the master process, which then enters the fault recovery process. After the master process starts, it obtains task scheduling and resource allocation information from three sources: the metadata stored on the external device, the resource manager (EGO) side, and the driver side. Once all the required data has been acquired, the master process preferentially reuses the operations of the normal task scheduling and resource allocation process to synchronize the data, so that the data in the new master process is consistent with the data held by the master process that went down. This ensures that the master process can schedule newly submitted applications and continue to provide services for the cluster.
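The recovery workflow described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. The mode names follow the values that open source Spark's standalone master accepts for `spark.deploy.recoveryMode` (`ZOOKEEPER`, `FILESYSTEM`); whether Spark on EGO uses the same keys is an assumption. `MasterState`, `needs_manual_restart`, `recover`, and the reconciliation rule (keep only the slots that both EGO and the driver report) are all hypothetical names and policies introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class RecoveryMode(Enum):
    # Mode names mirror open source Spark's recovery modes (assumed for Spark on EGO).
    ZOOKEEPER = "ZOOKEEPER"
    FILESYSTEM = "FILESYSTEM"


@dataclass
class MasterState:
    # app id -> number of resource slots the master believes the app holds
    allocations: dict = field(default_factory=dict)


def needs_manual_restart(mode):
    """FILESYSTEM mode relies on the user restarting the master;
    ZOOKEEPER mode elects a new leader from the standby masters."""
    return mode is RecoveryMode.FILESYSTEM


def recover(persisted_apps, ego_allocations, driver_reports):
    """Rebuild master state from the three sources named in the abstract.

    persisted_apps:  app ids read back from the external metadata store
    ego_allocations: app id -> slots, as reported by the resource manager
    driver_reports:  app id -> slots, as reported by re-registered drivers
    """
    state = MasterState()
    for app_id in persisted_apps:
        if app_id not in driver_reports:
            # Hypothetical policy: a driver that never re-registers is dropped.
            continue
        # Hypothetical reconciliation rule: trust only slots both sides agree on.
        slots = min(ego_allocations.get(app_id, 0), driver_reports[app_id])
        state.allocations[app_id] = slots
    return state


if __name__ == "__main__":
    state = recover(
        persisted_apps=["app-1", "app-2"],
        ego_allocations={"app-1": 4, "app-2": 2},
        driver_reports={"app-1": 3},
    )
    print(state.allocations)  # {'app-1': 3}
```

In this sketch `app-2` is dropped because its driver did not report back, and `app-1` keeps the smaller of the two reported slot counts. A real master would more likely replay its normal scheduling and allocation operations, as the abstract states, rather than apply a single fixed rule.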
Keywords/Search Tags: IBM, EGO Spark, Master, ZooKeeper, HA