Research On Real-Time Stream Processing Platform In The Cloud

Posted on: 2015-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
Full Text: PDF
GTID: 2298330467463290
Subject: Computer technology
Abstract/Summary:
With the rapid development of cloud computing, more and more enterprises and individuals are choosing to deploy their applications in the cloud, and an increasingly high proportion of these are real-time stream processing applications. Traditional batch processing platforms in the cloud (such as Hadoop) differ substantially from real-time stream processing platforms. The input of a batch processing job is static data stored in advance, so the amount of input is predictable and the job ends when all the data has been processed. The input of a real-time stream processing job is a continuous data stream, so the scale of the job is unknown and the stream traffic fluctuates. How to design a universal and reliable real-time stream processing platform has therefore become a vital issue. Current stream processing platforms have shortcomings in their programming model, reliability, dynamic changes of resources and applications, dynamic load balancing, and cluster monitoring, so they struggle to satisfy the changing needs of stream processing applications.

This paper focuses on the core technology of real-time stream processing platforms in the cloud, presents a series of studies, and implements a distributed platform. First, we designed a reliable architecture based on ZooKeeper. Using heartbeat monitoring and task migration policies, the architecture ensures that failures of both programs and nodes are detected and recovered from in time. Second, we designed a loosely coupled user interface that can integrate both dynamic link library files and executable files, making it easy and flexible to run applications on our platform. Third, we proposed a method that combines a state-level scheduler with a distributed session table to solve the problem of dynamic load balancing between tasks. We overcame the difficulty of keeping sessions consistent when the global session table is unavailable, and we proved the convergence of the method. Fourth, we established a task allocation matrix to optimize the task scheduling model; with this matrix, users can formulate their own scheduling policies. Finally, to keep the distributed cluster consistent and to solve the problem of mutual backup between master nodes, we used a distributed lock service based on ZooKeeper.

Based on this research, we implemented a real-time stream processing platform that offers high availability, low latency, high scalability, and easy integration. The platform provides a loosely coupled user interface, automatic job deployment and updating, dynamic load balancing, fault tolerance, flexible task scheduling policies, dynamic changes of resources and applications, and visual cluster monitoring. With this platform, users are freed from cluster setup, communication implementation, and platform maintenance, and can focus on their stream processing applications. Users can therefore shorten the application development cycle and reduce maintenance costs. Experimental results show that the throughput and processing latency of our platform are competitive with those of leading comparable platforms. This paper provides reliable, universal, and easy-to-use solutions for real-time stream processing scenarios such as e-commerce, the Internet of Things, and Internet traffic monitoring.
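
The heartbeat monitoring and task migration idea mentioned in the abstract can be illustrated with a minimal in-memory sketch. This is not the thesis implementation: it does not use ZooKeeper, and the class names (`Worker`, `HeartbeatMonitor`), the timeout value, and the "move each task to the least-loaded live worker" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    last_heartbeat: float            # time of the most recent heartbeat, in seconds
    tasks: list = field(default_factory=list)


class HeartbeatMonitor:
    """Toy failure detector (hypothetical): a worker counts as failed
    once it has been silent for longer than `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.workers = {}            # worker name -> Worker

    def register(self, name, now):
        self.workers[name] = Worker(name, now)

    def assign(self, name, task):
        self.workers[name].tasks.append(task)

    def heartbeat(self, name, now):
        self.workers[name].last_heartbeat = now

    def check(self, now):
        """Drop timed-out workers, migrate their tasks, return failed names."""
        failed = [w for w in self.workers.values()
                  if now - w.last_heartbeat > self.timeout]
        for w in failed:
            del self.workers[w.name]
        for w in failed:
            for task in w.tasks:
                if not self.workers:
                    raise RuntimeError("no live worker left to take over tasks")
                # Assumed policy: hand the task to the least-loaded live worker.
                target = min(self.workers.values(), key=lambda x: len(x.tasks))
                target.tasks.append(task)
        return [w.name for w in failed]


monitor = HeartbeatMonitor(timeout=3.0)
for worker_name in ("a", "b", "c"):
    monitor.register(worker_name, now=0.0)
monitor.assign("a", "t1")
monitor.assign("a", "t2")
monitor.assign("b", "t3")
monitor.heartbeat("b", now=4.0)
monitor.heartbeat("c", now=4.0)
failed_workers = monitor.check(now=5.0)   # "a" has been silent for 5 s
```

Passing the clock value in explicitly (`now`) keeps the sketch deterministic; a real deployment would rely on a coordination service for liveness tracking rather than a single in-process table.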
Keywords/Search Tags: stream processing platform, distributed computing, high-availability, load balancing, global monitoring