A Hybrid System Based On Cost Model For Combining Hadoop And Storm

Posted on: 2015-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: P Liu
Full Text: PDF
GTID: 2308330473953342
Subject: Computer application technology
Abstract/Summary:
We are entering an era in which data volumes grow rapidly. Mainstream big data technologies can be divided into four categories: batch computing, real-time computing, stream computing, and unified resource management platforms. Each computing framework has its own area of applicability, and once a user's demand falls outside that area, the framework performs poorly. For example, MapReduce is well suited to batch computing and has high throughput, but its long response time limits the scenarios in which it can be used. Similarly, Storm can respond quickly to an event, but it suffers from low throughput. To address these problems, this thesis combines several frameworks into one system.
The aim of this research is to design a hybrid system that integrates Hadoop and Storm to improve overall execution efficiency. Specifically, Storm processes real-time data and small-scale data, while large-scale historical data is fed to Hadoop in order to increase system throughput. Most similar systems aim at hiding the underlying implementation details and providing users with a unified query interface. Compared with those systems, the solution in this thesis has two advantages: 1. it designs an automatic framework-picking algorithm that assigns each task to the appropriate framework; 2. it sets up a cache table for Storm in HBase to improve Storm's performance.
The implementation of the prototype system includes three layers: the query language, the task scheduling layer, and the task execution layer. The framework-picking algorithm is part of the query language. According to performance tests, the hybrid system meets expectations. Its single-statement execution time is close to that of SummingbirdOnHadoop and is 20% to 40% faster than that of SummingbirdOnStorm. For large-scale data, the hybrid system chooses Hadoop as the execution framework; as a result, its throughput in this scenario is 40% higher than that of SummingbirdOnStorm.
In summary, the new architecture has three advantages: 1) it supports a wide range of application scenarios by combining Hadoop and Storm; 2) it provides users with a unified query language and hides the implementation details; 3) it automatically chooses the appropriate framework.
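The abstract does not give the cost model itself, so the following is only a minimal sketch of how a cost-based framework picker of this kind might look. It assumes a simple linear cost (fixed startup overhead plus input size divided by sustained throughput). The names `FrameworkCost` and `pick_framework` and every numeric parameter are hypothetical illustrations, not values or identifiers taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkCost:
    """Hypothetical linear cost profile for one execution framework."""
    name: str
    startup_seconds: float  # assumed fixed overhead per job
    mb_per_second: float    # assumed sustained throughput

    def estimate_seconds(self, input_mb: float) -> float:
        # Assumed cost model: startup overhead + processing time.
        return self.startup_seconds + input_mb / self.mb_per_second


# Illustrative numbers only: Hadoop starts slowly but has high throughput,
# Storm responds quickly but has low throughput.
HADOOP = FrameworkCost("hadoop", startup_seconds=30.0, mb_per_second=200.0)
STORM = FrameworkCost("storm", startup_seconds=1.0, mb_per_second=20.0)


def pick_framework(input_mb: float, is_streaming: bool) -> str:
    """Return the name of the framework with the lower estimated cost."""
    if is_streaming:
        # Real-time data always goes to Storm, as the abstract describes.
        return STORM.name
    candidates = (STORM, HADOOP)
    return min(candidates, key=lambda f: f.estimate_seconds(input_mb)).name
```

Under these assumed parameters, small historical inputs are routed to Storm and large ones to Hadoop, with the break-even point determined entirely by the chosen overhead and throughput values.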
Keywords/Search Tags: Hadoop, Storm, Hybrid System