
The Research Of In-memory Data Caching Technology In Map/Reduce-styled Massive Data Processing Platform

Posted on: 2014-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: G R Li
Full Text: PDF
GTID: 2268330392473504
Subject: Computer Science and Technology

Abstract/Summary:
Map/Reduce-styled data processing platforms are the cutting-edge technology in the massive data processing field. Unlike traditional data processing platforms, a Map/Reduce-styled platform distributes data across the computing nodes and schedules tasks with data locality, which gives it predominant scalability.

Good data access performance contributes much to the data processing efficiency of a Map/Reduce platform. Existing Map/Reduce platforms store data on a disk-based distributed file system, which leads to poor data access efficiency. In-memory data caching is a typical technique for improving data access efficiency and has proven effective in data center storage systems, but it remains a blank space in Map/Reduce platforms.

This paper focuses on in-memory data caching technology for the open Map/Reduce platform, where data reuse often occurs both within and across applications. Considering the new feature of scheduling tasks with data locality, we propose that the performance goal of Map/Reduce-oriented data caching should shift from cache hit ratio to the execution efficiency of parallel Map/Reduce jobs. The main contributions of this paper are as follows:

1) Considering the distribution of data across computing nodes and the need to guarantee cache integrity for a computing task, a shared-memory-based distributed cooperative data cache organization model is presented, with the file split defined as the data caching granularity.

2) A data cache replacement strategy is proposed to pursue a high ratio of data processing localization. Focusing on the new feature that computing nodes and storage nodes overlap, two decision factors are introduced into cache replacement: the utilization of computing slot resources and the local-access ratio of each file split. Experimental results show that the proposed replacement strategy reduces the average turnaround time of Map/Reduce jobs by up to 19.4%.

3) Addressing the growing amount of one-off-access data in Map/Reduce platforms, a data cache prefetching strategy is designed. It chooses computing nodes whose slot resources will soon be released as the prefetch destinations and notifies the task scheduler to deploy the tasks that need the prefetched data on those nodes, thereby achieving data processing localization. Experimental results show that the prefetching strategy decreases the average execution time of tasks processing one-off-access data by up to 53.3%.

4) Targeting task scheduling efficiency, a data-cache-aware task scheduling strategy based on FCFS (First Come First Served) scheduling is presented. It integrates data cache scheduling seamlessly with other resource scheduling and schedules tasks with both cached-data locality and disk-data locality.

5) An in-memory data caching prototype system for the Map/Reduce platform, called Dacoop, is proposed. Dacoop is built on Hadoop, an open-source Map/Reduce platform. Experimental results show that Dacoop outperforms Hadoop on average job turnaround time by up to 54.4%, and by 47.9% on average.
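The two-factor replacement decision in contribution 2 can be illustrated with a small sketch. The abstract does not give the exact formula, so the weighted-sum scoring, the field names, and the weights below are illustrative assumptions: a split is worth keeping when its host node's compute slots are busy and the split is mostly accessed node-locally, and the split with the lowest score is evicted.

```python
# Hypothetical sketch of the two-factor cache replacement decision.
# The weighted-sum combination is an assumption, not the thesis's formula.
from dataclasses import dataclass

@dataclass
class CachedSplit:
    split_id: str
    slot_utilization: float    # busy fraction of compute slots on the host node (0..1)
    local_access_ratio: float  # fraction of accesses to this split served locally (0..1)

def eviction_victim(splits, w_slots=0.5, w_local=0.5):
    """Return the cached split with the lowest retention score.

    High slot utilization and a high local-access ratio both argue for
    keeping a split, so the victim minimizes their weighted sum.
    """
    return min(splits, key=lambda s: w_slots * s.slot_utilization
                                     + w_local * s.local_access_ratio)
```

Under this scoring, a split that is rarely accessed locally and sits on an idle node is evicted first, which matches the localization goal stated above.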
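The placement rule of the prefetching strategy in contribution 3 can likewise be sketched. Assuming each running task reports an expected finish time (an assumption for illustration; the thesis's actual estimator is not given in the abstract), the prefetch destination is simply the node whose slot frees up soonest, so the data arrives just as a slot becomes available and the follow-up task can run node-locally.

```python
# Hypothetical sketch of prefetch-destination selection: pick the node
# whose busy slot is expected to be released soonest.
def prefetch_target(running_tasks):
    """running_tasks: iterable of (expected_finish_time, node_id) pairs,
    one per occupied slot. Returns the node_id with the earliest
    expected slot release."""
    finish_time, node = min(running_tasks)
    return node
```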
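The cache-aware FCFS strategy in contribution 4 can be read as a locality-tiered scan of the arrival queue: when a slot frees on a node, prefer the earliest-arrived task whose input split is cached in that node's memory, then one whose split is on the node's disk, and only then fall back to the queue head with a remote read. This tiering is a minimal sketch of that idea; the data structures and the exact tie-breaking are assumptions.

```python
# Hypothetical sketch of cache-aware FCFS task selection.
def pick_task(fcfs_queue, free_node, cached_splits, disk_splits):
    """Scan the FCFS queue in arrival order at two locality levels:
    first memory-cached splits on free_node, then disk-local splits;
    fall back to the queue head (remote read) if neither matches.

    fcfs_queue: list of (task_id, split_id) in arrival order.
    cached_splits / disk_splits: dict mapping node_id -> set of split_ids.
    """
    for level in (cached_splits.get(free_node, set()),
                  disk_splits.get(free_node, set())):
        for task in fcfs_queue:
            if task[1] in level:
                return task
    return fcfs_queue[0] if fcfs_queue else None
```

Because the scan is restarted per locality level rather than per task, cache locality strictly dominates disk locality, while FCFS order breaks ties within each level.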
Keywords/Search Tags: Massive Data Processing, Map/Reduce-Styled Platform, In-memory Data Caching