
The Design And Implementation Of A Task Operator Framework Based On MapReduce In A Big Data Analysis Platform

Posted on: 2018-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: H B Li
Full Text: PDF
GTID: 2348330533466787
Subject: Computer Science and Technology

Abstract/Summary:
With the advent of the cloud computing era, big data technology has developed rapidly. Because of the characteristics of big data, namely huge volume, wide variety, fast processing speed and low value density, traditional techniques for data storage, extraction, transformation and analysis are no longer applicable, and new solutions for big data applications are required. In recent years, research on big data technology has flourished across many fields, attracting the attention of both industry and academia and leading the future of China's informatization. By applying research results in large-scale data storage, analysis and processing, high-quality knowledge and information can now be mined from massive data.

At present, the dominant big data tool is Apache Hadoop, a distributed system infrastructure surrounded by a variety of open source components, including HDFS, YARN and ZooKeeper. A big data analysis platform built from these components consists of data access, data storage, parallel computing and platform management, and provides basic functions ranging from data collection and data storage to data analysis and data visualization. The role of the data analysis function is to study the characteristics of large data sets in different scenarios and to develop specific analysis applications on mature parallel computing frameworks, so as to process massive data in parallel.

This paper presents the design and implementation of a Task Operator Model based on MapReduce within an existing big data analysis platform. The model exploits the structured nature of the Avro serialization framework to form a framework for handling big data analysis requirements. It addresses the problem that traditional parallel computing frameworks lack a flexible way to combine processing steps, which leads to repeated programming and extra application maintenance costs and prevents effective optimization for structured data sources. Using the Task Operator Framework for big data analysis lowers programming complexity and data coupling while raising module reusability. According to actual needs, users generate the execution flow of an analysis application by combining multiple types of Task Operators, achieving little or no programming, efficient analysis and fast construction of applications; a minimal sketch of this operator-combination idea is given below.
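To make the operator-combination idea concrete, the following is a minimal sketch in Java under stated assumptions: the `TaskOperator` interface, the example filter and rewrite operators, and the `chain()` helper are hypothetical illustrations of composing reusable steps over Avro `GenericRecord` data, not the thesis's actual framework API, which the abstract does not specify.

```java
// Hypothetical sketch of a task-operator pipeline over Avro records.
// TaskOperator, chain(), and the example operators are illustrative only.

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class TaskOperatorSketch {

    /** One reusable analysis step over a structured (Avro) record. */
    interface TaskOperator {
        /** Returns the transformed record, or empty to drop it. */
        Optional<GenericRecord> apply(GenericRecord record);
    }

    /** Composes several operators into a single execution pipeline. */
    static TaskOperator chain(List<TaskOperator> ops) {
        return record -> {
            Optional<GenericRecord> current = Optional.of(record);
            for (TaskOperator op : ops) {
                if (!current.isPresent()) break;
                current = op.apply(current.get());
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // Structured data source described by an Avro schema.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"LogEvent\",\"fields\":["
          + "{\"name\":\"user\",\"type\":\"string\"},"
          + "{\"name\":\"latencyMs\",\"type\":\"long\"}]}");

        // Two example operators: a filter and a field rewrite.
        TaskOperator slowOnly = r ->
            ((Long) r.get("latencyMs")) > 100 ? Optional.of(r) : Optional.empty();
        TaskOperator tagUser = r -> {
            r.put("user", "analyzed-" + r.get("user"));
            return Optional.of(r);
        };

        // Combine operators without writing a new analysis program.
        TaskOperator pipeline = chain(Arrays.asList(slowOnly, tagUser));

        GenericRecord event = new GenericData.Record(schema);
        event.put("user", "alice");
        event.put("latencyMs", 250L);

        pipeline.apply(event).ifPresent(System.out::println);
    }
}
```

In an actual MapReduce setting, such a composed pipeline would presumably run inside map or reduce tasks over Avro-serialized input, so that users assemble analyses by choosing and ordering operators rather than writing new jobs.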
Keywords/Search Tags:big data analysis, task operator, MapReduce, Avro