
Research On Big Data Processing System Based On MapReduce Parallel Processing Framework

Posted on: 2019-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Li
Full Text: PDF
GTID: 2428330548458949
Subject: Engineering
Abstract/Summary:
In recent years, the number of information collection terminals in human society has increased sharply with the progress of science and technology, computer technology in particular. People therefore inevitably face huge volumes of data in daily life. How people use the massive data collected by these terminals shapes their decisions in daily work and life: decisions will no longer rest on experience or intuition but on the data itself. A big data processing system that meets the storage and processing requirements of different businesses through different data structures and data processing algorithms is an important research topic in big data technology. Existing work proposes solutions for the storage and processing of big data from various angles, but deficiencies remain. For example, although the volume of data is huge, massive data from the same industry shares common properties, and existing data processing systems cannot be optimized because they ignore these shared features.

1. In this paper, we build a cloud computing system based on the Hadoop processing architecture. Using the MapReduce framework, the system distributes the massive data generated in people's daily life evenly and reasonably among heterogeneous computer platforms. The system uses the Map function, an interface of the MapReduce framework, to process the massive data generated by the information acquisition terminals. Based on the output of the Map function, the system then completes the conventional data processing request through the Reduce function, which is also an interface of the MapReduce framework. Once the Map and Reduce functions have run, the data processing task is finished. Through these interface functions, the MapReduce framework shields operators from the complex parallel processing that they do not need to care about. At the same time, building on the MapReduce framework, the system optimizes storage performance to make it more reasonable and reliable.

2. Because different kinds of data have unique features, the traditional memory-based PageRank algorithm is introduced into the big data processing system proposed in this paper. Processing graph data and high-dimensional data suffers from multiple iterations and massive communication between work platforms. To address this, we propose partitioning the massive data into sub-graphs so that iterations are computed within each sub-graph, and distinguishing internal nodes from external nodes so that communication does not have to span the whole data set. Introducing the PageRank algorithm into the iterations over graph data and high-dimensional data also improves the efficiency of the data processing system, and the MapReduce-based system can be extended to heterogeneous work platforms. With the improved algorithm proposed in this paper, the system's demand for precious bandwidth resources stays within a reasonable range.

3. In this paper, the LiveJournal and Facebook datasets are used as experimental data sources. The LiveJournal dataset contains 4,847,571 nodes and 68,993,773 edges and is derived from www.livejournal.com. The Facebook dataset contains 957,359 nodes and 161,933,115 edges. All experimental computers run the 32-bit Ubuntu 9.04 operating system with Java 1.6 and Hadoop 0.20.2. The experimental results show that the proposed big data processing system improves data processing speed and reduces the required communication bandwidth.
Keywords/Search Tags: Big Data Processing, Hadoop Data Processing Platform, MapReduce Data Processing Structure, PageRank Algorithm
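
The abstract describes PageRank expressed through Map and Reduce functions, with the graph split into sub-graphs so that only some nodes need to communicate across platforms. The following is a minimal in-memory sketch of that idea in Python, not the thesis implementation (which runs on Hadoop with Java). The function names, the damping factor of 0.85, the six-node example graph and its two-way partition are all assumptions made for illustration. The sketch runs one map/shuffle/reduce round, counts how many messages cross a partition boundary, and lists the nodes that have at least one edge leaving their sub-graph.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumed, not from the thesis)


def map_contributions(rank, out_links):
    """Map step: split one node's rank evenly across its outgoing edges."""
    if not out_links:
        return []
    share = rank / len(out_links)
    return [(target, share) for target in out_links]


def reduce_rank(contributions, num_nodes):
    """Reduce step: combine all shares sent to one node into its new rank."""
    return (1.0 - DAMPING) / num_nodes + DAMPING * sum(contributions)


def pagerank_iteration(graph, ranks, partition):
    """Run one map/shuffle/reduce round and count cross-partition messages."""
    inbox = defaultdict(list)
    cross_messages = 0
    for node, out_links in graph.items():
        for target, share in map_contributions(ranks[node], out_links):
            inbox[target].append(share)  # stands in for the shuffle phase
            if partition[node] != partition[target]:
                cross_messages += 1  # this message would use network bandwidth
    new_ranks = {n: reduce_rank(inbox[n], len(graph)) for n in graph}
    return new_ranks, cross_messages


def boundary_nodes(graph, partition):
    """Return nodes with at least one edge that leaves their own sub-graph."""
    return {
        node
        for node, out_links in graph.items()
        if any(partition[target] != partition[node] for target in out_links)
    }


# Hypothetical example: two triangles joined by one edge in each direction.
example_graph = {
    "a": ["b"], "b": ["c"], "c": ["a", "d"],
    "d": ["e"], "e": ["f"], "f": ["d", "a"],
}
example_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

initial_ranks = {node: 1.0 / len(example_graph) for node in example_graph}
new_ranks, cross_messages = pagerank_iteration(
    example_graph, initial_ranks, example_partition
)
external = boundary_nodes(example_graph, example_partition)
print(cross_messages, sorted(external))
```

In this toy graph only two of the eight edges cross the partition, so most rank contributions stay inside a sub-graph. A partitioned scheme of the kind the abstract outlines would iterate locally and exchange data only for the boundary nodes; how the thesis schedules those exchanges is not stated in the abstract and is not modeled here.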