
Distributed Data Processing System Design And Realization Of The Compute Nodes

Posted on: 2013-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhao
Full Text: PDF
GTID: 2248330374485270
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, large enterprises and Internet companies process data in increasingly diverse ways. Under normal circumstances, operators need to process and analyze huge amounts of data efficiently. Because of the data volume, the computation cannot be completed by a single computer within an acceptable time. We therefore design and implement a massive data processing system based on distributed computing for massive data processing applications. This thesis focuses on the design and implementation of the distributed compute node, which is the sub-node of the system.

The aim of the distributed massive data processing system is to process huge amounts of data, that is, to provide an efficient and highly reliable mechanism for data analysis and processing. Its approach draws on the basic idea of Map/Reduce: huge amounts of data are distributed in structured form to multiple data nodes, and the data extracted from those nodes is then analyzed and queried through a series of calculations. This thesis focuses on the ideas and technology behind the compute node, proposes a multi-level communication processing framework, and designs a dynamic classification merge query strategy within this framework to achieve a highly efficient and highly reliable distributed computing system.

The main work of this thesis is as follows:

Firstly, the infrastructure of the massive distributed data processing system. After studying and comparing existing distributed computing technologies, the thesis proposes a basic framework for a massive data processing system oriented to telecommunications data, composed of a master node and the nodes of a computing resource pool. The architecture meets the requirements of massive distributed data computing in telecommunications and avoids the unreliability and uncontrollability problems of centralized or purely distributed systems.

Secondly, the design and implementation of the compute node subsystem. A multi-level communication framework is designed and implemented for the compute node of the massive distributed data processing system, giving it high performance and high scalability and keeping business logic independent of the platform. It provides practical distributed computing capabilities for large volumes of telecommunications data (data collection, aggregation, and query).

Thirdly, the design and implementation of the query task mechanism of the compute node. Based on an analysis of the Map/Reduce framework, a dynamic classification merge query strategy is implemented for real-time structured-data queries on the compute node, making the system's real-time query mechanism more efficient.

Finally, functional and performance tests of the compute node of the massive distributed data processing system. The tests show the effectiveness and efficiency of the compute node.
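The abstract does not give the details of the dynamic classification merge query strategy, so the following is only a minimal sketch of the general Map/Reduce-style idea it builds on: records are spread over several data nodes, each node answers the query for its own share, and the partial results are merged. All names here (`RECORDS`, `partition`, `local_query`, `merge`, `run_query`), the round-robin partitioning, and the sample telecom-style records are assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (region, call_duration_seconds) records.
RECORDS = [("north", 30), ("south", 45), ("north", 15), ("east", 60), ("south", 5)]


def partition(records, node_count):
    """Spread records round-robin across node_count in-memory 'data nodes'."""
    nodes = [[] for _ in range(node_count)]
    for i, rec in enumerate(records):
        nodes[i % node_count].append(rec)
    return nodes


def local_query(node_records):
    """Map step: one node totals durations per region for its own records."""
    partial = Counter()
    for region, duration in node_records:
        partial[region] += duration
    return partial


def merge(partials):
    """Reduce step: add the per-node partial totals into one result."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)


def run_query(records, node_count=3):
    """Scatter the query to every node in parallel, then merge the answers."""
    nodes = partition(records, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_query, nodes))
    return merge(partials)


if __name__ == "__main__":
    print(run_query(RECORDS))
```

Because the per-region sums are associative, the merged result does not depend on how many nodes the records are spread over; a real compute node would replace the in-memory lists and thread pool with network calls to the data nodes.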
Keywords/Search Tags: Distributed computing, Reliability, Platform, Dynamic classification merge query