Distributed High Reliability Control Node Design And Implementation Of Mass Data Processing System

Posted on: 2013-12-12  Degree: Master  Type: Thesis
Country: China  Candidate: J J Li  Full Text: PDF
GTID: 2248330374985893  Subject: Computer technology
Abstract/Summary:
As telecom services reach further into everyday life, both traditional and emerging services keep growing in scale, and telecom providers are under heavy pressure. The main challenges are: real-time acquisition of massive raw telecom data; data analysis and summarization in a multi-task context; real-time querying of massive structured telecom data in a multi-user environment; and keeping data secure. None of these problems can be solved by a centralized server. The Distributed Mass Data Processing System (DMDPS) was therefore designed to meet the needs of telecom providers by using distributed technologies.

The thesis describes the design and implementation of the central control node in DMDPS and presents new solutions to the problem of load-balancing strategy in the new environment. It completes the following work: (1) Within an independently developed MapReduce computing architecture, the thesis implements a framework for task and subtask scheduling. With a two-level scheduling strategy, the system manages these tasks and advances their state machines efficiently. (2) By scheduling the whole system effectively, operations such as data collection, data analysis and summarization, real-time data query, and data backup and recovery are carried out. Using a binary tree dynamic merge algorithm, the thesis implements real-time querying of massive structured data. (3) Using task synchronization between main and host servers, high reliability and high availability of the central control node are achieved.

In addition, with respect to load balancing and task scheduling, the thesis presents new algorithms for problems that arise from the deployment environment. First, because DMDPS introduces virtualization technology, heterogeneous environments can cause thrashing; to address this, the thesis presents a new dynamic predetermination algorithm based on a relative capability measure. Second, for the mapping between computing units and the database cluster, a waiting strategy reduces the relative distance between tasks and data, improving the throughput of the whole system.

The author carried out rigorous functional and performance tests on the design and implementation of the central control node. The functional tests show that system functions such as data collection, data analysis and summarization, real-time data query, and data backup and recovery are implemented correctly. In the performance tests, experimental results show that DPST is clearly effective in preventing thrashing: throughput improved while task completion time was reduced.
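
The abstract names a binary tree dynamic merge algorithm for real-time querying but gives no details of it. The following is only a minimal sketch of one plausible reading, assuming that each database node returns an already-sorted partial result and that the control node combines them pairwise, level by level, in the shape of a binary tree. The function names `merge_two` and `tree_merge` and the use of plain integer lists are illustrative assumptions, not taken from the thesis.

```python
import heapq
from typing import List


def merge_two(left: List[int], right: List[int]) -> List[int]:
    """Merge two already-sorted lists into one sorted list."""
    return list(heapq.merge(left, right))


def tree_merge(partials: List[List[int]]) -> List[int]:
    """Merge sorted partial results pairwise, level by level.

    Hypothetical helper: each round halves the number of lists,
    so k partial results are combined in about log2(k) rounds.
    """
    if not partials:
        return []
    level = partials
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(merge_two(level[i], level[i + 1]))
            else:
                # An unpaired list is carried up unchanged to the next level.
                next_level.append(level[i])
        level = next_level
    return list(level[0])


if __name__ == "__main__":
    # Partial results as they might come back from four database nodes.
    print(tree_merge([[1, 4, 9], [2, 3], [5], [0, 8]]))
```

In a scheme like this, the merges within one level are independent of each other, so they could in principle run in parallel or start as soon as two partial results are available; whether the thesis does either is not stated in the abstract.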
Keywords/Search Tags: Distributed, MapReduce, Virtualization, Load Balancing, Task Scheduling