Distributed Data Processing System Configuration And Task Management Module Design And Implementation

Posted on:2013-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Peng

Full Text:PDF

GTID:2248330374486400

Subject:Software engineering

Abstract/Summary:

The Internet is developing at a remarkable pace and has become an integral part of daily life, taking market share from most traditional industries. As it grows and its user base increases, data volume is expanding exponentially. With data volumes in the Internet industry routinely reaching the TB or even PB level, traditional single-node data processing is no longer practical. Under these circumstances, distributed data processing was developed and quickly became the mainstream data processing solution.

Our design implements a module that stores and distributes global configuration data and manages unsplit tasks. All configuration data in the system is stored and maintained by the administrator through this module. All tasks are triggered here and reclaimed here as well.

While the entire system is initializing, our module distributes configuration data to the other modules so that they can initialize successfully. If configuration data is later modified, the new version is pushed to the modules that are interested in it, so every module always holds the latest configuration. Unified, centralized management of configuration data guarantees that data with the same content but stored in separate places comes from the same source, which avoids runtime or initialization exceptions caused by inconsistent configuration data.

All tasks are generated and triggered by our module. For tasks such as offline data analysis, structuring, reorganization, and backup, the administrator draws up an execution plan, and the module either executes the plan using timers or monitors the system's running status and triggers tasks when the specified conditions are met. For real-time query and reorganization tasks, the administrator can set parameters here and trigger the task directly. After a task finishes, a task log entry records how the execution went, and the resources the task requested are released. For a query task, the query result is cached to avoid putting unnecessary load on the system. To avoid data incompleteness caused by missed tasks in extreme cases, we scan the task log regularly to find which tasks were missed and re-trigger them, guaranteeing the data completeness of the system.

To protect the system against node failure, we back up task execution state to a remote database using dual redundancy and cold backup, so that tasks are not re-executed and resources are reliably released.
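
The two mechanisms described in the abstract, versioned configuration push and the periodic task-log scan, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names ConfigStore, subscribe, set, and find_missing_tasks, the in-memory storage, and the use of callbacks in place of network delivery are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical callback type: (key, value, version). Not taken from the thesis.
Subscriber = Callable[[str, str, int], None]


@dataclass
class ConfigStore:
    """Single source of truth for configuration, with versioned push."""

    _values: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    _subscribers: Dict[str, List[Subscriber]] = field(default_factory=dict)

    def subscribe(self, key: str, callback: Subscriber) -> None:
        """Register interest in a key and deliver the current value, if any."""
        self._subscribers.setdefault(key, []).append(callback)
        if key in self._values:
            value, version = self._values[key]
            callback(key, value, version)

    def set(self, key: str, value: str) -> int:
        """Store a new version and push it to every subscriber of that key."""
        _, old_version = self._values.get(key, ("", 0))
        version = old_version + 1
        self._values[key] = (value, version)
        for callback in self._subscribers.get(key, []):
            callback(key, value, version)
        return version


def find_missing_tasks(planned: Set[str], logged: Set[str]) -> List[str]:
    """Return planned task IDs that have no entry in the task log, sorted."""
    return sorted(planned - logged)


if __name__ == "__main__":
    store = ConfigStore()
    received: List[Tuple[str, str, int]] = []

    # The administrator writes a value before any module has started.
    store.set("replication.factor", "2")
    # A module initializes: it subscribes and receives the current version.
    store.subscribe("replication.factor",
                    lambda k, v, ver: received.append((k, v, ver)))
    # A later modification is pushed to the subscribed module.
    store.set("replication.factor", "3")
    print(received)

    # Periodic scan: planned tasks without a log entry should be re-triggered.
    print(find_missing_tasks({"backup-01", "backup-02", "reorg-01"},
                             {"backup-01"}))
```

Because every module reads from the same store and receives every new version, separately held copies of a setting cannot drift apart, which is the consistency property the abstract attributes to centralized management. In a real deployment the callbacks would be replaced by network delivery and the task log would live in persistent storage.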

Keywords/Search Tags:

MapReduce, distributed system, system configuration, task management

Related items

1	The Design And Realization Of Task-based Configuration Management System
2	Design And Implementation Of Configuration Management System In Distributed Network Systems
3	Task-Based Multilevel Software Configuration Management
4	Optimizing Mechanism Of Mass Video Transcoding System Based On Mapreduce
5	Design And Implementation Of Configuration System In Distributed Environment
6	MDC-Hadoop:Mapreduce Task Scheduling On Heterogeneous Geo-distributed Data Centers
7	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
8	The Research And Realization Of Resources Management And Task Scheduling Sub-System In Couple Distribute System
9	Research On Efficient Task Partition And Scheduling In MapReduce Data Processing System
10	Distributed Test System's Integrated Management Techniques