Research On Wide-area Data-intensive Computing Systems For Spatial Data Processing

Posted on:2014-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:D Zhao

Full Text:PDF

GTID:2268330422460510

Subject:Computer Science and Technology

Abstract/Summary:


The rapid growth of scientific data has brought about the fourth paradigm of scientific research. Compared with traditional compute-intensive scientific computing, data-intensive computing must give more consideration to data storage, throughput, latency, load scheduling and similar concerns, so its implementations and platforms also differ from earlier technologies. Data-intensive computing has attracted considerable attention from both industry and academia. Its main application sources include the Internet, scientific computing, business intelligence and data mining. In this thesis, we use remote sensing image processing as a case study of data-intensive computing for big-data science, and we study a parallel processing framework, discussing several key issues.

1. Implementation and optimization of the parallel processing framework

Based on the Robinia platform, we designed and implemented a parallel processing framework for spatial data over wide-area networks. We explain the design of the data model and of the parallel processing logic, and we discuss performance evaluation, bottleneck analysis and code optimization. We focus on data replication and load balancing: a set of rules automatically creates data replicas and assigns them to different data nodes so that the data distribution suits the running jobs, because high parallel-processing performance depends on a good data distribution. Experiments confirm that the Robinia parallel processing framework achieves good scalability, robustness and flexibility with low overhead.

2. Study of scheduling algorithms for data-intensive computing

On top of the parallel processing framework, various test and scheduling algorithms can be implemented and evaluated. We studied scheduling strategies for remote data fetching, data replica assignment and data importing, and we propose a multi-queue scheduling algorithm for the scenario in which the number of data nodes stays fixed while the number of computing nodes increases.
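The abstract does not spell out how the multi-queue algorithm works. The following is a minimal sketch under assumed rules, not the thesis's implementation: each data node owns a FIFO queue of the tasks whose input it stores; a worker co-located with a data node drains its own queue first (a local read); a worker with no local work, such as a newly added compute-only node, takes a task from the longest remaining queue, which implies a remote fetch. A random scheduler is included as the baseline. All names (`MultiQueueScheduler`, `RandomScheduler`, `simulate`, node labels such as `n1` and `c3`) are hypothetical, not Robinia APIs.

```python
import random
from collections import deque


class MultiQueueScheduler:
    """Hypothetical sketch: one FIFO task queue per data node."""

    def __init__(self, tasks):
        # tasks: iterable of (task_id, data_node) pairs
        self.queues = {}
        for task_id, node in tasks:
            self.queues.setdefault(node, deque()).append(task_id)

    def next_task(self, worker_node):
        """Return (task_id, is_local), or None when no work remains."""
        local = self.queues.get(worker_node)
        if local:
            # Prefer data locality: the input is already on this node.
            return local.popleft(), True
        # No local work: take from the longest queue (remote fetch),
        # which also keeps the remaining per-node backlogs balanced.
        nonempty = [q for q in self.queues.values() if q]
        if not nonempty:
            return None
        return max(nonempty, key=len).popleft(), False


class RandomScheduler:
    """Baseline: hand out any pending task at random, ignoring locality."""

    def __init__(self, tasks, seed=0):
        self.pending = list(tasks)
        self.rng = random.Random(seed)

    def next_task(self, worker_node):
        if not self.pending:
            return None
        task_id, node = self.pending.pop(self.rng.randrange(len(self.pending)))
        return task_id, node == worker_node


def simulate(scheduler, workers):
    """Workers request tasks round-robin until the scheduler runs dry.

    Returns (assignments, local_count), where assignments is a list of
    (worker, task_id) pairs in dispatch order.
    """
    assignments, local_count = [], 0
    active = list(workers)
    while active:
        for worker in list(active):
            result = scheduler.next_task(worker)
            if result is None:
                active.remove(worker)
                continue
            task_id, is_local = result
            assignments.append((worker, task_id))
            local_count += is_local
    return assignments, local_count


if __name__ == "__main__":
    tasks = [(i, "n1") for i in range(4)] + [(i, "n2") for i in range(4, 8)]
    workers = ["n1", "n2", "c3"]  # c3 is a compute-only node with no data
    print("multi-queue local reads:", simulate(MultiQueueScheduler(tasks), workers)[1])
    print("random local reads:", simulate(RandomScheduler(tasks, seed=1), workers)[1])
```

With four tasks stored on each of `n1` and `n2` and a third, compute-only worker `c3`, this sketch serves 6 of the 8 tasks locally. The random baseline cannot do better in that setup, because every task `c3` receives must be fetched remotely and it may also send data-node workers to each other's data.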
The test case for the experiments is a drought detection algorithm (including the Normalized Difference Water Index, NDWI) provided by the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences. We use the master-worker model for parallel processing and run it on heterogeneous Linux/Windows nodes. The results show that the scheduling algorithm is critical to the performance of a distributed system and that data locality can significantly reduce processing time. The multi-queue scheduling algorithm achieves better performance than a random scheduling algorithm.
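The abstract names NDWI without giving its formula or how the work is decomposed. As an illustration only, the sketch below assumes Gao's (1996) vegetation-water form, NDWI = (NIR - SWIR) / (NIR + SWIR); the McFeeters (1996) variant uses the green band in place of SWIR, and the thesis does not say which one the drought algorithm uses. A master splits a scene into row strips, hands each strip to a worker, and reassembles the results in order. Here a `concurrent.futures.ThreadPoolExecutor` stands in for Robinia's master-worker runtime, and the function names (`ndwi_pixel`, `ndwi_tile`, `ndwi_master`) and the `tile_rows` parameter are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def ndwi_pixel(nir, swir):
    """Assumed Gao (1996) NDWI; NaN where the denominator is zero."""
    total = nir + swir
    if total == 0:
        return math.nan
    return (nir - swir) / total


def ndwi_tile(tile):
    """Worker task: compute NDWI for one strip given as (nir_rows, swir_rows)."""
    nir_rows, swir_rows = tile
    return [[ndwi_pixel(n, s) for n, s in zip(nr, sr)]
            for nr, sr in zip(nir_rows, swir_rows)]


def ndwi_master(nir, swir, tile_rows=2, workers=4):
    """Master: split the scene into row strips, farm them out, reassemble."""
    tiles = [(nir[i:i + tile_rows], swir[i:i + tile_rows])
             for i in range(0, len(nir), tile_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so the
        # strips can be concatenated back into the original row order.
        strips = list(pool.map(ndwi_tile, tiles))
    return [row for strip in strips for row in strip]


if __name__ == "__main__":
    nir = [[0.5, 0.4], [0.3, 0.0], [0.6, 0.2]]
    swir = [[0.1, 0.4], [0.3, 0.0], [0.2, 0.6]]
    for row in ndwi_master(nir, swir):
        print(row)
```

Splitting by row strips keeps each worker's input contiguous, which is convenient when strips are stored as separate replicas on data nodes; strip height is the knob that trades scheduling overhead against load balance.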

Keywords/Search Tags:

Data-intensive computing, Parallel processing framework, Scheduling algorithm, Robinia platform


Related items

1	Parallel Optimization Of Data Intensive Computing On Sunway TaihuLight
2	A high productivity framework for parallel data intensive computing in Matlab
3	Bayesian Network Parallel Learning And Incremental Maintenance For Data-Intensive Computing
4	Research On Optimization Of Map Reduce For Interactive Analysis On Big Data
5	Energy-aware memory allocation framework for embedded data-intensive signal processing applications
6	Design And Development Of Data-intensive Computing Oriented Ship Emergency Response System
7	Design Of Energy-efficient Reconfigurable System Architectures For Data-intensive Computing
8	Research On Parallel Optimization Technology For Accelerating Data-intensive Algorithms
9	Analysis And Research On MARS Framework Based On GPU Computing
10	Research On Big Data Processing System Based On MapReduce Parallel Processing Framework