
Research On QoS Guarantee And Resource Optimization Key Technologies In Data Grid

Posted on: 2012-06-04; Degree: Doctor; Type: Dissertation
Country: China; Candidate: M C Qu; Full Text: PDF
GTID: 1118330362950205; Subject: Computer system architecture
Abstract/Summary:
The grid is an advanced information technology infrastructure. It aims to integrate widely distributed computing, storage, communication, and information resources effectively, and to provide users with a virtual, unified, transparent computing environment. The data grid, a branch of grid computing, has attracted considerable academic attention. A data grid is an integrated architecture that can effectively manage, analyse, and use distributed data sets over a wide area. It provides safe, reliable, and efficient data transmission, access, storage, and replica management, and offers a unified interface to different storage systems, which makes data-intensive high-performance computing and scientific research possible.
Ian Foster pointed out that one of its basic features is "to provide exceptional quality of service (QoS)". To guarantee a high QoS, a data grid system must overcome many uncertainties in the network and in grid nodes. Resource (capacity) reservation, replica deployment, buffering strategies, parallel transmission, and data storage and recovery are currently the primary means of addressing these problems and are active research topics. Mass data storage and transmission waste network transmission capacity, storage, and node resources, which sharply reduces the acceptance rate of grid services in peak hours and lowers the overall QoS. At present, most research focuses on improving QoS in particular respects and pays less attention to the optimal scheduling of resources, so QoS is ensured only at a high cost.
Guided by the principle that "the guarantee of QoS is the basis, and the optimization of resources is the goal", this thesis studies in depth how to guarantee QoS while using grid resources effectively.
In this thesis, the basic functions of the data grid (data storage and data transfer) are divided, at the QoS service level, into five sub-services: transport, storage, cache, node selection, and resource reservation. A specific strategy is applied to each sub-service so that the dual objectives of QoS guarantee and resource optimization are achieved at the same time. Specifically:
(1) Multi-replica deployment can improve data reliability and service bandwidth and reduce the network workload. Algorithms based on multiple replicas can further increase transmission speed and thus guarantee the QoS of the data service. However, multiple replicas waste storage space and network transmission capacity. This thesis first proposes a distributed storage model that uses storage space much more efficiently (memory optimization) and has the characteristic of "P integrity": when P nodes fail, the complete data can still be obtained from the remaining nodes. A parallel transmission scheduler is then put forward based on the storage model; when double redundancy is used, the scheduler can adapt to large differences in the transmission speeds of replica nodes. Based on the storage model and the scheduler, a parallel transmission algorithm is proposed; with reasonable parameter settings, it can match the transmission performance of algorithms based on full replicas.
(2) To guarantee the reliability of data storage, dynamic data recovery based on parallel transmission is a basic capability that a data grid should have. While the use of storage space is optimized, not only must the basic QoS of data storage (reliability and usability) be guaranteed, but availability (ease of use) must also be considered. A dynamic data recovery model (DDRM) is proposed based on the storage model, the scheduler, the parallel transmission algorithm, node failure, the dynamic recovery process, and a data exchange center strategy.
DDRM has a lower data failure probability than double replication and greater availability than the erasure-code strategy.
(3) Data buffering is a key strategy for overcoming network instability. Given the characteristics of mass data and limited resources, the buffer size should be optimized in the cache service, and the following factors should also be considered: the failure probability of service nodes, the set of service nodes, the transmission speed of each service node, the failure-time constraint of the task, and the overall service failure probability. By introducing a limited buffer model, a service failure model is proposed from the perspective of the data consumer, based on the parallel transmission mode. The model effectively represents the quantitative relationships among the parameters that affect service failure. In simulation experiments, the theoretical values of the model are compared with the experimental values, with good results; meanwhile, the objectives of buffer optimization, service node optimization, and QoS guarantee for caching services are achieved.
(4) An important problem for parallel transmission based on multiple replicas is how to select replica nodes while meeting QoS constraints such as service reliability and transmission time. Two node selection models are proposed. They take as input the transmission speed of each node, reliability, transmission distance, network status, bandwidth, and other factors, and output the optimal service node set.
In this way, multiple optimization objectives are achieved at the same time: rational use of node resources, reduced network load, reduced tolerance of service requests, a higher acceptance rate at peak times, and minimum cost per service.
(5) Resource reservation is the basic premise for the successful completion of grid tasks, and the acceptance rate of reservation requests directly affects QoS. Resource capacity effectively represents the amount and status of resources from a macro point of view and provides strong support for resource reservation services. Reasonable allocation of resource capacity can reduce resource capacity fragmentation, improve throughput and the acceptance rate at peak times, and achieve the dual goals of resource optimization and QoS guarantee. Two resource capacity reservation strategies, parallel speedup and four-tuple, are put forward. Compared with previous research, these strategies let the grid system make active decisions according to the comprehensive information in reservation requests and apply certain resource capacity transformations to those requests, so that the use of resource capacity can be further optimized.
In this thesis, the most basic functions of the data grid (data storage and data transmission) are divided into five detailed services: transmission, storage, buffer, node selection, and resource capacity reservation. A specific strategy is adopted for each service to achieve the objectives of QoS guarantee and resource optimization.
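The "P integrity" property from item (1) can be illustrated with a small sketch. The abstract does not specify the thesis's storage model, so the placement rule below (each block copied to P + 1 consecutive nodes, round-robin) is only an assumption chosen for illustration, and the names `place_blocks`, `recoverable`, and `has_p_integrity` are hypothetical. The sketch checks by brute force that the complete data survives every combination of P node failures.

```python
from itertools import combinations


def place_blocks(num_blocks, num_nodes, p):
    """Assumed placement rule (not the thesis's model): copy each block
    to p + 1 consecutive nodes, wrapping around the node list."""
    if p + 1 > num_nodes:
        raise ValueError("need at least p + 1 nodes")
    placement = {node: set() for node in range(num_nodes)}
    for block in range(num_blocks):
        for k in range(p + 1):
            placement[(block + k) % num_nodes].add(block)
    return placement


def recoverable(placement, failed, num_blocks):
    """True if every block is still held by at least one surviving node."""
    surviving = set()
    for node, blocks in placement.items():
        if node not in failed:
            surviving |= blocks
    return surviving == set(range(num_blocks))


def has_p_integrity(placement, num_blocks, p):
    """Check every combination of p failed nodes by brute force."""
    return all(
        recoverable(placement, set(failed), num_blocks)
        for failed in combinations(list(placement), p)
    )


# Double redundancy (p = 1): 12 blocks spread over 5 nodes.
placement = place_blocks(num_blocks=12, num_nodes=5, p=1)
print(has_p_integrity(placement, 12, 1))  # True: any single failure is tolerated
print(has_p_integrity(placement, 12, 2))  # False: two adjacent failures lose a block
```

Under this assumed rule the total storage is (P + 1) times the data size, regardless of the number of nodes, whereas placing a full replica on every node costs one data size per node. That is one way to read the storage saving claimed above, though the thesis's actual model may differ.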
Keywords/Search Tags: Data grid, QoS, Resource optimization, Parallel transmission