
The Integration Of Data Grid Software Gfarm And Computing Grid Software LSF

Posted on: 2006-09-20    Degree: Master    Type: Thesis
Country: China    Candidate: Y. N. Cao    Full Text: PDF
GTID: 2168360182457151    Subject: Software engineering
Abstract/Summary:
Grid technology is one of the most active research fields in the world today. It can be broadly divided into computing grids and data grids. A computing grid targets computation-intensive tasks: by linking all kinds of computers (including clusters), databases, instruments, and storage devices over a network, it forms a high-performance computing environment that appears largely transparent and virtual to its users. A data grid targets data-intensive tasks and data management.

This study is a collaborative project between Prof. Xiaohui Wei and Dr. Wilfred Li. Dr. Li works at UCSD, which is a member of PRAGMA. Our goal is to integrate LSF and Gfarm and, on that basis, address problems such as data-aware scheduling of jobs.

Developed by Platform Computing, the LSF multi-cluster system lets many heterogeneous computers share computing resources over a LAN or the Internet, and gives users transparent access to those resources. Gfarm, developed by AIST, is data grid middleware: a global file system for data-intensive grid application programs that supports parallel file operations. Gfarm can create replicas of local files and supply replica information to jobs. A Gfarm file is a logical file identified by a unique Gfarm identifier; physically, one Gfarm file can be split into many fragments stored on different file system nodes, and every fragment can be located through that identifier [5].

Based on LSF's plug-in mechanism, we implemented a data-aware scheduling module aimed at data-intensive jobs. The module's main inputs are the jobs in the pending state and the candidate host sets for the jobs currently running in the system, and it can obtain the distribution information of all Gfarm files. The LSF scheduler implements a plug-in mechanism: users can write their own scheduling modules, embed them in the scheduler, and so realize scheduling policies that meet their needs. With this mechanism, users need not concern themselves with the unrelated parts of the scheduler's design; they only care about the design and implementation of their own scheduling policy.

The output of the data-aware scheduling module is a series of scheduling commands, such as executing a job or reserving hosts. To the LSF scheduler, the module is a built-in dynamically linked library, and information exchange with it goes through the scheduler framework's designated programming interface (API). To the Gfarm system, the module is an ordinary client program that can use both Gfarm's programming interface and Gfarm's commands. The module consists of three parts: a planning module, task flow sheets, and an instruction generation module. The output of the planning module is a set of task flow sheets; each task flow sheet corresponds to one file and the jobs that access that file; from the task flow sheets, the instruction generation module produces the concrete commands. The data-aware scheduling module handles only the jobs that access the Gfarm system; other jobs are handled by other modules.
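As a rough illustration of the planner / task-flow-sheet / command-generation pipeline described above, the following standalone C sketch groups pending jobs by the Gfarm file they read and prefers a host that holds a fragment of that file. All structure and function names here are hypothetical, and the data is mocked; the real module is driven by the LSF scheduler framework API rather than hard-coded tables.

    /* Hypothetical sketch of the data-aware planning step: for each
     * pending job, look up which nodes hold fragments of its Gfarm
     * file and emit a placement command preferring one of them. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_HOSTS 4

    struct job {
        int         id;
        const char *gfarm_file;       /* logical Gfarm file the job reads */
    };

    struct file_location {
        const char *gfarm_file;
        const char *hosts[MAX_HOSTS]; /* nodes holding fragments */
        int         nhosts;
    };

    /* In the real module a task flow sheet would collect, per file, all
     * jobs accessing it; for brevity we decide job by job here. */
    static const char *pick_host(const struct job *j,
                                 const struct file_location *locs, int nlocs)
    {
        for (int i = 0; i < nlocs; i++)
            if (strcmp(locs[i].gfarm_file, j->gfarm_file) == 0 &&
                locs[i].nhosts > 0)
                return locs[i].hosts[0];  /* local access: host owns a fragment */
        return "any";                     /* no replica info: fall back */
    }

    int main(void)
    {
        struct file_location locs[] = {
            { "gfarm:/genome/db1", { "node01", "node02" }, 2 },
            { "gfarm:/genome/db2", { "node03" },           1 },
        };
        struct job pending[] = {
            { 101, "gfarm:/genome/db1" },
            { 102, "gfarm:/genome/db2" },
        };
        for (int i = 0; i < 2; i++)       /* instruction generation step */
            printf("dispatch job %d -> host %s\n",
                   pending[i].id, pick_host(&pending[i], locs, 2));
        return 0;
    }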
We built the system on the lab's existing LAN, using standard Internet protocols (such as TCP/IP and HTTP). A DELL GX270N PC serves as the DNS, NFS, and NIS server. A DELL GX270N PC also serves as the LSF master host, with DELL GX260N PCs as the corresponding client hosts; a DELL GX270N PC serves as the Gfarm metadata server, with DELL GX260N PCs as the corresponding file system nodes and clients.

The tests before integration show that, when one retrieval job is submitted at a time and the CPU time is the same, remote access takes 2.7 times as long as local access if the file resides on a single node, and 1.4 times as long if the file is distributed over several nodes. The same ratios were observed when several retrieval jobs were submitted at once.

The Gfarm data grid by itself has no job management mechanism and lacks job scheduling. If many jobs concentrate on accessing one file, or many jobs compete for network bandwidth, every job slows down and the performance of the whole system drops. After integration, LSF can schedule and manage the Gfarm data grid and batch-process the jobs that access Gfarm files. In the batch test, submitting 5 jobs before integration took 788.0 seconds in total and used 5 CPUs; submitting the same 5 jobs after integration took 165.0 seconds and used only 1 CPU. The integration thus avoids the problem of many jobs competing for network bandwidth: jobs run faster than before, the system's resource consumption drops, and overall performance improves.

If large numbers of jobs are submitted and the parent file is locked, the file access method cannot respond to the jobs quickly, and it is difficult to schedule a mixed workload reasonably at submission time. In this study we attempt to schedule and dispatch mixed workloads. Before integration, job dispatch was unordered: many jobs concentrated on a few files and competed for network bandwidth, reducing both each job's speed and the performance of the whole system. After integration, jobs are bound to file nodes when they are dispatched and bound to server hosts through LSF scheduling, which avoids the bandwidth contention.

The integration tests show that the new scheduling policy shortens job response time, lowers the system's resource consumption, avoids many jobs competing for network bandwidth or converging on one parent file, and increases job throughput. Because a plug-in is used, the new scheduling policy does not affect the existing scheduling policies for applications that do not use Gfarm, and when Gfarm application programs are scheduled, the existing LSF scheduling policies can still be used cooperatively.
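As a back-of-the-envelope illustration of the locality penalties measured above, the small C model below applies the reported ratios (2.7x for remote access to a single-node file, 1.4x when the file is spread over several nodes) to a hypothetical local access time. The helper and the 100-second baseline are illustrative only, not measurements from the thesis.

    /* Toy model of the measured locality penalty: remote access costs
     * 2.7x the local time for a one-node file, 1.4x for a fragmented
     * (multi-node) file. */
    #include <stdio.h>

    static double access_time(double local_s, int remote, int fragmented)
    {
        if (!remote)    return local_s;       /* job runs where the data is */
        if (fragmented) return local_s * 1.4; /* file spread over many nodes */
        return local_s * 2.7;                 /* whole file on one node */
    }

    int main(void)
    {
        double t = 100.0;  /* hypothetical local access time in seconds */
        printf("local: %.1fs  remote/1-node: %.1fs  remote/N-node: %.1fs\n",
               access_time(t, 0, 0), access_time(t, 1, 0), access_time(t, 1, 1));
        return 0;
    }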
One working model that has been proposed keeps job scheduling and data replica management separate [22]; another links job scheduling and data management on a peer-to-peer model to realize data-aware scheduling [23]. In this study, job scheduling and data file management are integrated into one scheduling module. The advantage is that the module obtains information about both jobs and data promptly and comprehensively, so the system can make more reasonable scheduling decisions: the replica distribution, the job load, and the resource demand are matched against one another, which reduces blind placement.

Based on LSF's plug-in mechanism, under Red Hat Linux 9.0 we integrated Gfarm 1.0.4 with LSF 6.0, scheduled data-intensive jobs across the cluster, and tested different versions of Gfarm. The functions realized in this study are as follows: ① Gfarm application jobs are queued, and the nodes holding the parent file are located according to the data demands of the jobs; ② the plug-in technique makes the scheduling module easy to implement and extend, and lets it cooperate with the other scheduling policies in the system; ③ mixed workloads can be bound to their data nodes and scheduled. At present we are investigating two further topics: the conditions under which file replicas should be created, and how to make the scheduling fair enough.
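To show what matching replica distribution against job load might look like inside the integrated module, the C sketch below picks, among the hosts holding a fragment of the requested file, the least loaded one. All names and the load figures are hypothetical; a real implementation would take load information from LSF and replica locations from Gfarm rather than from a static table.

    /* Sketch of the integrated decision: prefer a replica-holding host,
     * breaking ties by current load; fall back to ordinary scheduling
     * if no replica information is available. */
    #include <stdio.h>

    struct host {
        const char *name;
        double      load;        /* e.g. run-queue length reported by LSF */
        int         has_replica; /* fragment of the job's file present? */
    };

    static const struct host *choose(const struct host *h, int n)
    {
        const struct host *best = NULL;
        for (int i = 0; i < n; i++)
            if (h[i].has_replica && (!best || h[i].load < best->load))
                best = &h[i];
        return best;             /* NULL => schedule as an ordinary job */
    }

    int main(void)
    {
        struct host hosts[] = {
            { "node01", 1.8, 1 }, { "node02", 0.3, 1 }, { "node03", 0.1, 0 },
        };
        const struct host *h = choose(hosts, 3);
        printf("placed on %s\n", h ? h->name : "any host");
        return 0;
    }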
Keywords/Search Tags: Computing Grid, Data Grid, LSF, Gfarm, Data-aware scheduling