
Research On Grid Resource Prediction Using Data Mining

Posted on: 2010-03-28   Degree: Master   Type: Thesis
Country: China   Candidate: J Li
GTID: 2178360272996599   Subject: Computer system architecture
Abstract/Summary:
The grid is a new computing and application technology built on the Internet. Its essence is to make the best use of the existing hardware and software resources across the network, to support wide-area sharing of and cooperation among computation, data, storage, information and knowledge resources, to eliminate information islands, and to improve quality of service. The purpose of the grid is to connect geographically dispersed, heterogeneous computing resources through high-speed networks so that they can work together on large-scale application problems: it shares published wide-area resource information, provides a uniform programming and application interface, hides hardware boundaries, and ultimately aggregates the resources on the Internet into a single virtual supercomputer.

The grid is a dynamic system whose resources are constantly changing. To make the best use of these resources, we need to predict their future status. Predicting resource status is therefore very important in grid computing, and grid resource prediction has become an active research topic at many institutions around the world.

Related work points out that the contention arising from resource sharing causes deliverable performance to vary over time. To make the best use of the resources at hand, an application scheduler must predict what performance each resource will deliver. This was the original motivation for predicting grid resource performance. However, this argument presupposes that jobs are scheduled on the basis of load and performance alone. This thesis identifies other important factors, besides load and performance, that strongly affect a cluster's availability, such as the influence of local policies and the resource requirements of different applications.

Resources in the grid are heterogeneous, their performance varies dynamically, and they are independently managed.
These properties make it very difficult to predict the availability of grid resources:

1. Heterogeneity. On the hardware side, resources differ in system architecture and computing capability. On the software side, they differ in operating system, local management, and local scheduler.
2. Dynamic performance. The grid is not a static system and is subject to many unpredictable factors: a resource can become unavailable when a machine or the network goes down, and new resources may join the grid at any time.
3. Independent management. Grid resources are controlled by their own local managers and follow their own local scheduling policies. The grid management system must respect these local policies rather than try to change or replace them.

This work is part of the National Natural Science Foundation project "Research on Resource Co-allocation and Meta-scheduling algorithm for Cross-domain Parallel Application", and it supports the project's meta-scheduler mainly by predicting the availability of grid resources. A survey of the literature shows that most related work, both in China and abroad, focuses on predicting system load and performance. After studying and analyzing grid resources and scheduling systems, this thesis finds that in a heterogeneous environment, system load and performance alone cannot capture the overall availability of a cluster. Building on earlier work, it therefore proposes grouping resources by clustering and taking local policies into account, and presents a new grid resource availability prediction system built on GDIA (A Scalable Grid Infrastructure For Data Intensive Application). The system consists of three main modules, each with its own architecture: the information collection module, the resource clustering module, and the resource availability prediction module.
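The abstract does not say which clustering algorithm the resource clustering module uses. As a minimal sketch, assuming plain k-means over numeric resource features, grouping resources might look like this. The feature choice (CPU cores, memory, average load), the node names, and the helper functions `normalize` and `kmeans` are all hypothetical illustrations, not the thesis's implementation.

```python
from math import dist


def normalize(rows):
    """Min-max scale each feature column to [0, 1] so no single unit dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Centroids are seeded with the first k points so the
    result is deterministic; k-means++ or random restarts are preferable in practice."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each resource joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical resource snapshots: (CPU cores, memory in GB, average load).
resources = {
    "nodeA": (8, 16, 0.2),
    "nodeB": (64, 256, 0.7),
    "nodeC": (8, 32, 0.3),
    "nodeD": (64, 128, 0.8),
}
labels, _ = kmeans(normalize(list(resources.values())), k=2)
groups = dict(zip(resources, labels))
print(groups)  # {'nodeA': 0, 'nodeB': 1, 'nodeC': 0, 'nodeD': 1}
```

Normalizing first matters here because raw memory sizes would otherwise swamp the load values in the Euclidean distance.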
The system uses MDS (Monitoring and Discovery System) to implement information publishing and result reporting; the three modules together with the MDS module make up the grid resource prediction system. This thesis designs and implements grid resource clustering, grid resource availability prediction, and MDS-based information publishing and result reporting. The analysis and prediction process applies Data Mining techniques: it groups grid resources into clusters and uncovers the local policies and the latent patterns by which grid resources change. These patterns can then be used to predict future resource availability for particular kinds of jobs.

After studying the concepts and related algorithms of Data Mining, the thesis concludes that Data Mining can appropriately address the shortcomings of related work. The system uses a clustering method to implement the resource clustering module and a regression method to implement the resource availability prediction module.

In studying related work on grid resource prediction, this thesis identifies two limitations in real applications: (1) it ignores the resources that jobs require for their execution, and (2) it ignores local policies. Drawing on an extensive study of the Data Mining literature and algorithms, the thesis applies clustering and regression algorithms to implement grid resource clustering and availability prediction, which addresses these limitations.
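The abstract names regression as the technique behind the availability prediction module but not the model or its inputs. A minimal sketch, assuming ordinary least-squares linear regression on a window of recent availability samples (values in [0, 1], one per interval), is shown below. The function names, the window size, and the sample history are hypothetical; a richer model could add features such as time of day to capture local policies.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_availability(history, window=8):
    """Fit a linear trend to the last `window` availability samples and
    extrapolate one interval ahead, clamped to the valid range [0, 1]."""
    recent = history[-window:]
    xs = list(range(len(recent)))
    intercept, slope = fit_line(xs, recent)
    forecast = intercept + slope * len(recent)
    return min(1.0, max(0.0, forecast))


# Hypothetical availability samples for one cluster, one per interval.
history = [0.9, 0.8, 0.7, 0.6]
print(round(predict_availability(history), 3))  # 0.5
```

Clamping keeps an extrapolated trend from producing impossible values such as negative availability when a cluster is draining quickly.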
The information collection, resource clustering, and resource prediction modules are relatively independent; together with the MDS module, they work as a complete grid resource prediction system.

The system presented in this thesis is adaptable to large-scale grid environments. Compared with related work, its contributions are:

1. It clusters grid resources, which helps the meta-scheduler filter out suitable clusters when a job arrives.
2. It defines an index system that serves as a standard for estimating the availability of grid resources.
3. It predicts the availability of grid resources to provide management support to the meta-scheduler.

Based on the collected resource information, the proposed availability prediction system helps the meta-scheduler discover and organize groups of resources that match particular applications. Compared with related work, the system reduces time costs, raises the success rate of applications, and predicts sharp changes in resource availability promptly.

Future work includes running the system in a much larger grid environment to test its stability and reliability. The clustering and regression algorithms used in the system can be evaluated and replaced according to application needs, and the CPU availability evaluation equation can be improved.
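The abstract mentions an availability index and a CPU availability evaluation equation but does not give their form. The sketch below is a hypothetical stand-in showing how such an index could let a meta-scheduler filter and rank clusters for an incoming job: it assumes the CPU index is the fraction of free cores, zeroed when a (hypothetical) local-policy flag refuses grid jobs, and multiplies it by a predicted availability such as a regression output. All class names, fields, and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ClusterStatus:
    name: str
    free_cores: int
    total_cores: int
    memory_gb: float
    accepts_grid_jobs: bool        # hypothetical flag derived from local policy
    predicted_availability: float  # e.g. output of a regression predictor, in [0, 1]


@dataclass
class JobRequest:
    cores: int
    memory_gb: float


def cpu_availability(c):
    """Hypothetical CPU index in [0, 1]: share of free cores,
    forced to 0 when local policy refuses grid jobs."""
    if not c.accepts_grid_jobs or c.total_cores == 0:
        return 0.0
    return c.free_cores / c.total_cores


def rank_clusters(clusters, job):
    """Keep clusters that satisfy the job's requirements and local policy,
    then order them by current CPU index times predicted availability."""
    candidates = [
        c for c in clusters
        if c.accepts_grid_jobs and c.free_cores >= job.cores and c.memory_gb >= job.memory_gb
    ]
    candidates.sort(key=lambda c: cpu_availability(c) * c.predicted_availability, reverse=True)
    return [c.name for c in candidates]


clusters = [
    ClusterStatus("alpha", 32, 64, 128, True, 0.9),   # score 0.5 * 0.9 = 0.45
    ClusterStatus("beta", 48, 64, 256, True, 0.5),    # score 0.75 * 0.5 = 0.375
    ClusterStatus("gamma", 60, 64, 512, False, 0.95), # excluded by local policy
    ClusterStatus("delta", 8, 16, 32, True, 1.0),     # too few free cores
]
print(rank_clusters(clusters, JobRequest(cores=16, memory_gb=64)))  # ['alpha', 'beta']
```

Treating local policy as a hard filter, rather than a weight, reflects the point that the grid management system must respect local policies instead of overriding them.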
Keywords: grid computing, local scheduling strategy, resource clustering, resource availability prediction, Data Mining