Applications of data mining to cluster scheduling and failure diagnosis

Posted on: 2010-11-08
Degree: Ph.D.
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Shrinivas, Lakshmikant
Full Text: PDF
GTID: 2448390002983344
Subject: Computer Science
Abstract/Summary:
The trends of steadily decreasing hardware costs and explosive growth in demand for computational power have led to the growing popularity of clusters of commodity computers to meet the computational needs of businesses and research institutions. Popular uses of such clusters include running grid management systems to provide high-throughput computation and running parallel database management systems to address data warehousing needs. Clusters can provide large increases in productivity, but they are also much more complex to manage than a single computer, which often prevents users from achieving the full productivity benefit the cluster offers. The scale of the management problem makes it hard for administrators and users to have a complete picture of what is going on in the cluster and whether it is being used to its full potential.

We propose data mining as a way of mitigating this problem: by using well-known machine learning techniques, it is possible to learn patterns in the behavior of the cluster, which can then be used to improve productivity in a variety of scenarios. Data mining approaches are well suited to this problem because they require little human intervention and hence can deal with large and complex systems.

In the first part of this thesis, we explore some issues that arise when we try to apply existing implementations of data mining algorithms to diagnose as well as predict job failures in grids. We demonstrate that (a) it is feasible to gather enough data in real time to train useful classifiers, using only a small fraction of the grid's computational resources, (b) it is important to choose the features used for classification with care, and (c) it is useful to have both per-user and system-wide classifiers, as they diagnose different kinds of problems. This application of data mining is proposed as a tool to improve user productivity by assisting users in finding errors in grid job submissions.

The second part of this thesis turns to a different scenario, viz. sharing a cluster between a grid management system and a parallel database management system. Common wisdom holds that parallel database systems should be run in isolation, so as to maximize performance. We investigate whether this is necessary or desirable, and show that a cluster can indeed be beneficially shared, but doing so requires a scheduler that is aware of the interaction between grid jobs and parallel database queries. We propose a heuristic scheduling approach that uses data mining techniques to predict the interference between jobs and queries, and demonstrate that it allows for increased productivity and flexibility in the cluster.
Keywords/Search Tags:Cluster, Data mining, Productivity