
Using Markov Logic Networks to provide autonomic management of high performance computing system configurations and operations

Posted on: 2015-12-13
Degree: Ph.D.
Type: Dissertation
University: University of Maryland, Baltimore County
Candidate: Schauer, Randy Nicholas
Full Text: PDF
GTID: 1478390017998170
Subject: Computer Science
Abstract/Summary:
A framework that uses statistical relational learning methods to resolve configuration conflicts and maintain details of system state can remain functionally cohesive and domain flexible while minimizing initial configuration and state awareness, and can thereby provide automated services for managing the complex and changing configuration and operations of High Performance Computing (HPC) systems. This dissertation addresses how to construct and use a framework built upon distributed knowledge to manage complex system configuration and operations. The focal point of the framework is a statistical relational learning model based on Markov Logic Networks (MLNs), which can be applied in different ways to various types of system management problems. The framework provides a new methodology for solving these problems without relying on a centralized knowledge base. Beyond the use of MLNs, the framework contains modules covering host selection, data gathering, inference analysis, and analysis actions.

To show how the framework can be applied to various system management domains, two areas have been selected to demonstrate its flexibility and functionality: configuration management and job scheduling. The configuration management domain focuses on a variety of operating system-level parameters to ensure consistency and correctness, while the job scheduling domain focuses on understanding processor core temperatures and profiles in order to minimize thermal variance across the system using intelligent scheduling techniques.

The contributions of this dissertation are: (i) development of a framework that provides consistent system management functionality in a distributed manner using MLNs; (ii) definition of an approach to implementing this framework to solve distributed configuration management issues, together with analysis of the results to correct identified conflicts; and (iii) definition of a job scheduling approach for HPC systems that uses this framework with the goal of thermal balancing, together with analysis of its results to minimize thermal variance across a system.
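
To make the role of an MLN in configuration conflict resolution more concrete, the following is a minimal illustrative sketch in Python, not the dissertation's implementation. It assumes a toy setting with three hosts and one binary operating system-level setting. The host names, the two weighted rules, and the weights W_AGREE and W_OBS are all hypothetical. Each candidate assignment (a "world") is scored by the exponentiated sum of rule weights times the number of satisfied groundings, and the most probable world is found by brute-force enumeration, which is only feasible at this tiny scale.

```python
import itertools
import math

# Hypothetical evidence: the value of one OS-level setting seen on each host.
observed = {"node1": "on", "node2": "on", "node3": "off"}
values = ["on", "off"]

# Hypothetical weighted rules (all hosts are assumed to be in one group):
#   W_AGREE: two hosts in the same group should share the same setting
#   W_OBS:   a host's true setting should match what was observed on it
W_AGREE = 1.5
W_OBS = 1.0


def log_score(world):
    """Sum of weight * number of satisfied groundings for one assignment."""
    hosts = sorted(world)
    # Count unordered host pairs whose settings agree.
    agree = sum(1 for a, b in itertools.combinations(hosts, 2)
                if world[a] == world[b])
    # Count hosts whose assigned setting matches the observation.
    match = sum(1 for h in hosts if world[h] == observed[h])
    return W_AGREE * agree + W_OBS * match


def map_world():
    """Enumerate every world; return the most probable one and its probability."""
    hosts = sorted(observed)
    worlds = [dict(zip(hosts, combo))
              for combo in itertools.product(values, repeat=len(hosts))]
    scores = [math.exp(log_score(w)) for w in worlds]
    z = sum(scores)  # partition function over all enumerated worlds
    best = max(range(len(worlds)), key=lambda i: scores[i])
    return worlds[best], scores[best] / z


def conflicts(world):
    """Hosts whose observed setting differs from the most probable world."""
    return [h for h in sorted(world) if world[h] != observed[h]]


if __name__ == "__main__":
    world, prob = map_world()
    print("most probable configuration:", world)
    print("probability:", round(prob, 3))
    print("hosts flagged as conflicting:", conflicts(world))
```

With these assumed weights, the consistency rule outweighs the single dissenting observation, so the most probable world sets every host to "on" and node3 is flagged as the conflicting host. A real MLN system would replace the enumeration with approximate inference and would learn or tune the weights rather than fix them by hand.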
Keywords/Search Tags:System, Configuration, Framework, Management, Using, Provide