
Key System Software Techniques For Large Scale High Productivity Computing

Posted on: 2013-02-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Liu
Full Text: PDF
GTID: 1268330422473867
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid increase in demand for computing capability, the scale of high-end computing systems is expanding and their compute density is growing at tremendous speed. The development of large-scale computing systems now faces huge challenges in power supply, management complexity, reliability, and cost. High performance in terms of raw peak computing speed alone is no longer a good indication of the capability of a high-end computing system. Instead, high productivity is now widely recognized as more desirable, which requires a balance between high performance and the system’s robustness, ease of use, and cost.

System software plays an important role in realizing high productivity computing. Focusing on system software, the dissertation investigates key techniques in power management and user environments for large-scale computing systems. The main contributions are as follows.

1. A power capping technique for large-scale systems: The dissertation proposes a new power capping model called PCNC and corresponding power management algorithms. In the PCNC model, the nodes in a system are placed into four sets: the total set, privileged set, candidate set, and target set. The system power consumption is divided into three states according to two thresholds: the safe, warning, and critical states. Different power management mechanisms are enforced on different target nodes according to the system power state. Two types of policies are designed and implemented to select the target set of nodes for power regulation. One is state-based, which chooses the nodes running the most power-consuming job. The other is change-based, which chooses the nodes running the job whose power consumption is increasing most rapidly among all jobs.
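The state classification and the two selection policies can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the names `PowerState`, `Job`, `classify`, and `select_target` are hypothetical, the power samples are made-up inputs, and treating the candidate set as "all nodes outside the privileged set" is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class PowerState(Enum):
    SAFE = "safe"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class Job:
    name: str
    nodes: frozenset   # ids of the nodes the job runs on
    power_now: float   # current power draw of the job (watts)
    power_prev: float  # power draw at the previous sample (watts)


def classify(system_power, warning_threshold, critical_threshold):
    """Map total system power onto three states using two thresholds."""
    if system_power >= critical_threshold:
        return PowerState.CRITICAL
    if system_power >= warning_threshold:
        return PowerState.WARNING
    return PowerState.SAFE


def select_target(jobs, candidate_set, policy):
    """Return the target set: candidate nodes of the job the policy picks."""
    # Only jobs with at least one candidate node can be regulated.
    eligible = [job for job in jobs if job.nodes & candidate_set]
    if not eligible:
        return frozenset()
    if policy == "state":
        # State-based: the job currently drawing the most power.
        chosen = max(eligible, key=lambda job: job.power_now)
    elif policy == "change":
        # Change-based: the job whose power draw rose the most.
        chosen = max(eligible, key=lambda job: job.power_now - job.power_prev)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return chosen.nodes & candidate_set


# Hypothetical system: node 0 is privileged and never regulated (assumption).
total_set = frozenset(range(6))
privileged_set = frozenset({0})
candidate_set = total_set - privileged_set
jobs = [
    Job("a", frozenset({0, 1, 2}), power_now=900.0, power_prev=880.0),
    Job("b", frozenset({3, 4}), power_now=600.0, power_prev=400.0),
]
```

With this data the state-based policy targets the candidate nodes of job `a`, while the change-based policy targets job `b`, whose draw rose by 200 W between samples.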
In our experiments, the control cost is reduced by 76.3% with a loss of control effect of 7.4%, and the two types of policies reduce the overspending ratio of system power by 73% and 66% with a performance loss of 1.4% and 1.1%, respectively.

2. Self-adaptive management of sleep depths of idle nodes (ASDMIN): Active idle nodes cause huge energy waste in large-scale systems. Putting idle nodes into a sleep state saves energy, but waking them takes time when they are needed. The dissertation proposes a self-adaptive approach to managing the sleep depths of idle nodes to balance the system’s energy consumption and response time. In this model, idle nodes are classified into groups according to their sleep states. Each group contains nodes of the same sleep depth and forms a reserve pool of a certain readiness level. In a resource allocation process, nodes in the pool with the highest readiness level are provided to the application first. When those nodes are not sufficient, nodes in the pool(s) at the next readiness level(s) are allocated. After each allocation and relocation of nodes, the number of nodes in each pool is adjusted by raising or lowering the sleep depth of nodes. Thus, the reserve pools are maintained at all times. A key factor that affects the effectiveness of idle node management is the sizes of the reserve pools. The dissertation proposes and investigates a self-adaptive approach to this problem so that the sizes of the reserve pools are dynamically adjusted according to the applications. Our experiments demonstrate that, with self-adaptive management, the wasted power of idle nodes is reduced by 84.12% and the power efficiency is improved by 82.71%, at the cost of a relative slowdown of only 8.85%.

3. Virtual computing environment for large-scale systems: The traditional user environment for large-scale systems is weak in data security, system usage, and management.
The dissertation proposes a virtual computation environment called High Performance Virtual Zone (HPVZ) to provide supercomputer users with virtual private computing environments. On front-end server nodes, HPVZ employs operating system virtualization techniques to provide isolated user environments. On back-end compute nodes, high performance computing zones are dynamically created through environment variable extraction and file path conversion. The system files of a user environment are deployed to the local disks and its user files are deployed to the global file system. File system isolation and consistency are implemented via shadow system files and nail-links. Our virtual environment complies with LSB and POSIX. The performance loss on server nodes and compute nodes is less than 3% and 0.5%, respectively. The experimental results show that HPVZ is efficient and practically usable for high productivity computing.

4. Multi-granularity self-adaptive quality of service (QoS) assurance mechanism: Assuring quality of service requires allocating sufficient resources to users’ tasks. However, accurately specifying resource requirements is difficult. Over-subscription of resources causes other users’ tasks to fail, thus lowering the quality of service at the level of the whole system. The dissertation proposes a self-adaptive approach to this problem that dynamically allocates sufficient resources to users while avoiding denial of service caused by over-allocating resources to other users. Resources are controlled at multiple levels of granularity, including process, process group, job, and virtual user environment. The resource usage threshold is adjusted dynamically according to the availability of the system’s free resources and the user’s requirements. Different policies are proposed to select the target runtime entity to be terminated when the system resources cannot satisfy a request.
Employing this mechanism, the system productivity is improved by 17.14% and the performance loss is less than 0.65% in experiments.

5. Virtualization of power management: Traditional power management techniques do not work well in virtual machines because hardware should not be directly controlled by software running in a virtual machine. How to manage power in a virtual environment is a serious challenge for virtual machine techniques. In this dissertation, a virtual power management mechanism is proposed, which consists of (a) virtual device behavior monitoring and power profiling techniques; (b) virtual power control mechanisms (i.e., speed scaling and sleeping of virtual devices); and (c) a mechanism for the sleeping of virtual machines. These facilities are deployed at two levels of system software: at the virtual machine level and at the physical machine level. A transparent power management interface compatible with the hardware power management interface is provided in the virtual machine so that existing power management software can run on virtual machines without any change. The experiments show that the virtualized power management mechanisms are compatible with the traditional physical mechanisms, that energy efficiency is improved by 2.75% with power management deployed in the virtual machine, and that the performance cost of power management virtualization is below 0.4%.
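The reserve-pool allocation described under contribution 2 (ASDMIN) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dissertation's algorithm: `allocate` and `rebalance` are hypothetical names, pools are plain lists ordered from shallowest to deepest sleep, and the target pool sizes are fixed inputs here, whereas the dissertation adjusts them adaptively in a way the abstract does not specify.

```python
def allocate(pools, count):
    """Take `count` idle nodes, draining the most ready pool first.

    `pools` is a list of lists of node ids, ordered from the highest
    readiness level (index 0, shallowest sleep) to the lowest.
    """
    if count > sum(len(pool) for pool in pools):
        raise ValueError("not enough idle nodes")
    granted = []
    for pool in pools:
        take = min(count - len(granted), len(pool))
        granted.extend(pool[:take])
        del pool[:take]
        if len(granted) == count:
            break
    return granted


def rebalance(pools, targets):
    """Move nodes between sleep depths so pools approach their target sizes.

    `targets[i]` is the desired size of pool i (assumed fixed here); the
    deepest pool has no target and holds whatever nodes remain.
    """
    for level in range(len(pools) - 1):
        surplus = len(pools[level]) - targets[level]
        if surplus > 0:
            # Too many ready nodes: put the extras into a deeper sleep.
            pools[level + 1].extend(pools[level][targets[level]:])
            del pools[level][targets[level]:]
        else:
            # Too few: wake nodes from deeper pools, nearest level first.
            for deeper in range(level + 1, len(pools)):
                need = targets[level] - len(pools[level])
                if need == 0:
                    break
                pools[level].extend(pools[deeper][:need])
                del pools[deeper][:need]
```

A caller would run `rebalance` after every `allocate` (and after nodes are returned), so that a later request again finds ready nodes in the shallowest pool.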
Keywords/Search Tags: High Productivity Computing, Power Management, User Environment, Virtualization