
Runtime support for improving reliability in system software

Posted on: 2011-09-28
Degree: Ph.D.
Type: Dissertation
University: The Ohio State University
Candidate: Gao, Qi
Full Text: PDF
GTID: 1468390011971510
Subject: Computer Science
Abstract/Summary:
As software becomes increasingly complex, software reliability is becoming more and more important. In particular, the reliability of system software is critical to the overall reliability of computer systems, since system software provides the platform on which application software runs. Unfortunately, ensuring the reliability of system software is very challenging, and the defects (bugs) in it can often have a severe impact.

This dissertation proposes to use runtime support to improve system software reliability. Runtime support here refers to extending the runtime software system with additional functionality useful for reliability-oriented tasks, such as instrumentation-based profiling, runtime analysis, checkpointing/re-execution, scheduling control, and memory layout control. Leveraging runtime support, this dissertation proposes novel methods for bug manifestation, bug detection, bug diagnosis, failure recovery, and error prevention across multiple phases of the software development and deployment cycle.

The preferred phase in which to detect and fix software bugs is pre-release testing. To improve testing effectiveness and efficiency, this dissertation proposes a first method that helps manifest the bugs hidden in system software. Because some bugs always make their way to deployment sites no matter how rigorous the software testing is, this dissertation proposes a second method that helps monitor the system software and detect runtime errors. To handle the runtime errors caused by software bugs, this dissertation proposes a third method that helps diagnose the failure, recover the program, and prevent future errors due to the same bugs.

Specifically, we propose a software testing method called 2ndStrike to manifest hidden concurrency typestate bugs in multi-threaded system software.
2ndStrike first profiles certain program runtime events related to typestate and thread synchronization. Based on the logs, 2ndStrike then identifies bug candidates that would cause a typestate violation if the event order were reversed. Finally, 2ndStrike re-executes the program over multiple iterations with controlled thread interleaving to manifest the bug candidates.

In addition, we propose a deployment-time monitoring and analysis method called DM-Tracker to detect anomalies in distributed system software running on parallel platforms during production runs. Based on the observation that data movements in parallel programs typically follow certain patterns, our idea is to extract data movement (DM)-based invariants at program runtime and check for violations of these invariants. Such violations indicate potential bugs, such as data races and memory corruption bugs, that manifest themselves in data movements. Utilizing the data movement information, we propose a statistical-rule-based approach to detect anomalies for finding bugs.

Finally, we propose a deployment-time fault tolerance method called First-Aid to recover from failures in system software caused by common memory bugs during production runs and to prevent future errors caused by the same bugs. Upon a failure, First-Aid diagnoses the bug type and identifies the memory objects that trigger the bug. To do so, it rolls the program back to previous checkpoints and uses two types of environmental changes that can prevent or expose memory bug manifestation during re-execution. Based on the diagnosis, First-Aid generates and applies runtime patches to avoid the memory bug and prevent its recurrence.

We have designed and implemented software prototypes for the proposed methods and evaluated them with real-world bugs in large open-source system software packages such as Apache, MySQL, Mozilla, and MVAPICH.
The experimental results show that the methods proposed in this dissertation can substantially help improve the reliability of system software in various scenarios. The results also demonstrate that the runtime support in these methods brings key advantages such as high efficiency, high accuracy, and high usability.
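
As a rough illustration of the candidate-identification step described for 2ndStrike above (finding event pairs that would violate a typestate if their order were reversed), the following Python sketch may help. It is not taken from the dissertation: the `Event` record, the file-like `TRANSITIONS` automaton, and the functions `violates` and `find_candidates` are all hypothetical names chosen for this example. It also simplifies heavily, assuming that only adjacent events on the same object are swapped and ignoring the thread-synchronization information that the abstract says 2ndStrike also profiles.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    """One logged runtime event (hypothetical record layout)."""
    thread: str
    obj: str
    op: str


# Hypothetical typestate automaton for a file-like object.
# A (state, op) pair that is missing from the table is a typestate violation.
TRANSITIONS = {
    ("closed", "open"): "open",
    ("open", "read"): "open",
    ("open", "close"): "closed",
}


def violates(ops: Iterable[str], start: str = "closed") -> bool:
    """Return True if the operation sequence leaves the automaton."""
    state = start
    for op in ops:
        state = TRANSITIONS.get((state, op))
        if state is None:
            return True
    return False


def find_candidates(log: List[Event]) -> List[Tuple[Event, Event]]:
    """Report adjacent cross-thread event pairs whose swap breaks the typestate.

    Simplification (an assumption of this sketch): synchronization between
    threads is ignored, so some reported pairs may be infeasible in practice.
    """
    # Group events per object, preserving the observed order.
    by_obj = {}
    for ev in log:
        by_obj.setdefault(ev.obj, []).append(ev)

    candidates = []
    for events in by_obj.values():
        for i in range(len(events) - 1):
            a, b = events[i], events[i + 1]
            if a.thread == b.thread:
                continue  # program order within one thread cannot be reversed
            swapped = events[:i] + [b, a] + events[i + 2:]
            if violates(e.op for e in swapped):
                candidates.append((a, b))
    return candidates


if __name__ == "__main__":
    observed = [
        Event("T1", "f", "open"),
        Event("T1", "f", "read"),
        Event("T2", "f", "close"),
    ]
    for first, second in find_candidates(observed):
        print("candidate:", second, "before", first)
```

In this sketch, each reported pair would be a target for the later re-execution phase, in which a controlled scheduler would try to force the reversed order and check whether the violation actually manifests.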
Keywords/Search Tags:Software, System, Runtime, Reliability, Bugs, Method, Dissertation proposes, Improving