Research On Fault Propagation Analyzing In Software Systems For Critical Reliability Factors

Posted on: 2018-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Xue
GTID: 1368330566998860
Subject: Computer system architecture
Abstract/Summary:
Since the 1990s, with the rapid development of information technology, dependence on computers has grown daily in all walks of life. From the Shenzhou spacecraft and satellites to smart appliances and smartphones, computers are everywhere, and daily work, life, and study can no longer be separated from them. Consequently, computer malfunctions cause serious disruption to daily work and life, and may even threaten the economic security and social stability of the state, leading to immeasurable losses. The reliability of computer systems has therefore become a particular concern, and keeping computer systems running stably over the long term is a challenge that researchers and engineers must solve. Fault propagation is one of the most important factors influencing the reliability and availability of computer systems, and it is the reason a local fault can lead to a system failure. This dissertation focuses on software and proceeds from the bottom up, step by step. It studies, in turn, the propagation of soft errors in programs, fault propagation between local modular software and system calls, fault propagation in networked software, and fault propagation in large-scale software systems. In each case the goal is to identify the factors critical to system reliability, which supports building fault-tolerant mechanisms and optimizing system reliability.

As the complexity of programs increases, studying the propagation of soft errors through fault injection campaigns costs a great deal of time. To address this problem, this dissertation proposes a modeling method, based on dynamic instructions, for analyzing the propagation of soft errors. Architecturally correct execution (ACE) is first applied to exclude the soft errors that cannot affect the final output of the program. Data dependency relationships are then used to represent the propagation of soft errors within the program, and a fault propagation model is built on the dynamic dependency graph. A criterion for crashes is also established. An algorithm is proposed to reason about the propagation of soft errors that can affect the program output and to determine the outcome of the program under those errors. In particular, the algorithm identifies the critical soft errors that lead to crashes and predicts the corresponding crash latencies and propagation paths.

Mainstream software is usually developed and executed on top of system calls. During system calls, parameters and data are transmitted, which can lead to fault propagation between applications and the operating system. This dissertation considers applications and system calls together and focuses on fault propagation between them. Multiple failure modes are defined according to the different privilege levels of modern processors, and transitions between states are used to represent fault propagation. Building on the classic architecture-based reliability model, a new reliability model is created that accounts for system calls, fault propagation, and the transformation of failure modes. Sensitivity analysis then identifies the critical parameters of system reliability. In addition, the problems of selecting the optimal operating system and of choosing strategies for software module optimization can be solved by linear programming, which supports optimization of the software system.

With the continuous development of the Internet, the migration of software to networks is an unavoidable trend. Based on the characteristics of networked software, this dissertation proposes failure modes better suited to such systems. Using these failure modes, behavior and fault propagation are analyzed separately for component execution and for data transmission over the Internet, and reliability specifications for networked components and Internet connections are then defined. Transitions between states are used to model fault propagation, and a reliability model for networked software systems is built using a discrete-time Markov chain. Sensitivity analysis identifies the critical parameters of system reliability, and a method is proposed for evaluating the deployment locations of a given component.

As the scale of software increases, there is an urgent need for an efficient way to explore fault propagation in highly complex systems. This dissertation analyzes component invocation relationships in large-scale software systems and identifies the characteristics of the components that have the most negative impact on system reliability, from the perspectives of fault propagation and component failures. Inspired by the principle of TrustRank, an algorithm is proposed to rank components according to the impact of fault propagation. Inspired by PageRank, a second algorithm is introduced to rank components according to the impact of component failures. Based on the results of the two ranking algorithms, the critical components that have the greatest impact on global reliability are determined.
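
To make the idea of ranking components on an invocation graph concrete, the following is a minimal sketch of plain PageRank computed by power iteration. It is not the dissertation's algorithm, whose details the abstract does not give. The component names, the `calls` graph, the `pagerank` helper, and the damping factor of 0.85 are all illustrative assumptions. An edge from A to B means that A invokes B, so under this reading a component that many others depend on accumulates a high score.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Plain PageRank by power iteration (illustrative, not the thesis method).

    graph maps each component name to the list of components it invokes.
    """
    nodes = sorted(set(graph) | {c for callees in graph.values() for c in callees})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by components with no outgoing calls is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for caller in nodes:
            callees = graph.get(caller, [])
            for callee in callees:
                # Each caller splits its rank equally among the components it invokes.
                new_rank[callee] += damping * rank[caller] / len(callees)
        rank = new_rank
    return rank


# Hypothetical invocation graph: an edge A -> B means component A calls B.
calls = {
    "ui": ["auth", "orders"],
    "auth": ["db"],
    "orders": ["db", "auth"],
    "db": [],
}
scores = pagerank(calls)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph the most widely depended-upon component ranks first and the entry-point component, which nothing invokes, ranks last. A ranking aimed at fault propagation, as opposed to component failure, would presumably need a different seeding or edge direction, which the abstract does not specify.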
Keywords/Search Tags:fault propagation, reliability, soft error, system call, networked software, component ranking