Research On Application Level Online Deterministic Record And Replay Based On Domestic Multicore Processor

Posted on: 2016-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Ji
GTID: 2308330479990056
Subject: Computer Science and Technology
Abstract/Summary:
This dissertation focuses on deterministic replay, also known as record and replay. The technique observes a running program from a third-party viewpoint, logs the key events of its execution, and uses this information to reproduce that execution. Research on the reliability of domestic multiprocessors has already produced results such as process-level redundant error checking and operating-system rollback, which provide fault tolerance for single-threaded numeric programs. Multi-threaded and non-numeric programs, however, often produce different results from one execution to the next, so they do not fit the method of detecting errors by running redundant copies and comparing their outputs. This dissertation aims to eliminate this non-determinism with record and replay, so that the error-checking method can be applied to a wider range of programs, better recovery methods can be built, and debugging becomes much easier. Record and replay has drawn close attention from systems researchers; its applications include debugging, architectural performance simulation, and intrusion detection. General-purpose record and replay tools for single-threaded applications are now widely adopted, but a solution for multi-threaded applications is still needed.

The dissertation first identifies the sources of non-determinism and how they affect applications. It then surveys related work to learn how existing systems are implemented and what their methods can achieve, followed by a discussion of the advantages of online replay and why it is well suited to fault tolerance. To keep record and replay transparent to applications and to build on earlier research results, the design takes an operating-system view. Following Scribe, it first discusses what to record and how to replay each system call, classifying system calls by whether they change operating-system state. It then studies how signals are generated, delivered, and handled.
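The system-call classification described above can be illustrated with a small user-space simulation. This is a minimal sketch under stated assumptions, not the dissertation's kernel implementation: `rr_call`, `SyscallLog`, `FakeKernel`, and the two class labels are hypothetical names, and the "system calls" are plain Python functions. Calls that only read OS state are answered from the log during replay, while calls that change OS state are re-executed so the state is rebuilt, with any mismatch against the log treated as divergence. A getpid-style call belongs to the read-only class: replay returns the recorded PID even though the replaying process has a different one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical classification following the Scribe-style split described
# above: calls that only read OS state are answered from the log on replay;
# calls that change OS state are re-executed so that state is rebuilt.
READ_ONLY = "read_only"
STATE_CHANGING = "state_changing"


class ReplayDivergence(Exception):
    """The replayed call sequence no longer matches the recording."""


@dataclass
class SyscallLog:
    entries: List[Tuple[str, int]] = field(default_factory=list)  # (call name, result)
    pos: int = 0  # replay cursor into entries


def rr_call(log: SyscallLog, replaying: bool, name: str, kind: str,
            fn: Callable[[], int]) -> int:
    """Record or replay one simulated system call (hypothetical wrapper)."""
    if not replaying:
        result = fn()
        log.entries.append((name, result))
        return result
    if log.pos >= len(log.entries) or log.entries[log.pos][0] != name:
        raise ReplayDivergence(f"unexpected call {name!r} at entry {log.pos}")
    logged = log.entries[log.pos][1]
    log.pos += 1
    if kind == STATE_CHANGING and fn() != logged:
        # The side effect was re-applied, but its result differs from the log.
        raise ReplayDivergence(f"re-executed {name!r} differs from the log")
    return logged


class FakeKernel:
    """Toy stand-in for the OS: a clock that differs between runs and a
    file-descriptor counter that models kernel state."""

    def __init__(self, clock_start: int):
        self.clock = clock_start
        self.next_fd = 3

    def gettime(self) -> int:  # reads state only; time simply moves on
        self.clock += 7
        return self.clock

    def open_file(self) -> int:  # changes kernel state (allocates an fd)
        fd = self.next_fd
        self.next_fd += 1
        return fd


def run(kernel: FakeKernel, log: SyscallLog, replaying: bool):
    t0 = rr_call(log, replaying, "gettime", READ_ONLY, kernel.gettime)
    fd = rr_call(log, replaying, "open", STATE_CHANGING, kernel.open_file)
    t1 = rr_call(log, replaying, "gettime", READ_ONLY, kernel.gettime)
    return t0, fd, t1
```

Recording with `run(FakeKernel(100), log, False)` yields `(107, 3, 114)`. Replaying the same log against `FakeKernel(9000)` yields the same tuple, whereas an unrecorded run on that kernel would yield `(9007, 3, 9014)`. The point of the sketch is that only the read-only results are injected; the state-changing call still runs, so the replayed process ends up with the same kernel-side state.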
After presenting the CREW (concurrent-read, exclusive-write) protocol for handling writes to shared memory, the dissertation analyzes the obstacles to implementing record and replay for signals and shared-memory accesses. Using the getpid system call as an example, record and replay is implemented on the x86 architecture and later ported to the Loongson 3A processor. Experiments verify the correctness of the implementation and evaluate its performance overhead. The analysis, design, and experiments can serve as a reference for future research.
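Under CREW, each shared page is at any moment either readable by any number of threads or writable by exactly one. The toy model below shows why this helps record and replay: only ownership changes need to be logged, and replay forces those changes to happen in the recorded order. It is a sketch under stated assumptions, not the dissertation's design. The names `CrewRecorder`, `CrewReplayer`, and `PageState` are hypothetical; threads are simulated by explicit calls rather than real concurrency; a single global order of transitions is a simplification, and kernel implementations typically enforce permissions through page-table protection and page faults.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

READ, WRITE = "R", "W"
Event = Tuple[int, int, str]  # (thread id, page number, access kind)


@dataclass
class PageState:
    writer: Optional[int] = None                    # exclusive writer, if any
    readers: Set[int] = field(default_factory=set)  # concurrent readers


def needs_transition(st: PageState, tid: int, kind: str) -> bool:
    """True if the access is not allowed by the page's current CREW state."""
    if kind == READ:
        return st.writer != tid and tid not in st.readers
    return st.writer != tid


def apply_transition(st: PageState, tid: int, kind: str) -> None:
    if kind == READ:
        if st.writer is not None:  # demote the writer to a reader
            st.readers = {st.writer}
            st.writer = None
        st.readers.add(tid)
    else:  # grant exclusive write access, revoking every other permission
        st.writer, st.readers = tid, set()


class CrewRecorder:
    """Logs only ownership changes, in the global order they happen."""

    def __init__(self) -> None:
        self.pages: Dict[int, PageState] = {}
        self.log: List[Event] = []

    def access(self, tid: int, page: int, kind: str) -> None:
        st = self.pages.setdefault(page, PageState())
        if needs_transition(st, tid, kind):
            apply_transition(st, tid, kind)
            self.log.append((tid, page, kind))


class CrewReplayer:
    """Lets an ownership change happen only when it is next in the log."""

    def __init__(self, log: List[Event]) -> None:
        self.pages: Dict[int, PageState] = {}
        self.log = list(log)
        self.pos = 0

    def try_access(self, tid: int, page: int, kind: str) -> bool:
        """False means the thread must wait: another logged change comes first."""
        st = self.pages.setdefault(page, PageState())
        if not needs_transition(st, tid, kind):
            return True
        if self.pos < len(self.log) and self.log[self.pos] == (tid, page, kind):
            apply_transition(st, tid, kind)
            self.pos += 1
            return True
        return False
```

For example, if thread 1 writes page 0, thread 2 reads it, thread 1 reads it again, and thread 2 then writes it, the recorder logs only `[(1, 0, "W"), (2, 0, "R"), (2, 0, "W")]`; thread 1's second read needs no entry because it was already demoted to a reader. During replay, thread 2's read is refused until thread 1's write has been replayed, so both threads see the same page contents as in the recorded run.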
Keywords/Search Tags: determinacy, record and replay, multicore processor, transient fault tolerance