
Design And Implementation Of Multi-threaded Fault Recovery System On Loongson Multi-core Processors

Posted on: 2017-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: S M Qiao
GTID: 2308330509457489
Subject: Computer technology
Abstract/Summary:
Big data and the Internet of Things (IoT) are developing rapidly, and multi-core, multi-threaded technology has played an important role in accelerating this growth. This in turn places higher demands on the stability and robustness of multi-threaded programs running on multi-core processors. However, any program may encounter a fault at an unpredictable moment, and transient faults are the most common cause of application failure. Fault recovery techniques are therefore worth studying.
This thesis studies multi-threaded fault recovery on multi-core processors. First, existing fault recovery technologies are introduced. Second, to determine which process state must be saved for fault recovery, the kernel's implementation of processes and threads is analyzed. The analysis concludes that the state to be saved includes registers, memory, signals, files, and so on.
Based on this research and analysis, an operating-system-level fault recovery system is designed. When a fault occurs, the program is recovered from a checkpoint file that was saved while the multi-threaded program was running normally. The system is transparent to the application layer. Information such as registers, memory addresses, the current working directory, and open files can be written directly into a memory buffer. Here, the memory buffer is memory that is allocated at the operating system level and managed by our system; it is released when the kernel module is removed. For data in memory, the physical address must be obtained first, and the data in the page frames is then copied. In addition, a new method is proposed for deciding when to take a checkpoint: instead of using a fixed time interval, the system counts the system calls that involve data transmission. During fault recovery, shared information, such as open files and memory addresses, can be recovered in a single thread, while private information must be recovered in every thread.
Finally, the experimental environment and the compilation method are described, and the basic functions of the system are tested. The time overhead is analyzed with respect to three factors: number of threads, data size, and time interval. The results show that the number of threads and the time interval affect the performance loss of programs more than the data size does.
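The checkpoint-timing idea described in the abstract (counting data-transmitting system calls instead of waiting for a time interval) can be illustrated with a minimal user-space sketch. This is not the thesis's kernel implementation: the class name `CheckpointTrigger`, the `threshold` parameter, and the contents of `DATA_SYSCALLS` are assumptions made for illustration, since the abstract does not say which system calls are counted or how many calls separate two checkpoints.

```python
from dataclasses import dataclass

# Hypothetical set of "data-transmitting" system calls.
# The abstract does not list which calls the real system counts.
DATA_SYSCALLS = frozenset({"read", "write", "send", "recv"})


@dataclass
class CheckpointTrigger:
    """Decides when to checkpoint by counting data-transmitting syscalls."""

    threshold: int        # assumed: counted calls between two checkpoints
    count: int = 0        # counted calls since the last checkpoint
    checkpoints: int = 0  # number of checkpoints requested so far

    def on_syscall(self, name: str) -> bool:
        """Record one system call; return True if a checkpoint is due."""
        if name not in DATA_SYSCALLS:
            return False  # calls that move no data are ignored
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0
            self.checkpoints += 1
            return True
        return False


# Replay a small, made-up system call trace through the trigger.
trigger = CheckpointTrigger(threshold=3)
trace = ["read", "getpid", "write", "write", "open", "send", "recv"]
fired = [name for name in trace if trigger.on_syscall(name)]
```

In this sketch a checkpoint is requested on the third counted call, and calls such as `getpid` or `open` do not advance the counter. Tying the trigger to data movement rather than to wall-clock time means an idle program takes no checkpoints, which is one plausible motivation for the approach.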
Keywords/Search Tags:fault recovery, multi-thread, checkpoint, operating system-level, kernel