| Dynamic binary translation (DBT) is the cornerstone of many important applications. In essence, a DBT system dynamically translates executable binary code from a guest instruction set architecture (ISA) to a host ISA, which may be the same as or different from the guest ISA. It also enables code tracing and instrumentation during the translation process. Hence, DBT is widely applied in fields such as cross-ISA virtualization, ISA simulation, and system software analysis. On the host platform, DBT maintains a software-emulated guest CPU to guarantee the equivalence between the guest code and the translated host code. With this technique, guest instructions are translated into a set of operations on the emulated CPU in host memory, which efficiently bridges the gap between the guest and host ISAs. However, using memory locations for guest CPU emulation introduces a large number of memory operations, which in turn causes further issues in DBT systems. First, DBT systems suffer from low execution efficiency, e.g., up to 36x slower than native execution for some tested applications. A direct impact is an unsatisfactory user experience. The time consumed by the test programs required during the development of DBT systems is also significant. Meanwhile, the extra memory write operations shorten the lifetime of wear-sensitive memory devices, such as emerging non-volatile memory (NVM). To address these issues, in-depth studies have been conducted to reveal their root causes. Based on the findings of these studies, this dissertation proposes corresponding solutions for improving DBT systems, including NVM wear mitigation, SIMD-resource-based performance optimization, and fast and adaptive performance regression testing. First, DBT systems issue more memory writes to NVM and thus make the wear even worse. This dissertation conducts a wear characterization study of DBT systems, which reveals that DBT systems cause intensive and uneven wear on NVM, and that most of the memory writes come from the memory-based
guest CPU emulation. Unfortunately, existing work cannot effectively mitigate this fine-grained and extremely uneven wear. Following this study, this dissertation proposes NVM wear mitigation strategies for DBT systems that are informed by an understanding of application memory behavior. The approach combines guest-CPU-reallocation-based wear leveling with host-register-based wear reduction. Using knowledge of the memory write behavior of DBT systems, the approach proposed in this dissertation distributes memory writes across more NVM cells and keeps write-intensive data in host registers. In addition, issues such as register conflicts are addressed, and additional memory writes are reduced. Second, this dissertation proposes an effective and unconventional exploitation of SIMD resources in DBT systems to improve performance. Most existing SIMD-based optimizations rely on guest SIMD instructions or data-level parallelism; however, regular applications rarely meet these requirements. The proposed approach has no such limitations, so more regular applications can benefit from the optimization. In particular, by mapping guest registers to host SIMD registers, the proposed approach takes advantage of the ample host SIMD registers and powerful host SIMD instructions to generate more efficient host binary code for guest applications, even without any fine-grained data-level parallelism. Experimental results show that the proposed optimization achieves performance speedups on all tested platforms and applications, of up to 2.2x. Third, although performance regression testing is an effective way to detect potential performance regressions, it is not easy to apply to DBT systems because of the extremely long execution time of existing standard test suites. In this dissertation, an approach with several novel techniques is presented to address this challenge. Specifically, it automatically generates adaptable test
programs from existing real benchmark programs of DBT systems according to the runtime characteristics of the benchmarks. The test programs can then be used to achieve highly efficient and adaptive performance regression testing of DBT systems. Last but not least, building on the above approach, a further study is conducted to support multi-threaded testing by taking thread activities into account. In particular, it analyses the dynamic distribution patterns of basic blocks across active threads. Efficient test programs are then generated with the corresponding patterns preserved. Evaluation demonstrates that the generated test programs produce the same or similar results as the original long-running test programs, with a speedup of up to 248x. |
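
The contrast between memory-based guest CPU emulation and host-register-based wear reduction can be illustrated with a toy model. The sketch below is not the dissertation's implementation: the guest block, the `movi`/`add` operations, the register names, and both `run_*` functions are hypothetical and exist only to count writes to the emulated guest CPU state. It assumes a single basic block with one write-back at block exit.

```python
from collections import Counter

# Hypothetical toy guest block: (op, dst, src) tuples over guest registers.
GUEST_BLOCK = [
    ("movi", "r0", 0),
    ("movi", "r1", 1),
    ("add", "r0", "r1"),
    ("add", "r0", "r1"),
    ("add", "r0", "r1"),
]


def step(op, dst, src, read):
    """Compute the new value of dst for one toy guest instruction."""
    if op == "movi":
        return src
    if op == "add":
        return read(dst) + read(src)
    raise ValueError(f"unknown op: {op}")


def run_memory_based(block):
    """Every guest register update is a write to the in-memory guest CPU state."""
    cpu_state = {}      # stands in for the guest CPU structure in host memory
    writes = Counter()  # memory writes per guest register slot
    for op, dst, src in block:
        cpu_state[dst] = step(op, dst, src, lambda r: cpu_state[r])
        writes[dst] += 1
    return cpu_state, writes


def run_register_cached(block, cached):
    """Guest registers in `cached` stay in 'host registers' until block exit."""
    cpu_state = {}
    host_regs = {}      # stands in for host registers holding guest values
    writes = Counter()

    def read(r):
        return host_regs[r] if r in host_regs else cpu_state[r]

    for op, dst, src in block:
        value = step(op, dst, src, read)
        if dst in cached:
            host_regs[dst] = value   # no memory write for cached registers
        else:
            cpu_state[dst] = value
            writes[dst] += 1
    for r, value in host_regs.items():  # single write-back at block exit
        cpu_state[r] = value
        writes[r] += 1
    return cpu_state, writes


mem_state, mem_writes = run_memory_based(GUEST_BLOCK)
reg_state, reg_writes = run_register_cached(GUEST_BLOCK, cached={"r0"})
print("memory-based writes:", dict(mem_writes))
print("register-cached writes:", dict(reg_writes))
```

Both runs end with the same guest state, but the write-intensive register `r0` reaches memory only once in the cached variant. Which guest registers are worth caching, and how register conflicts are resolved, are exactly the questions the toy model leaves out.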