Temporal locality at procedure level: Its study and exploitation

Posted on: 2005-07-25
Degree: Ph.D
Type: Dissertation
University: Rutgers The State University of New Jersey - New Brunswick
Candidate: Batchu, Ravi Venkata
Full Text: PDF
GTID: 1458390008981443
Subject: Computer Science
Abstract/Summary:
All executable binaries on a machine, including the kernel, applications, and dynamically linked libraries, are composed of procedures. While program locality has been studied and exploited at different levels of the memory hierarchy, including the cache, block, and page levels, there has been little effort to study locality at the level of procedures and to leverage it.

Using fixed size and variable size working sets, we establish temporal locality at the level of procedures. We propose and validate models for generating procedure level references.

Based on the observation that at least half of all procedure invocations are to procedures smaller than 128 bytes, we propose a class of feedback directed optimization techniques called Procedure Level Relocation (PLR). PLR monitors the usage of procedures and, very infrequently, performs a dynamic system wide relocation to improve code locality. We present a novel framework that simulates the entire machine in enough detail to bring up the operating system, so that references due to system calls, interrupt service routines, and multiprocessing are taken into account, and we use it to study procedure level relocations.

For an instance of PLR that copies procedures into a Small Procedure Cache (SPC), a small hardware structure with the same latency and bandwidth as the L1 cache, we show that instruction cache misses can be reduced by 15%, and by as much as 44% in some cases, compared to a baseline machine with the same hardware budget.

We study the SPC instance of PLR along with another instance that uses a Small Procedure Page (SPP), a physical page frame dedicated by the operating system to small procedures, and show that PLR consistently reduces instruction Translation Lookaside Buffer (iTLB) misses for all workloads. We show that PLR with SPP reduces iTLB misses by 8.71% on average and consistently outperforms the SPC variant with respect to the iTLB. It is particularly effective for desktop applications, which suffer from iTLB misses at least an order of magnitude more than the SPEC CPU 2000 benchmarks: for these applications, it reduces iTLB misses by 21% on average.
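
As an illustration of how working sets can expose temporal locality in a procedure reference trace, the following is a minimal sketch, not the dissertation's method. It assumes a window-based (Denning-style) working set, where the working set at a given point is the set of distinct procedures referenced in the last `window` invocations. The abstract does not define its fixed size and variable size working sets, so this definition, the function names, and the toy trace are all assumptions made for illustration. A small average working set relative to the total number of procedures would suggest strong temporal locality.

```python
from collections import Counter


def working_set_sizes(trace, window):
    """Return the number of distinct procedures in each sliding window.

    `trace` is a sequence of procedure identifiers in invocation order
    (a hypothetical input format). `window` is the number of most recent
    invocations considered, in the style of a Denning working set.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    counts = Counter()   # invocations per procedure inside the current window
    sizes = []
    for i, proc in enumerate(trace):
        counts[proc] += 1
        if i >= window:
            # Drop the reference that just slid out of the window.
            old = trace[i - window]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        if i >= window - 1:
            # The window is full: record how many distinct procedures it holds.
            sizes.append(len(counts))
    return sizes


def mean_working_set(trace, window):
    """Average working set size over all full windows (0.0 if there are none)."""
    sizes = working_set_sizes(trace, window)
    return sum(sizes) / len(sizes) if sizes else 0.0


# Toy trace (made up for illustration): procedure "a" is invoked repeatedly.
example_trace = ["a", "b", "a", "c", "a", "b"]
example_sizes = working_set_sizes(example_trace, 3)
```

Sweeping `window` over a range of values and plotting the mean working set size is one common way to visualize locality; a curve that flattens quickly indicates that a small set of procedures dominates the recent invocations.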
Keywords/Search Tags: Procedure, Locality, iTLB misses, PLR