Temporal locality at procedure level: Its study and exploitation

Posted on:2005-07-25

Degree:Ph.D.

Type:Dissertation

University:Rutgers The State University of New Jersey - New Brunswick

Candidate:Batchu, Ravi Venkata

GTID:1458390008981443

Subject:Computer Science

Abstract/Summary:

All executable binaries on a machine, consisting of the kernel, applications, and dynamically linked libraries, are composed of procedures. While program locality has been studied and exploited at different levels of the memory hierarchy, including the cache, block, and page levels, there has been little effort to study locality at the level of procedures and to leverage it.

Using fixed-size and variable-size working sets, we establish temporal locality at the level of procedures. We propose and validate models for generating procedure-level references.

Based on the observation that at least half of all procedure invocations are to procedures smaller than 128 bytes, we propose a class of feedback-directed optimization techniques called Procedure Level Relocation (PLR), which monitors the usage of procedures and performs a dynamic, system-wide relocation very infrequently to improve code locality. We present a novel framework that simulates the entire machine in enough detail to bring up the operating system, thus taking into account references due to system calls, interrupt service routines, and multiprocessing, and we use it to study procedure-level relocations.

For an instance of PLR that copies procedures into a Small Procedure Cache (SPC), a small hardware structure with the same latency and bandwidth as the L1 cache, we show that instruction cache misses can be reduced by 15%, and by as much as 44% in some cases, compared to a baseline machine with the same hardware budget.

We study the SPC instance of PLR along with another instance that uses a Small Procedure Page (SPP), a physical page frame dedicated by the operating system to small procedures, and show that PLR consistently reduces instruction Translation Lookaside Buffer (iTLB) misses for all workloads. We show that PLR with SPP can reduce iTLB misses by 8.71% on average and consistently outperforms the SPC variant with respect to the iTLB. In particular, it is very effective for desktop applications, which suffer more from iTLB misses than the SPEC CPU 2000 benchmarks by at least an order of magnitude, reducing their iTLB misses by 21% on average.
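
The notion of a fixed-size working set over procedure references can be illustrated with a minimal sketch. It assumes the classic sliding-window definition (the set of distinct procedures among the last `window` references); the abstract does not give the dissertation's exact definitions, so the function name `working_set_sizes`, the window length, and the sample trace are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque


def working_set_sizes(trace, window):
    """Return, for each reference, the number of distinct procedures
    among the most recent `window` references (assumed definition)."""
    if window <= 0:
        raise ValueError("window must be positive")
    counts = Counter()   # occurrences of each procedure inside the window
    recent = deque()     # the references currently inside the window
    sizes = []
    for proc in trace:
        recent.append(proc)
        counts[proc] += 1
        if len(recent) > window:
            # Slide the window: drop the oldest reference.
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        sizes.append(len(counts))
    return sizes


# Hypothetical trace of procedure invocations (not data from the dissertation).
trace = ["main", "parse", "lex", "lex", "parse", "lex", "emit", "lex", "parse"]
sizes = working_set_sizes(trace, window=4)
print(sizes)
print(sum(sizes) / len(sizes))
```

Under this assumed definition, a working set that stays small relative to the total number of procedures in the trace is one indication of temporal locality at the procedure level. A variable-size variant would adjust `window` during the run rather than fix it.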

Keywords/Search Tags:

Procedure, Locality, iTLB misses, PLR

Related items

1	Compiler optimizations for avoiding cache conflict misses
2	Computation of cache misses in matrix multiplication
3	Research On Locality Weighting One-Class Support Vector Machines
4	The Design And Realization Of Instruction Memory Management Unit In RISC Microprocessor
5	Study On Procedure Neural Networks' Model And Learning Algorithms
6	A Design And Visual Implement For The Operation Sementics Of The Procedure Languages
7	Study On Bottleneck-based Decomposition Procedure For Large-scale Production Scheduling Problems
8	Welding Procedure Qualification System For Shipbuilding Based On Client/Server Structure
9	Research On Routing Technologies For Mobile Ad Hoc Networks
10	A higher order theory of locality and its application in multicore cache management