Exploring fine-grained process interaction in multiprocessor systems

Posted on:1998-05-15

Degree:Ph.D

Type:Thesis

University:University of Minnesota

Candidate:Johnson, Donald Elmer

GTID:2468390014477087

Subject:Computer Science

Abstract/Summary:

Several techniques have been used to improve the performance of process interaction in fine-grained multiprocessor systems. These existing techniques tend to have long memory latencies or synchronization times, or they require complex and expensive hardware. This thesis proposes that user-level hardware and special-purpose communications channels for different interaction domains can dramatically improve access performance at relatively modest hardware cost. The thesis characterizes some specific domains for which the hypothesis holds. New lock and barrier mechanisms are presented that reduce both contention and latency to the minimum values obtainable with shared-bus communications, requiring at most two shared-bus transactions, with one transaction being typical. Distributed hardware locking queues and barrier flags reduce the latency for process continuation after obtaining a lock or reaching a barrier to near zero. Four additional interaction mechanisms are presented that use serial communication between processing elements (PEs) in a manner that eliminates inter-PE clocking delays. All of these new techniques increase scalability, are applicable to both new architectures and existing systems, and are less complex than other hardware solutions. The optimum two-dimensional cluster size for N PEs is shown to be proportional to $(NI/D)^{1/2}$, where I and D are the mean inter-node times, including gate and time-of-flight delays, on the global and local loops, respectively. The access latency when optimally clustered is shown to be proportional to $(NID)^{1/2}$. Using conservative parameters and optimal clustering, the maximum number of PEs for an expected latency of one microsecond is 15621 for barriers, 61308 for locks, 37698 for shared-data, and 14592 for shared-registers. All mechanisms are shown to have near-optimum performance if the configuration is near-optimum for any particular mechanism. Hierarchies beyond two levels are shown to have expected latencies proportional to the sum of all loop-times.
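The two proportionalities in the abstract can be illustrated with a small numerical sketch. The cost model below is an assumption made for illustration only and is not taken from the thesis: it supposes that an access traverses, on average, half of a local loop of c nodes and half of a global loop of N/c clusters. The function names and the sample values for N, I, and D are likewise hypothetical. Under that assumed model, the latency is minimized at c = (NI/D)^{1/2} and the minimum equals (NID)^{1/2}.

```python
import math


def expected_latency(cluster_size, n_pes, inter_global, inter_local):
    """Assumed model (not from the thesis): half a local loop plus half a global loop."""
    local_loop = cluster_size * inter_local                # c nodes, D per hop
    global_loop = (n_pes / cluster_size) * inter_global    # N/c clusters, I per hop
    return 0.5 * (local_loop + global_loop)


def optimal_cluster_size(n_pes, inter_global, inter_local):
    """Cluster size minimizing the assumed model: sqrt(N * I / D)."""
    return math.sqrt(n_pes * inter_global / inter_local)


def optimal_latency(n_pes, inter_global, inter_local):
    """Latency of the assumed model at the optimum: sqrt(N * I * D)."""
    return math.sqrt(n_pes * inter_global * inter_local)


if __name__ == "__main__":
    # Hypothetical parameters: 1024 PEs, 4 ns global and 1 ns local inter-node time.
    n, i_global, d_local = 1024, 4e-9, 1e-9

    c_star = optimal_cluster_size(n, i_global, d_local)
    print("closed-form optimum cluster size:", c_star)
    print("closed-form optimum latency (s): ", optimal_latency(n, i_global, d_local))

    # Brute-force check over all integer cluster sizes.
    best = min(range(1, n + 1),
               key=lambda c: expected_latency(c, n, i_global, d_local))
    print("brute-force optimum cluster size:", best)
```

Because the assumed cost is a sum of loop times, it is also consistent with the abstract's remark that deeper hierarchies have expected latencies proportional to the sum of all loop-times; the constants of proportionality for each mechanism would have to come from the thesis itself.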

Keywords/Search Tags:

Interaction, Process, Shown
