
Research On Analysis And Optimization Of Data Access For Memory Hierarchy

Posted on: 2010-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Wu
GTID: 1118360305973660
Subject: Computer Science and Technology

Abstract/Summary:
The speed gap between processors and memories has long been one of the main bottlenecks in system performance, and is well known as the "memory wall" problem. To mitigate it, memory hierarchies are used in almost every computer, so the study of the memory hierarchy has been one of the key techniques for improving the performance of computer systems.

Memory hierarchy access, or data access, is the "bridge" between processors and memories in the memory wall problem. Studying the characteristics of data access is therefore the basis for solving it. We summarize six kinds of data access characteristics: dependency, reuse, similarity, affinity, coherence/consistency and liveness.

● Dependency describes the order of data accesses where there is at least one write operation. It constrains the correctness of programs.
● Reuse describes two or more accesses to a data element, or to a data set containing data elements at adjacent addresses. It is the precondition of locality in the memory hierarchy.
● Similarity describes the relationship among the values of corresponding data elements in multiple execution entities of one program. It is used to reduce the memory resources occupied by multiple execution entities.
● Affinity describes the relationship among the frequencies with which multiple processors access a data element. It affects the access performance of processors through data distribution.
● Coherence/consistency describes the relationship among multiple copies of one data item. It affects the correctness of programs.
● Liveness describes the relationship among the lifespans of data accesses. It is used to solve some resource allocation problems.

These characteristics are not independent of each other but correlated, and they reflect different sides of data access. We divide them into two classes.
One class, comprising dependency and coherence/consistency, mainly affects the correctness of program execution and is often used to guide program transformations and architecture designs; the other class, comprising reuse, similarity, affinity and liveness, mainly constrains performance and is the basis for software/hardware performance optimization. In addition, reuse and similarity express a kind of resource compatibility from the address side and the value side respectively, while liveness and affinity express a kind of resource exclusiveness from the temporal side and the spatial side respectively. The data access characteristics are therefore divided into three groups according to correctness and performance, and resource compatibility and exclusiveness. The two characteristics in each group are orthogonal in two aspects: the temporal and spatial dimensions of execution, and the address and value of the accessed data. Dependency, reuse and liveness all describe data access along the temporal dimension of program execution, while coherence/consistency, similarity and affinity describe data access across multiple data copies, multiple data entities and multiple data positions respectively. Meanwhile, dependency, reuse and affinity are defined in terms of data addresses, while coherence/consistency, similarity and liveness are defined in terms of data values.

This thesis mainly studies three of them: reuse, similarity and affinity. The main innovations of this thesis can be summarized as follows:

1. A parallel data reuse model is proposed. This model analyzes the reuse in parallel programs implemented in OpenMP and OpenTM, classifies that reuse, and gives a measurement for each class of data reuse. It extends the serial data reuse model proposed by Wolf to parallel computing, and has important guiding significance for shared-memory parallel program analysis and compiler optimization.
2. Data-Object Oriented Cache (DOOC) is put forward.
Under the co-management of software and hardware, DOOC allocates the data-objects in a program to different segments of the cache and chooses a suitable strategy for each segment. Segment strategies can vary in many aspects, including segment capacity, associativity, block size and coherence protocol. The experimental results show that DOOC matches the diversity of reuse in programs better than a traditional cache, and the efficiency of the cache is therefore improved.
3. Analysis of similarity is given. This thesis classifies difference spreading, studies the behavior of each kind of difference spreading, sets up a difference spreading model and solves for the similar data set in programs according to that model. The analysis studies similarity systematically and quantitatively, and is the basis for related compiler optimization techniques.
4. Similar page for shared memory architecture is designed. It is an optimization technique co-managed by compilers and operating systems. The experimental results show that the similar page technique can reduce the amount of data in a shared memory architecture, including shared cache and shared memory, by merging similar data in similar processes into one physical page. It can thus improve both system performance and parallel scalability.
5. Analysis of affinity is given. This thesis defines the vertical affinity degree and the horizontal affinity degree and studies how to measure them. In addition, it explains the key problem in research on non-uniform cache architecture in terms of affinity. The proposed approach measures affinity quantitatively, and has important guiding significance for many optimization techniques in distributed storage architectures, such as data layout, task partitioning and task scheduling.
6. Several optimization techniques of data distribution on dynamic non-uniform cache architecture (NUCA) are put forward.
This thesis proposes smart multi-hop promotion, arbitrary-stride hardware prepromotion and software prepromotion for dynamic NUCA on single-core platforms, and a bank coherence technique on multi-core platforms. The experimental results show that these techniques optimize the position of data in dynamic NUCA and improve system performance.
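
As a rough illustration of the promotion idea in item 6, the following Python sketch models one bank set of a dynamic NUCA cache in which a block that hits is swapped toward the bank nearest the core. The abstract does not describe the smart multi-hop policy itself, so the class BankSet, the hops parameter, the tail-insertion rule on a miss and the latency figures are all assumptions made for illustration, not the design evaluated in the thesis.

```python
from typing import List, Optional, Sequence

MISS_LATENCY = 20  # assumed cost of a miss, in arbitrary cycles


class BankSet:
    """One bank set of a dynamic NUCA cache; index 0 is nearest the core.

    Hypothetical model: each bank holds at most one block, identified by a tag.
    """

    def __init__(self, num_banks: int, hops: int = 1) -> None:
        self.banks: List[Optional[int]] = [None] * num_banks
        self.hops = hops  # banks a block moves toward the core on each hit

    def access(self, tag: int) -> int:
        """Access a block and return the assumed latency in cycles."""
        if tag in self.banks:
            pos = self.banks.index(tag)
            latency = pos + 1  # assumed: one cycle per bank of distance
            target = max(0, pos - self.hops)
            # Promotion: swap the hit block with the block in the closer bank.
            self.banks[pos], self.banks[target] = (
                self.banks[target],
                self.banks[pos],
            )
            return latency
        # Miss: place the block in the farthest bank, evicting its occupant.
        self.banks[-1] = tag
        return MISS_LATENCY


def total_latency(trace: Sequence[int], num_banks: int, hops: int) -> int:
    """Sum the assumed latency of every access in an address-tag trace."""
    cache = BankSet(num_banks, hops)
    return sum(cache.access(tag) for tag in trace)


if __name__ == "__main__":
    trace = [7, 7, 7, 7]
    print(total_latency(trace, num_banks=4, hops=1))  # one bank per hit
    print(total_latency(trace, num_banks=4, hops=3))  # multi-hop variant
```

Under these assumed costs, four accesses to the same block on a four-bank set total 29 cycles with one-hop promotion and 26 cycles with three-hop promotion, because the multi-hop variant brings a reused block next to the core sooner. A real policy would also have to weigh the cost of displacing the blocks it swaps with, which this sketch ignores.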
Keywords/Search Tags: memory wall, memory hierarchy, data access, cache, memory, reuse, similarity, affinity