
Research On Performance And Energy Of Multi-threaded Programs On NUMA Architectures

Posted on: 2017-06-17    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Zhu    Full Text: PDF
GTID: 1368330566950480    Subject: Computer system architecture
Abstract/Summary:
Non-Uniform Memory Access (NUMA) architectures are becoming increasingly popular in cloud computing environments. When single-threaded, multi-programmed workloads run on NUMA systems, two problems can arise: poor data locality and shared-resource contention. When multi-threaded programs run on NUMA systems, these two problems become more complicated. First, the shared data of a multi-threaded program causes remote memory accesses between NUMA nodes and thus degrades the data locality of the system. Second, when multiple threads read and write the memory of a single NUMA node simultaneously, they compete for that node's shared resources, leading to severe cache and interconnect contention on the node. Moreover, multi-threaded programs can also suffer from critical-thread problems: threads that issue more remote memory accesses become critical threads, which hurt overall program performance and increase the energy consumed during program execution. Previous research has mainly focused on improving the performance of NUMA systems; little work has considered reducing their energy consumption. Around the problems above, we carry out the project "Research on Performance and Energy of Multi-threaded Programs on NUMA Architectures", which consists of the following four aspects.

First, to address the high memory access latency observed on NUMA systems, we provide programmers with a detection and analysis tool that locates the performance bottlenecks of NUMA systems. By analyzing the latency information of the system, the tool makes three judgements: 1) if the latency of shared-data accesses is higher than that of private-data accesses, the program's shared data is causing a considerable amount of remote access; 2) if extremely high access latency is observed, shared-resource contention may have occurred on the NUMA system; 3) if the numbers of remote accesses differ greatly among threads, the thread with the largest number of remote memory accesses becomes the critical thread that limits overall program performance. Once these performance problems are detected and analyzed, the performance of the multi-threaded program can be improved by simple, general NUMA optimizations.

Second, to address the performance degradation caused by critical threads, we provide a symmetrical scheduling mechanism that balances the number of remote memory accesses across threads. Under NUMA, a piece of program data may be local memory for some threads but remote memory for others; threads with more remote memory accesses run slower than the rest and become the critical threads that limit the overall performance of the multi-threaded program. With symmetrical thread scheduling, threads are mapped onto all processor nodes symmetrically, which balances remote memory accesses and gives each thread a similar mix of local and remote accesses. As a result, all threads reach the synchronization points at almost the same time, avoiding the situation in which critical threads drag down performance.
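The dissertation does not publish its scheduler, but the idea of a symmetrical thread-to-node mapping can be sketched with libnuma and POSIX threads. In the minimal sketch below, the round-robin node assignment, the thread count, and the worker function are illustrative assumptions, not the author's implementation.

```c
/* Minimal sketch of a symmetrical thread-to-node mapping, assuming libnuma
 * and POSIX threads. The round-robin policy and NTHREADS are illustrative. */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 8

static void *worker(void *arg)
{
    int tid   = (int)(long)arg;
    int nodes = numa_max_node() + 1;
    int node  = tid % nodes;            /* spread threads evenly over nodes */

    /* Bind the calling thread to one node and prefer allocating its memory
     * there, so every node receives a similar share of threads and, ideally,
     * each thread sees a similar mix of local and remote accesses. */
    numa_run_on_node(node);
    numa_set_preferred(node);

    printf("thread %d pinned to NUMA node %d\n", tid, node);
    /* ... application work would go here ... */
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return EXIT_SUCCESS;
}
```

Usage note: build with `gcc -O2 -pthread symm_sched.c -lnuma` (the file name is hypothetical) on a machine with libnuma installed.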
Third, considering the relationship between performance and energy when multi-threaded programs run on NUMA systems, we provide a linear negative-correlation model of the relationship between performance improvement and the increase in energy consumption. The model considers two hypotheses separately: that real speedup tends toward a constant, and that real speedup tends toward linear speedup. Based on the performance/energy relationship, the model can guide the dynamic optimization of NUMA systems: according to the performance scalability of a parallel program (good or bad), we can increase or decrease the number of nodes the program uses, and thereby improve performance or save energy while still satisfying the energy budget or the performance requirement. Further study reveals that the factors influencing the performance/energy relationship include the overhead of remote memory accesses, the overhead of thread synchronization, and the overhead of load imbalance. Among these factors, the synchronization overhead caused by the inconsistent finish times of critical and non-critical threads is one of the main reasons for poor performance/energy scalability.

Fourth, considering the different finish times of critical and non-critical threads on NUMA systems, we provide a dynamic voltage and frequency scaling (DVFS) strategy that reduces the energy consumption of the overall system. On NUMA systems, critical threads issue more remote memory accesses and have longer execution times than non-critical threads, so the program's execution time is determined by the critical threads. Provided that the total execution time is not prolonged, we lower the frequencies of the CPU cores on which non-critical threads run. In this way, critical and non-critical threads reach the program's synchronization points simultaneously, and the energy consumed by non-critical threads is reduced. To make the DVFS strategy more effective, we also provide a prediction mechanism for critical threads, which saves a further amount of energy.
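The per-core frequency control itself can be done through the Linux cpufreq sysfs interface. The sketch below is a simplified illustration, assuming root privileges, a cpufreq-enabled kernel, and an already-identified set of cores hosting non-critical threads; the core IDs and the 1.2 GHz cap are hypothetical, and this is not the dissertation's DVFS policy or its critical-thread predictor.

```c
/* Minimal sketch of per-core frequency capping via the Linux cpufreq sysfs
 * interface. Assumes root privileges and a kernel exposing scaling_max_freq.
 * Which cores host non-critical threads is taken as given here. */
#include <stdio.h>

/* Cap the maximum frequency (in kHz) of one CPU core. */
static int set_max_freq_khz(int cpu, long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_max_freq", cpu);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* no cpufreq support or no permission */
    fprintf(f, "%ld\n", khz);
    fclose(f);
    return 0;
}

int main(void)
{
    /* Hypothetical example: cores 4-7 run non-critical threads, so they are
     * capped at 1.2 GHz while the cores of critical threads stay untouched. */
    int  noncritical_cores[] = { 4, 5, 6, 7 };
    long reduced_khz         = 1200000;

    for (int i = 0; i < 4; i++)
        if (set_max_freq_khz(noncritical_cores[i], reduced_khz) != 0)
            fprintf(stderr, "failed to scale cpu%d\n", noncritical_cores[i]);
    return 0;
}
```

Capping scaling_max_freq works under any governor; a real policy along the lines described above would instead pick, per core, the lowest frequency that still lets the non-critical threads reach the next synchronization point no later than the critical thread.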
Keywords/Search Tags: NUMA Systems, Multi-threaded Programs, Memory Access Latency, Thread Scheduling, Critical Threads, Scalability, Dynamic Voltage and Frequency Scaling