Font Size: a A A

Cache Aware Optimization For Thread Granulated Parallelism Under CMP

Posted on:2017-01-06Degree:MasterType:Thesis
Country:ChinaCandidate:B LiFull Text:PDF
GTID:2428330488479874Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The advent of the era of multi-core,computing ability of the processor got great improve.But due to the development speed between the CPU and main memory is not balanced," Storage wall" problem is more and more serious,which become the bottleneck of improving system performance.Under the multi-core architecture,the emergence of the multistage memory ease the pressure.In a typical CMP architecture,multiple processors share the Cache,which not only to improve the performance of storage,and reduce the hardware usable area.But under this structure,because the Cache space is small,multiple threads to Shared Cache,which will create competition for Cache space and make the increase of Cache missing number,which reduce performance of the system.In this paper,the main work is as follows:(1)Cache missing number is one of the important indicators to measure performance of visiting store of system in this paper,by collecting each thread's information,which makes sharing thread Cache space demand equal to Cache size and rationally divide the thread group,on this issue,at first,this paper make the model of thread division Abstract to solve problem of subset sum and use the method of quickly solving subset sum to solve problem of subset sum,which applied to TOP that is the optimal partition algorithm and obtained optimized thread group.Finally simulation tool collect the thread data information through the simplescalar and implement partition algorithm,Experimental results show that the proposed optimization algorithm of thread partition(TOP)in total number of Cache miss of the program compared to the greedy greedy thread partition algorithm on average reduced by 17.48%,compared with the Random thread partition algorithm reduced 14.26%on average.(2)On the basis of the thread division group,our article analyze data on the thread to fetch thread information during program execution and establish the implementation model of data allocation,Through an instance analysis and define the problem of multi-core data allocation,which can describe the thread data optimization allocation,Finally,this paper puts forward multi-core the greed data allocation algorithm(M_GDA)and multi-core dynamic programming algorithm(M_DPA)to optimize the data distribution,which Make the thread of execution at the same time that achieve the shortest time to Visit the store and further reduce the program execution time.Finally,in this paper the data distribution optimization emulator which running 10 standard benchmark tests verify advantage orf algorithm,The experimental results show that compared with Random data allocation algorithm(Random),the dynamic programming algorithm of multi-core(M_DPA)which optimize the data distribution make time overhead improve by 16.11%on average and Compared with multi-core greed(M_GDA)data allocation algorithm time costs improved by 14.06%,Compared with random data allocation algorithm in the cost of energy improved by 28.02%,compared with greedy data allocation algorithm was improved by 14.40%.
Keywords/Search Tags:CMP architecture, Subsetsum, Thread partition, Data distribution, Dynamic programming algorithm
PDF Full Text Request
Related items