Speedup Sequential Program Performance On Chip Multi-core Processor

Posted on: 2012-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2178330335972973
Subject: Computer system architecture
Abstract/Summary:
After ten years of development, multicore processors have become mainstream in the processor market. However, traditional sequential programs cannot benefit from multicore processors because of structural differences. This thesis studies methods for improving sequential program performance on chip multicore processors and proposes ideas in two separate directions.

A chip multicore processor offers a new opportunity to speed up sequential programs by using the duplicated hardware resources available in its cores. Most existing sequential programs can benefit from a larger instruction window and a larger L2 cache. We propose a simple mechanism, Silent Sharing, to accelerate sequential program execution on a chip multicore processor. The basic idea is to send long-latency instructions from the local instruction window to the windows of other cores, and to send blocks evicted from the local L2 to free blocks in remote L2s, so that the running core obtains a relatively larger instruction window and more L2 storage. None of these transfers is visible to the running program. In other words, a running core can silently share the available hardware resources of the other cores on the same chip. The hardware budget of the method is small, and the implementation is straightforward. Initial analysis suggests that it is a promising way to improve sequential program performance on a chip multicore processor.

This thesis also proposes an Adaptive Subset Based Replacement Policy (ASRP). In ASRP, each set in the Last-Level Cache (LLC) is divided into multiple subsets; at any given time one subset is active and the others are inactive. The victim block for a miss is chosen only from the active subset, using the LRU policy. A counter in each set records the cache misses in that set during a period of time. When the counter value exceeds a threshold, the active subset changes to the neighboring one. To adapt to changes in program behavior, set dueling is used to choose among several candidate thresholds, selecting the one that causes the fewest misses in the sampling sets. Using the given framework for this competition, ASRP achieves a geometric mean improvement of 5.5% over LRU for 28 SPEC CPU 2006 programs, and some programs gain improvements of up to 50%. In the multicore experiments, the average improvement is 6% in throughput and 6.8% in weighted speedup.
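The ASRP mechanism described in the abstract can be illustrated with a minimal Python sketch of a single cache set. This is not the thesis implementation. The class name `ASRPSet`, the helper `pick_threshold`, the parameter values, the choice to fill empty ways first, and the choice to reset the miss counter when the active subset rotates are all assumptions made for illustration; the abstract only states that the counter covers "a period of time".

```python
class ASRPSet:
    """One cache set under a simplified, ASRP-like policy (illustrative only)."""

    def __init__(self, ways=8, num_subsets=2, threshold=4):
        assert ways % num_subsets == 0, "ways must divide evenly into subsets"
        self.subset_size = ways // num_subsets
        self.num_subsets = num_subsets
        self.threshold = threshold
        self.tags = [None] * ways      # tag held by each way (None = empty)
        self.last_used = [0] * ways    # last-access timestamp, used for LRU
        self.active = 0                # index of the currently active subset
        self.miss_counter = 0          # misses seen since the last rotation
        self.clock = 0

    def access(self, tag):
        """Access a block; return True on a hit and False on a miss."""
        self.clock += 1
        if tag in self.tags:
            # A hit may land in any subset; only the recency is updated.
            self.last_used[self.tags.index(tag)] = self.clock
            return True

        # Miss: the victim is the LRU way inside the active subset only.
        # Empty ways keep timestamp 0, so they are filled first (assumption).
        start = self.active * self.subset_size
        candidates = range(start, start + self.subset_size)
        victim = min(candidates, key=lambda w: self.last_used[w])
        self.tags[victim] = tag
        self.last_used[victim] = self.clock

        # Count the miss; past the threshold, move to the neighboring subset.
        self.miss_counter += 1
        if self.miss_counter > self.threshold:
            self.active = (self.active + 1) % self.num_subsets
            self.miss_counter = 0      # assumption: reset on rotation
        return False


def pick_threshold(sample_misses):
    """Set-dueling stand-in: given {threshold: misses observed in its
    sampling sets}, return the threshold with the fewest misses."""
    return min(sample_misses, key=sample_misses.get)
```

For example, a 4-way set split into two subsets with a threshold of 2 rotates to the second subset after its third miss, which leaves the blocks in the first subset protected from eviction until the next rotation.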
Keywords/Search Tags: Sequential Program Execution, Chip Multicore Processor, Instruction Window, Cache Replacement Policy, Subset, Set-duel