
Research On The Key Techniques Of Parallelization And Optimization For Multi-Core Architecture

Posted on: 2015-10-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 1108330509961074
Subject: Computer Science and Technology
Abstract/Summary:
Multi-core processors are now widely used in areas ranging from supercomputers to personal computers, and computing has entered the multi-core and many-core era. This thesis surveys the evolution of high-performance microprocessors (the influence of the manufacturing process, the effect of architecture, power consumption, and changing applications) and describes the challenges and opportunities of many-core processors: multi-core parallel architecture problems, memory access bottlenecks, power consumption, on-chip interconnection, and multi-core parallel programming. The thesis focuses on parallel techniques for multi-core architectures from the perspective of hardware/software co-design, including a multi-core-oriented thread-level speculation (TLS) parallelization and optimization model, a TLS cache coherence protocol and storage architecture, an on-chip network architecture, and a thread-level and data-level parallel SDTA architecture. The achievements and contributions of this thesis are as follows.

1. I propose a lightweight, multi-core-oriented TLS parallelization and optimization model. Built on a data-list design, the model supports TLS through a parallel programming mode, a TLS parallelism principle, speculative parallelism tuning techniques, and software- and hardware-implemented data lists, among others. The parallel programming mode is more concise and easier to compile than those of previous TLS systems. The TLS principle guarantees that speculative threads execute correctly, and the data lists are highly independent. I propose two hardware-optimized schemes, fixed-length register vector lists and ordered lists based on a high-speed buffer, both of which are easy and efficient to implement. Experimental results show that, on a small number of cores, this lightweight model achieves comparably good speedups while remaining independent and flexible.

2. I propose a TLS-optimized cache coherence protocol and storage architecture for multi-core processors. The protocol removes the bottleneck of the centralized check mechanism, provides a distributed, cooperative method for speculative threads on different cores, and strengthens the theoretical and verification foundation for applying TLS to more cores. I have proved the completeness of the protocol and carried out functional verification and performance experiments. The experiments show that on a small number of cores the protocol performs similarly to the centralized check mechanism, while it scales clearly better to more cores and sharply reduces the number of re-executions. Furthermore, I propose a TLS-supporting storage architecture in which distributed shared buffers resolve the thread-switch problem.

3. I present a communication performance analysis model for many-core networks-on-chip (NoCs) and propose a configurable NoC architecture oriented to thread-level communication. At the macro level, the model describes communication among multi-core thread packets. I analyze the traffic characteristics of high-priority multicast packets and their impact on other types of on-chip packets. Focusing on multicast packet flows between adjacent branch nodes, I use stochastic network calculus to derive a performance analysis model for inter-thread data communication on multi-core processors, giving bounds on the backlog at routing nodes and on the end-to-end delay. In addition, I propose, for the first time, a software-defined NoC architecture. This architecture separates the control layer from the data forwarding layer in the NoC. Users can program the NoC flexibly, configure it according to the threads' communication requirements, and thereby improve its performance.

4. I propose an efficient synchronous data trigger architecture that exploits both thread-level parallelism (TLP) and data-level parallelism (DLP). Based on TTA, the architecture combines the idea of synchronous data triggering with the TLS execution principle and the TLS cache coherence and storage design. The cores communicate over the programmable NoC while exploiting DLP according to the proposed compilation strategies. The computing kernel can be optimized for the application. Finally, experiments are carried out with a series of H.264 APIs. The results show good TLP and DLP, demonstrating the architecture's high performance on data-intensive computing.

Many-core architectures are gradually coming to dominate every area of computing. However, the key parallel techniques still limit the feasibility and future of multi-core processors. This thesis investigates a series of parallelization and optimization techniques that combine software with hardware. After design and verification, all of the techniques are shown to be effective in their respective scenarios. This research provides a useful exploration for the future design and application of many-core processors.
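The general idea behind thread-level speculation (TLS) mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's data-list model or its cache coherence protocol; it is a generic, serially simulated illustration under stated assumptions: each task runs against a snapshot of memory, buffers its writes, and records what it read; tasks commit in program order, and a task whose recorded reads no longer match the committed state is squashed and re-executed. The names `SpecView`, `run_speculative`, and the three example tasks are hypothetical, and every address is assumed to be initialized in memory.

```python
class SpecView:
    """Private view of memory for one speculative task (hypothetical helper)."""

    def __init__(self, base):
        self.base = base      # memory state the task speculates against
        self.reads = {}       # address -> value first observed from base
        self.writes = {}      # address -> buffered speculative value

    def load(self, addr):
        if addr in self.writes:          # forward the task's own buffered store
            return self.writes[addr]
        value = self.base[addr]
        self.reads.setdefault(addr, value)
        return value

    def store(self, addr, value):
        self.writes[addr] = value        # buffered until commit


def run_speculative(tasks, memory):
    """Run tasks speculatively, commit in program order, squash on violation."""
    snapshot = dict(memory)
    views = []
    for task in tasks:                   # conceptually concurrent; simulated serially
        view = SpecView(snapshot)
        task(view)
        views.append(view)

    committed = dict(memory)
    squashes = 0
    for task, view in zip(tasks, views):
        # A read is stale if an earlier task has since committed a different value.
        if any(committed[a] != v for a, v in view.reads.items()):
            squashes += 1
            view = SpecView(committed)   # squash and re-execute on committed state
            task(view)
        committed.update(view.writes)
    return committed, squashes


# Three example "loop iterations"; the third depends on the first two.
def inc_x(m):
    m.store("x", m.load("x") + 1)


def double_y(m):
    m.store("y", m.load("y") * 2)


def sum_z(m):
    m.store("z", m.load("x") + m.load("y"))


final, squashes = run_speculative([inc_x, double_y, sum_z], {"x": 1, "y": 2, "z": 0})
print(final, squashes)
```

In this run the first two tasks are independent and commit without conflict, while `sum_z` read `x` and `y` before the earlier tasks committed, so it is squashed once and re-executed. The final state matches what sequential execution would produce, which is the correctness property a TLS scheme has to preserve.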
Keywords/Search Tags: Multi-Core Architecture, Parallel and Optimized Technologies, Thread Level Speculative Parallelism, NoC