
Compiling Design And Optimization For BWDSP

Posted on: 2016-01-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Wang
Full Text: PDF
GTID: 1108330488493386
Subject: Computer application technology
Abstract/Summary:
In recent years, as China's chip design capability has grown, a number of mature processors have appeared, such as Godson, designed by the Institute of Computing Technology; the SW processor, designed by the Jiangnan Institute of Computing Technology; YHFT, designed by the National University of Defense Technology; PKUnity, designed by Peking University; C*Core, designed by C*Core Technology Company (Suzhou); the C-Sky core, designed by C-Sky Microsystem Company (Hangzhou); and BWDSP, designed by the East China Research Institute of Electronic Engineering. To advance these chips in the market, building a sound "hardware platform, infrastructure software, application" ecosystem around them is necessary and has become an urgent task. The compilation system occupies a crucial position in this ecosystem. This dissertation is about building an optimizing compiler for the self-designed BWDSP processor. The main research findings are summarized as follows.

The dissertation studies Open64, GCC, and LLVM, the main open-source compiler infrastructures, and compares them in terms of retargeting mechanisms and optimization techniques. It also presents the criteria for selecting an open-source infrastructure on which to develop a compiler, and then describes the key techniques for porting the chosen infrastructure to the BWDSP hardware platform.

BWDSP adopts a blocked memory architecture and multiple address generation units. The BWDSP compiler therefore constructs a conflict graph over the memory accesses of program variables and performs block allocation to optimize data distribution across the memory blocks. On this basis, program memory accesses are allocated to the address generation units, so that the code generated by the compiler can maximize the data parallelism implicit in programs.
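The conflict-graph idea can be illustrated with a small sketch. This is not the dissertation's algorithm, which the abstract does not specify; it is a generic greedy heuristic under assumed inputs. The function names, the input format (groups of variables that the program would like to access in the same cycle), and the number of blocks are all hypothetical.

```python
from collections import defaultdict


def build_conflict_graph(parallel_accesses):
    """Build {var: {neighbor: weight}} from groups of variables accessed together.

    Each group lists variables the program wants to access in the same cycle
    (an assumed input format). The edge weight counts how often a pair co-occurs.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for group in parallel_accesses:
        names = sorted(set(group))
        for i, a in enumerate(names):
            graph[a]  # register the node even if it has no neighbors
            for b in names[i + 1:]:
                graph[a][b] += 1
                graph[b][a] += 1
    return graph


def assign_blocks(graph, num_blocks):
    """Greedy assignment (hypothetical heuristic): place the most heavily
    conflicting variables first, each in the block where it collides least
    with neighbors that are already placed."""
    order = sorted(graph, key=lambda v: (-sum(graph[v].values()), v))
    placement = {}
    for var in order:
        cost = [0] * num_blocks
        for neighbor, weight in graph[var].items():
            if neighbor in placement:
                cost[placement[neighbor]] += weight
        placement[var] = cost.index(min(cost))
    return placement


def remaining_conflicts(graph, placement):
    """Total weight of conflict edges whose endpoints share a memory block."""
    total = 0
    for a in graph:
        for b, weight in graph[a].items():
            if a < b and placement[a] == placement[b]:
                total += weight
    return total


accesses = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
graph = build_conflict_graph(accesses)
placement = assign_blocks(graph, num_blocks=3)
print(placement, remaining_conflicts(graph, placement))
```

With three blocks, every variable in this toy input gets its own block and no conflict remains. With only two blocks, one conflict edge necessarily stays inside a block, which is the cost a real allocator would try to minimize.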
Next, taking into account the blocked memory architecture, parameter-passing rules, SIMD instructions, and the inherent characteristics of the instructions, a data-parallel-oriented clustering optimization algorithm is designed. This algorithm effectively addresses the exploitation of parallelism on processors with high data parallelism and a clustered structure.

The key architectural feature of BWDSP is its vectorized architecture, covering both vectorized computation and vectorized memory access. A vectorization optimization framework is proposed that combines loop vectorization based on dependence analysis with basic-block vectorization based on superword level parallelism.

The dissertation studies an implementation of a modulo scheduling framework based on BWDSP's cluster structure and its architecture supporting SIMD and VLIW. The zero-cost loop and transfer method is introduced; a method for describing the machine resources of BWDSP's cluster structure and SIMD/VLIW architecture is presented; the relationship between modulo scheduling and loop unrolling is discussed; a modulo variable expansion algorithm framework is proposed; and a code generation style based on speculative execution is described.

An efficient compiler implementation framework is proposed for the advanced predicate mechanism in BWDSP. The two predicate forms in BWDSP are compared, and their respective advantages and scopes of application are given. The characteristics of the two forms are investigated in depth, and a compiler representation for them is given. A partial predicate framework for the cluster structure, based on loop unrolling, is proposed.

Finally, the dissertation explores optimization techniques for the classic FFT algorithm from the digital signal processing field on the BWDSP platform. An innovative data-parallel FFT optimization algorithm based on partial bit reversal is presented.
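For context on the FFT result, the sketch below shows the standard full bit-reversal permutation used by radix-2 FFTs. The abstract does not describe how the dissertation's partial bit reversal works, so this is only the classic baseline it presumably refines; the function names are hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(data):
    """Return a copy of `data` reordered by bit-reversed index.

    The length must be a power of two, as in a radix-2 FFT.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    out = list(data)
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:  # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out


print(bit_reverse_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The permutation scatters neighboring elements across the array, which is awkward for wide vector loads and blocked memories. That access pattern is a plausible motivation for reversing only part of the index on a data-parallel DSP, though the abstract itself does not say so.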
Keywords/Search Tags: Compiling optimization, Open source compiling infrastructure, Data distribution optimization, Clustering, Vectorization, Modulo scheduling, Predicate optimization, FFT, Partial bit reverse