Research On VLIW Compile Optimization Technique For Compute-intensive Embedded Application

Posted on: 2013-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M L Guan
Full Text: PDF
GTID: 1268330422474317
Subject: Computer Science and Technology
Abstract/Summary:
With the continuing advance of science and technology, the computing demands of applications keep growing. Unlike traditional desktop workloads, compute-intensive applications are becoming the main load on microprocessors in many fields, such as scientific research, defense, business, and entertainment, and they are attracting increasing attention. Driven by strong application demand, high-end embedded computing has always developed rapidly. The computing performance and power requirements of embedded applications now exceed the capacity of current embedded processors. As an effective way to exploit parallelism, VLIW technology still plays a very important role in current microprocessor design. VLIW reduces hardware complexity, which helps raise the clock frequency and lower the power consumption of the chip. At the same time, it poses a severe challenge to the compiler, because the performance of the processor depends more heavily on the performance of the compiler. As VLIW processor architectures keep evolving and application domains keep expanding, how can the compiler take full advantage of the architecture in performance, power, and other respects, and exploit as much instruction-level parallelism as possible? Different microprocessor architectures present the compiler with different problems. In this context, this dissertation studies VLIW compiler optimization techniques for compute-intensive embedded applications.
This dissertation focuses on several key issues that arise when compute-intensive embedded applications run on VLIW processors, including software integration of data-level parallel threads on a VLIW processor, load-balanced instruction scheduling for distributed register files, the design of a partially connected shared interconnect and the corresponding instruction scheduling for stream architectures, and compiler optimization techniques for energy-efficient microprocessors. The main contributions and innovations of the dissertation are as follows:

1. We present a novel approach that integrates lightweight data-level parallel threads through compilation for VLIW processors. Multi-threaded programs written to the OpenCL specification are data-level parallel, but the workload of a single thread is light and cannot fully utilize a VLIW processor. This dissertation carries out the software integration and parallel execution of multi-threaded OpenCL programs on the cluster of the MASA stream processor. Because data-level parallel threads share the same control structure, the compiler merges the operations in corresponding basic blocks of different threads into one basic block, enlarging the instruction window that the compiler can schedule. This transforms data-level parallelism into instruction-level parallelism and lets the VLIW processor reach its full performance. The experimental results show that integrating and executing an appropriate number of threads effectively improves program performance while keeping the demand on processor hardware resources within an acceptable range.

2. We present register file load-balanced VLIW scheduling (RFBLS) for distributed register files. Load imbalance among distributed register files prevents the register files from being used effectively.
High peak register demand on individual register files often leads to overflow (register spilling), which reduces performance and weakens the advantages of the distributed organization. This dissertation presents register file load-balanced VLIW scheduling for processors with a distributed register file structure. By analyzing the control structure of the program and the producer-consumer relationships of variables, it presents a method for exactly calculating the lifetimes of variables during instruction scheduling. By exactly calculating the load of each register file after every step of instruction scheduling, the scheduler assigns variables first to the register file under lighter pressure, thus balancing the pressure among the register files. The experimental results show that this method effectively reduces the peak demand of the program on the distributed register files and reduces overflow and memory accesses.

3. We design a partially connected shared interconnect architecture for stream architectures and present an instruction scheduling optimization algorithm for it. In a stream processor, the large number of functional units and the fully connected interconnect structure make the shared interconnect bus very large, increasing hardware overhead, transmission delay, and the difficulty of hardware layout. Based on an analysis of program characteristics, this dissertation reduces the size of the shared interconnect bus by designing a partially connected shared interconnect and by multiplexing I/O units. At the same time, compiler-optimized scheduling lessens the impact of the partially connected shared interconnect on program performance by using the existing interconnect resources as fully as possible.
The experimental results show that the compiler-optimized scheduling effectively avoids a sharp decline in program performance, that the utilization of interconnect resources improves greatly, and that the partially connected shared interconnect effectively reduces the hardware cost and energy consumption of the processor.

4. We present variable classification scheduling algorithms for the distributed and hierarchical register file (DHRF) structure and design the Thread-level compiler for the energy-efficient micro-thread processor. This dissertation proposes research on an embedded tera-scale processor, introduces the bottom-level micro-thread processor architecture and the Thread-level programming model, and designs the Thread-level compiler for the micro-thread processor. To reduce power consumption, the micro-thread processor employs the distributed and hierarchical register file structure. Because the capacity of the TORF is small, much data must be stored in the ERF, which makes instruction scheduling for a processor with DHRF much more difficult. Based on an analysis of program characteristics, this dissertation presents variable classification scheduling algorithms for the distributed and hierarchical register file structure, sparing the programmer from manual allocation and optimization. The experimental results show that, compared with the distributed register file structure and at the cost of a slight reduction in program performance, the variable classification scheduling algorithm significantly reduces the energy consumed by register accesses and by the processor as a whole, so that the power advantages of the distributed and hierarchical register file structure can be fully exploited.
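
The load-balancing idea in contribution 2 (assign each variable to the register file under lighter pressure) can be illustrated with a small sketch. This is not the dissertation's RFBLS algorithm: it assumes variable lifetimes are already known as inclusive cycle ranges, whereas the dissertation computes them during instruction scheduling, and the names `assign_balanced` and `peak_demand` are hypothetical.

```python
from typing import Dict, List, Tuple

# (name, def_cycle, last_use_cycle); the variable is live on every cycle
# from def_cycle to last_use_cycle inclusive. Assumed input format.
Interval = Tuple[str, int, int]


def assign_balanced(intervals: List[Interval], num_files: int) -> Dict[str, int]:
    """Greedy sketch: put each variable in the register file that is
    least loaded over that variable's lifetime."""
    if not intervals:
        return {}
    horizon = max(end for _, _, end in intervals) + 1
    # load[f][c] = number of variables live in register file f at cycle c
    load = [[0] * horizon for _ in range(num_files)]
    placement: Dict[str, int] = {}
    # Visit variables in order of definition cycle (stable for ties).
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Lowest peak pressure over the lifetime wins; ties go to the
        # lowest-numbered register file.
        target = min(
            range(num_files),
            key=lambda f: max(load[f][start:end + 1]),
        )
        for cycle in range(start, end + 1):
            load[target][cycle] += 1
        placement[name] = target
    return placement


def peak_demand(intervals: List[Interval], placement: Dict[str, int],
                num_files: int) -> List[int]:
    """Peak number of simultaneously live variables per register file."""
    if not intervals:
        return [0] * num_files
    horizon = max(end for _, _, end in intervals) + 1
    load = [[0] * horizon for _ in range(num_files)]
    for name, start, end in intervals:
        for cycle in range(start, end + 1):
            load[placement[name]][cycle] += 1
    return [max(row) for row in load]
```

For example, with four variables `a` and `b` live on cycles 0 to 3 and `c` and `d` live on cycles 1 to 2, and two register files, the sketch places `a` and `c` in file 0 and `b` and `d` in file 1, so each file peaks at two live values. Placing all four in one file would peak at four.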
Keywords/Search Tags: Embedded Computing, Stream Processor, Shared Interconnect Architecture, Distributed Register File, Distributed and Hierarchical Register File, Load Balance, Variable Classification Scheduling