| With the growing demand for embedded processors in China, higher performance requirements are being placed on embedded processors in all fields, which drives research on high-performance embedded processors. L32 is one of our self-developed 32-bit embedded processors. It offers flexible data handling at several widths: double word (32-bit), word (16-bit), byte (8-bit) and bit (1-bit). Unlike general processors, whose operation results can only be stored in the accumulator, L32 can store operation results directly in a register or a RAM unit. However, there is still much room for improvement. To address the poor efficiency of instruction execution, this paper studies the adder, the dynamic pipeline and pipeline dependency problems, and completes the following main tasks: Improvement of the adder. In the L32 embedded processor, the same adder is used for 8-bit, 16-bit and 32-bit addition, so every addition takes two clock cycles, which slows down 8-bit addition. The adder has been divided into two parts. 8-bit arithmetic operations are executed in the first part and require only one clock cycle. 16-bit arithmetic operations fall into two cases: if no carry chain is generated by the low 8-bit operation, the 16-bit operation is executed in the first part and also requires only one clock cycle; otherwise, the 16-bit operation is completed using both parts. Experimental results show that 8-bit and 16-bit additions on the new adder take one clock cycle fewer than on the original adder. Dynamic pipeline design. Because the original design adopts a three-stage pipeline, execution time differs between instructions: fast instructions execute in three clock cycles, while slow ones require six. As a result, the pipeline stalls and its throughput is poor. 
To address these problems, this paper details the instruction execution time and architecture of the L32 processor and designs a six-stage dynamic pipeline, in which the former execution stage is divided into four stages. A flow register is designed to steer each instruction through only the stages it needs, bypassing the unneeded pipeline stages. Experimental results show that the six-stage dynamic pipeline processor achieves a 63.2% improvement over the original three-stage L32 processor. Analysis of pipeline-related problems. To resolve the resource conflicts that arise in the six-stage dynamic pipeline, a method of buffering the conflicting instruction is designed to improve pipeline throughput and reduce stalls. To address control dependencies, static branch prediction is proposed. To address data dependencies, bypassing and delayed transfer techniques are used. Experimental results demonstrate that the methods implemented in this paper effectively resolve the pipeline dependencies. |
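
The split-adder behaviour described in the abstract can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the L32 hardware design: the function name `split_add` is hypothetical, the first part is assumed to add the high bytes without a carry-in while it adds the low bytes, and a 16-bit addition that needs the second part is assumed to cost two clock cycles. The 32-bit case is not modelled.

```python
from typing import Tuple


def split_add(a: int, b: int, width: int) -> Tuple[int, int]:
    """Return (sum modulo 2**width, assumed clock cycles) for an 8- or 16-bit add.

    Hypothetical behavioural model of a two-part adder; cycle counts are
    assumptions based on the abstract, not measured values.
    """
    if width not in (8, 16):
        raise ValueError("this sketch models only 8-bit and 16-bit additions")

    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # First part: add the low bytes and record whether they carry out.
    low = (a & 0xFF) + (b & 0xFF)
    carry = low >> 8
    low &= 0xFF

    if width == 8:
        # 8-bit additions finish in the first part: one cycle.
        return low, 1

    # Assumption: the first part also adds the high bytes with no carry-in.
    high = ((a >> 8) + (b >> 8)) & 0xFF

    if carry == 0:
        # No carry from the low byte: the first part's result is final.
        return (high << 8) | low, 1

    # Second part: fold the carry into the high byte (assumed second cycle).
    high = (high + 1) & 0xFF
    return (high << 8) | low, 2
```

In this model, `split_add(0x0102, 0x0304, 16)` completes in one assumed cycle because the low bytes do not carry, whereas `split_add(0x00FF, 0x0001, 16)` needs the second part and takes two.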