Embedded Processor Microarchitecture Optimization

Posted on: 2014-01-31

Degree: Master

Type: Thesis

Country: China

Candidate: Y Liu

Full Text: PDF

GTID: 2268330425981401

Subject: Electronic and communication engineering

Abstract/Summary:

Advances in process technology and the demands of new applications have driven rapid growth in processor performance. Embedded processors, however, face new challenges. On the one hand, the performance gap between memory and processor limits the performance of the processor system; on the other hand, a large number of applications rely on high-precision floating-point computation, which imposes new design requirements. Based on an analysis of application characteristics, we optimize the memory subsystem with data prefetching and design a floating-point unit to accelerate data processing.

The design and configuration of mainstream data prefetching mechanisms are not suited to embedded processors: an overly aggressive prefetching strategy interferes with normal memory accesses, and complex prediction and control logic consumes considerable power and area. This thesis proposes a filtered stream prefetcher with non-unit strides for embedded processors. Data streams are allocated and filtered with an optimized minimum-delta algorithm, which reduces circuit complexity; a prefetch buffer lowers the rate of cache-port conflicts; and an optimized cache replacement policy offsets the negative impact of cache pollution. Emulation on the NoCOP platform with the EEMBC and SPEC2006 benchmark sets shows a speedup of 4.3% on average and 16% at maximum over no prefetching, and of 10.5% over the MSP (minimum delta prefetching) mechanism. The area overhead is about 35,000 gates, and the power overhead is 30.1 mW.

Most prefetching mechanisms cannot prefetch both streams and linked data structures, and current pointer prefetching mechanisms either consume a large amount of storage or have low prefetching accuracy. This thesis proposes an adaptive multi-mode prefetching system that integrates a stream prefetcher and a pointer prefetcher.
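The minimum-delta stream allocation used by the stream prefetcher described above can be illustrated in software. The basic minimum-delta idea is to take, as the stride of a new stream, the smallest difference between the current miss address and a short history of recent miss addresses, which lets non-unit and negative strides be detected even when several streams interleave. The abstract does not specify the thesis's optimized variant, so the following is only a minimal sketch under stated assumptions: the class and method names, the history length, the maximum stride window, the prefetch degree, the buffer size, and the allocation filter (three misses in arithmetic progression) are all hypothetical, and the cache itself and its replacement policy are not modeled.

```python
from collections import deque


class MinDeltaStreamPrefetcher:
    """Hypothetical sketch of a filtered minimum-delta stream prefetcher.

    Addresses are cache-block numbers (byte address divided by the line size).
    All parameters are illustrative assumptions, not values from the thesis.
    """

    def __init__(self, history_len=8, max_stride=64, degree=2, buffer_len=4):
        self.history = deque(maxlen=history_len)  # recent miss block addresses
        self.max_stride = max_stride              # larger deltas are not treated as strides
        self.degree = degree                      # blocks prefetched per trigger
        self.buffer = deque(maxlen=buffer_len)    # FIFO prefetch buffer, separate from the cache

    def _min_delta(self, block):
        """Return the non-zero delta of smallest magnitude to a recent miss, or None."""
        deltas = [block - h for h in self.history if h != block]
        deltas = [d for d in deltas if abs(d) <= self.max_stride]
        return min(deltas, key=abs) if deltas else None

    def access_miss(self, block):
        """Handle a cache miss to `block`.

        Returns (served_from_buffer, list_of_blocks_prefetched_now).
        """
        served_from_buffer = block in self.buffer
        if served_from_buffer:
            self.buffer.remove(block)  # the line moves from the buffer into the cache

        stride = self._min_delta(block)
        # Allocation filter: block - stride is in the history by construction, so
        # additionally requiring block - 2*stride means three misses must form an
        # arithmetic progression before a stream is allocated.
        confirmed = stride is not None and (block - 2 * stride) in self.history
        self.history.append(block)

        prefetched = []
        if confirmed:
            for k in range(1, self.degree + 1):
                target = block + k * stride
                if target not in self.buffer:
                    self.buffer.append(target)  # maxlen evicts the oldest entry
                    prefetched.append(target)
        return served_from_buffer, prefetched
```

For example, misses to blocks 100, 103, and 106 allocate a stride-3 stream on the third miss and place blocks 109 and 112 in the prefetch buffer; a later miss to block 109 is then served from the buffer and extends the stream to block 115. Keeping prefetched lines in a separate buffer, rather than writing them straight into the cache, is one way to keep prefetch fills off the cache ports and away from useful cache lines, in the spirit of the buffer described in the abstract.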
The prefetching system adaptively switches among stream prefetching, pointer prefetching, and no prefetching according to run-time information from the processor. By filtering on the load offset address, FCDP (filtered content-directed prefetching) issues 35% fewer prefetch requests on average than CDP (content-directed prefetching). Emulation on the NoCOP platform with the EEMBC, SPEC2006, and Olden benchmark sets shows that the multi-mode prefetching system achieves a speedup of 11.7% over the stream-prefetching mode and 50.6% over the FCDP mode. When the prefetchers are not effective, the prefetching system can be shut down to reduce power consumption.

Because current applications process more floating-point data at higher precision, a floating-point unit is designed for the embedded processor to accelerate data processing. Application characteristics are collected with a software simulator to guide the RTL (register-transfer level) design. The floating-point unit handles load/store instructions and arithmetic instructions separately; it is tightly coupled with the integer pipeline and reuses much of its logic. Simulation and logic synthesis show that the floating-point unit fully supports the MIPS32 single-precision floating-point instruction set. Its maximum frequency is 495 MHz in the worst case and 794 MHz in the typical case. The area overhead is about 248,000 gates, and the power overhead is 88.3 mW.
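The multi-mode prefetching system described above chooses among stream prefetching, pointer prefetching, and no prefetching from run-time information that the abstract does not detail. As a minimal sketch under that caveat, the controller below assumes a sampling scheme: each active mode runs for one epoch while issued and useful prefetches are counted, the most accurate mode is kept, and prefetching is turned off if no mode is accurate enough. The names `Mode`, `ModeSelector`, `record_epoch`, and `min_accuracy`, the threshold value, and the use of accuracy as the sole metric are hypothetical assumptions, not the thesis's mechanism.

```python
from enum import Enum


class Mode(Enum):
    STREAM = "stream"
    POINTER = "pointer"
    OFF = "off"


class ModeSelector:
    """Hypothetical epoch-based mode selector for a multi-mode prefetcher.

    Each prefetching mode is assumed to run for one sampling epoch, during which
    the hardware counts issued and useful (later demanded) prefetches. The mode
    with the highest accuracy is chosen for the next run phase; if no mode
    reaches `min_accuracy`, prefetching is switched off to save power.
    """

    def __init__(self, min_accuracy=0.4):
        self.min_accuracy = min_accuracy
        self.accuracy = {}  # Mode -> accuracy measured in its last sampling epoch

    def record_epoch(self, mode, issued, useful):
        """Store useful/issued for `mode`; a mode that issued nothing scores 0."""
        self.accuracy[mode] = useful / issued if issued else 0.0

    def choose(self):
        """Return the mode for the next run phase (OFF is only a fallback)."""
        candidates = {m: a for m, a in self.accuracy.items() if m is not Mode.OFF}
        if not candidates:
            return Mode.OFF
        best = max(candidates, key=candidates.get)
        return best if candidates[best] >= self.min_accuracy else Mode.OFF
```

For instance, if the stream mode had 70 of 100 prefetches used and the pointer mode 10 of 50, `choose()` returns `Mode.STREAM`; if both fall below the threshold, it returns `Mode.OFF`. Accuracy alone ignores coverage (how many misses a mode removes), so a real controller would likely weigh both.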

Keywords/Search Tags:

embedded system, microprocessor, data prefetching, adaptive system, floating-point unit

Related items

1	Implementation And Optimization Of High-Performance Floating-Point Unit In X Processor
2	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
3	The Research And Implementation Of High Performance SIMD Floating-point Multiplication Accumulator Unit For FT-XDSP
4	Key Technologies Of VLSI Implementation Of High Performance Floating-point Arithmetic Unit
5	Analysis And Design Of High-performance Floating-Point Unit
6	Hardware Design And Implementation Of Floating-point Instruction Based On AltiVec
7	Verification Of Processor Floating-point Division/Square Root Unit Based On UVM
8	Research On High Performance Floating Point Unit
9	Research And Implementation Of Key Techniques Of High Performance Floating-Point Unit Designs
10	The Design Of. P1750a Floating-point Implementation Of The Components, And Achieve