
Embedded Processor Microarchitecture Optimization

Posted on: 2014-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2268330425981401
Subject: Electronic and communication engineering
Abstract/Summary:
Advances in process technology and the demands of new applications have driven rapid growth in processor performance. Embedded processors, however, face new challenges. On one hand, the performance gap between memory and processor limits the performance of the whole processor system; on the other hand, a growing number of applications require high-precision floating-point computation. Based on an analysis of application characteristics, this thesis optimizes the memory subsystem with data prefetching and designs a floating-point unit to accelerate data processing.

The design and configuration of mainstream data prefetching mechanisms are not suited to embedded processors: an overly aggressive prefetching strategy interferes with normal memory accesses, and complex prediction and control logic consumes considerable power and area. This thesis proposes a filtered stream prefetcher with non-unit strides for embedded processors. Data streams are allocated and filtered with an optimized minimum-delta algorithm, which reduces circuit complexity; a prefetch buffer reduces conflicts on the cache ports; and an optimized cache replacement policy compensates for the negative impact of cache pollution. Emulation on the NoCOP platform shows that, on the EEMBC and SPEC2006 benchmark suites, the prefetcher achieves an average speedup of 4.3% and a maximum speedup of 16% over no prefetching, and a speedup of 10.5% over the MSP (minimum delta prefetching) mechanism. The added area is about 35,000 gates and the added power is 30.1 mW.

Most prefetching mechanisms cannot prefetch both streams and linked data structures, and current pointer prefetching mechanisms either require large storage or achieve low prefetching accuracy. This thesis proposes an adaptive multi-mode prefetching system that integrates a stream prefetcher and a pointer prefetcher.
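The minimum-delta allocation and filtering of the stream prefetcher (the mechanism the multi-mode system uses in its stream mode) can be illustrated with a small behavioral model. This is a minimal Python sketch under stated assumptions, not the thesis's RTL: the table size, delta threshold, prefetch degree, round-robin replacement, and the "same stride seen twice" confirmation filter are all hypothetical choices, and the class and method names are illustrative only. Each miss is assigned to the tracked stream whose last address is closest (the minimum delta), so strides other than one cache line are learned naturally.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    last_addr: int   # most recent miss address assigned to this stream
    stride: int = 0  # last observed delta (any size, not just one cache line)


class MinDeltaStreamPrefetcher:
    """Behavioral sketch of minimum-delta stream allocation (hypothetical parameters)."""

    def __init__(self, num_streams=4, max_delta=256, degree=2):
        self.num_streams = num_streams  # size of the stream table (assumed)
        self.max_delta = max_delta      # larger deltas start a new stream (assumed)
        self.degree = degree            # prefetches issued per confirmed miss (assumed)
        self.streams = []
        self.victim = 0                 # round-robin replacement pointer

    def on_miss(self, addr):
        """Return the addresses to prefetch in response to a miss at addr."""
        best = None  # (stream, delta) with the smallest |delta|
        for s in self.streams:
            delta = addr - s.last_addr
            if delta == 0:
                return []  # repeated miss to the same address: nothing new to learn
            if abs(delta) <= self.max_delta and (best is None or abs(delta) < abs(best[1])):
                best = (s, delta)
        if best is None:
            self._allocate(addr)
            return []
        s, delta = best
        # Filter: prefetch only when the same stride is observed twice in a row.
        confirmed = delta == s.stride
        s.stride, s.last_addr = delta, addr
        if not confirmed:
            return []
        return [addr + s.stride * k for k in range(1, self.degree + 1)]

    def _allocate(self, addr):
        """Start a new stream, replacing entries round-robin once the table is full."""
        if len(self.streams) < self.num_streams:
            self.streams.append(Stream(addr))
        else:
            self.streams[self.victim] = Stream(addr)
            self.victim = (self.victim + 1) % self.num_streams


if __name__ == "__main__":
    pf = MinDeltaStreamPrefetcher()
    # Two interleaved streams with strides 0x80 and 0x40.
    for a in (0x1000, 0x8000, 0x1080, 0x8040, 0x1100, 0x8080):
        print(hex(a), [hex(p) for p in pf.on_miss(a)])
```

With the interleaved trace in the demo, the two streams are tracked separately and each begins prefetching on its third miss. In this model the prefetch addresses are simply returned; in the design described above they would be placed in a prefetch buffer rather than competing for the cache ports.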
The system adaptively switches among stream prefetching, pointer prefetching, and no prefetching according to run-time information from the processor. FCDP (filtered content-directed prefetching) uses a filter based on the load offset address to issue on average 35% fewer prefetch requests than CDP (content-directed prefetching). Emulation on the NoCOP platform shows that, on the EEMBC, SPEC2006, and Olden benchmark suites, the multi-mode prefetching system achieves a speedup of 11.7% over stream-prefetching mode alone and 50.6% over FCDP mode alone. When the prefetchers are not effective, the system can shut them down to reduce power consumption.

Because current applications process more floating-point data at higher precision, a floating-point unit is designed for the embedded processor to accelerate data processing. Application characteristics were profiled with a software simulator to guide the RTL (register transfer level) design. The floating-point unit handles load/store instructions and arithmetic instructions separately, is tightly coupled with the integer pipeline, and extensively reuses its logic. Simulation and logic synthesis show that the floating-point unit fully supports the MIPS32 single-precision floating-point instruction set. Its maximum frequency is 495 MHz in the worst case and 794 MHz in the typical case; it adds about 248,000 gates of area and 88.3 mW of power.
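The run-time mode switching can be pictured as a per-interval decision. The sketch below is hypothetical: it assumes the controller can estimate each prefetcher's accuracy over a fixed interval (for example, from predictions checked against later demand accesses), and the 0.4 accuracy threshold, the 16-sample minimum, and the class names are illustrative values rather than figures from the thesis. When neither prefetcher is accurate enough it selects OFF, mirroring the power-saving shutdown described above.

```python
from enum import Enum


class Mode(Enum):
    OFF = "off"          # both prefetchers gated off to save power
    STREAM = "stream"
    POINTER = "pointer"


class ModeController:
    """Hypothetical per-interval mode selector for a multi-mode prefetcher."""

    def __init__(self, threshold=0.4, min_samples=16):
        self.threshold = threshold      # minimum accuracy to keep a prefetcher on (assumed)
        self.min_samples = min_samples  # ignore prefetchers with too few samples (assumed)
        self.mode = Mode.STREAM
        self._reset()

    def _reset(self):
        self.issued = {Mode.STREAM: 0, Mode.POINTER: 0}
        self.useful = {Mode.STREAM: 0, Mode.POINTER: 0}

    def record(self, source, useful):
        """Log one prediction from `source` and whether a later demand access used it."""
        self.issued[source] += 1
        self.useful[source] += int(useful)

    def end_interval(self):
        """Select the most accurate prefetcher, or OFF if none clears the threshold."""
        best, best_acc = Mode.OFF, 0.0
        for m in (Mode.STREAM, Mode.POINTER):
            if self.issued[m] >= self.min_samples:
                acc = self.useful[m] / self.issued[m]
                if acc > best_acc:
                    best, best_acc = m, acc
        self.mode = best if best_acc >= self.threshold else Mode.OFF
        self._reset()
        return self.mode


if __name__ == "__main__":
    ctl = ModeController()
    for i in range(20):
        ctl.record(Mode.STREAM, i % 4 == 0)   # 25% of stream predictions used
        ctl.record(Mode.POINTER, i % 2 == 0)  # 50% of pointer predictions used
    print(ctl.end_interval())  # a pointer-heavy interval selects pointer mode
```

Selecting a single winner per interval keeps the control logic to a few counters and one comparison, which suits the area and power constraints the thesis emphasizes; a real design might add hysteresis to avoid oscillating between modes.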
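Content-directed prefetching treats any word in a newly filled cache line whose high-order bits match those of the line's own address as a likely pointer. The sketch below contrasts plain CDP with one possible reading of an offset-based filter. The exact FCDP filter is not specified in this summary, so the rule used here (only scan words at line offsets that recent loads have accessed), the 64-byte line, and the 8 compare bits are all assumptions, and `OffsetFilter` is a hypothetical name.

```python
LINE_SIZE = 64     # bytes per cache line (assumed)
WORD_SIZE = 4      # 32-bit addresses, as on a MIPS32 core
COMPARE_BITS = 8   # high-order bits a candidate must share with the line address (assumed)
SHIFT = 32 - COMPARE_BITS


def looks_like_pointer(value, line_addr):
    """CDP heuristic: a non-zero word in the same address region as the line itself."""
    return value != 0 and (value >> SHIFT) == (line_addr >> SHIFT)


def cdp_candidates(line_addr, words):
    """Unfiltered CDP: scan every word of the filled line."""
    return [w for w in words if looks_like_pointer(w, line_addr)]


class OffsetFilter:
    """Hypothetical offset filter: only scan words at line offsets that loads have used."""

    def __init__(self):
        self.offsets = set()

    def observe_load(self, addr):
        self.offsets.add(addr % LINE_SIZE)

    def candidates(self, line_addr, words):
        return [w for i, w in enumerate(words)
                if i * WORD_SIZE in self.offsets and looks_like_pointer(w, line_addr)]


if __name__ == "__main__":
    line_addr = 0x10000000
    words = [0] * (LINE_SIZE // WORD_SIZE)
    words[1] = 0x10000400   # a "next" pointer at offset 4
    words[3] = 0x10000800   # pointer-like data at offset 12 that no load has touched
    words[5] = 7            # ordinary integer data
    f = OffsetFilter()
    f.observe_load(0x20000044)  # a load whose address has offset 4 within its line
    print(len(cdp_candidates(line_addr, words)), len(f.candidates(line_addr, words)))
```

Here plain CDP would issue two prefetches for the line while the filtered version issues one, which is the kind of request reduction the FCDP result above refers to; the actual filtering rule and its hit rate may differ.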
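Handling load/store and arithmetic instructions separately starts at decode. As an illustration, the sketch below classifies MIPS32 instruction words using the standard encodings of LWC1 (opcode 0x31), SWC1 (opcode 0x39), and COP1 (opcode 0x11) with the single-precision format field (fmt = 0x10). The path names are hypothetical labels rather than the thesis's pipeline structure, and only the basic `.S` arithmetic functions are decoded.

```python
OP_COP1, OP_LWC1, OP_SWC1 = 0x11, 0x31, 0x39  # MIPS32 primary opcodes
FMT_S = 0x10                                  # COP1 fmt field value for single precision
ARITH_S = {0: "add.s", 1: "sub.s", 2: "mul.s", 3: "div.s",
           4: "sqrt.s", 5: "abs.s", 6: "mov.s", 7: "neg.s"}


def classify(word):
    """Return (path, mnemonic) for a 32-bit MIPS32 instruction word.

    The path names are illustrative. LWC1/SWC1 compute base + offset exactly like
    integer loads/stores, so a design can route them through the integer pipeline's
    address and memory stages, while "fp_arith" instructions go to the FP datapath.
    """
    opcode = (word >> 26) & 0x3F
    if opcode == OP_LWC1:
        return ("loadstore", "lwc1")
    if opcode == OP_SWC1:
        return ("loadstore", "swc1")
    if opcode == OP_COP1:
        fmt = (word >> 21) & 0x1F
        funct = word & 0x3F
        if fmt == FMT_S and funct in ARITH_S:
            return ("fp_arith", ARITH_S[funct])
        return ("fp_other", None)  # moves, compares, conversions, branches
    return ("integer", None)


if __name__ == "__main__":
    for w in (0x46041000,   # add.s $f0, $f2, $f4
              0xC7A00008,   # lwc1  $f0, 8($sp)
              0xE7A00008,   # swc1  $f0, 8($sp)
              0x012A4020):  # add   $t0, $t1, $t2 (integer)
        print(hex(w), classify(w))
```

Splitting on the primary opcode first means FP loads and stores never need the FP arithmetic decoder, which is one way the integer pipeline's existing address-generation logic can be reused.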
Keywords/Search Tags: embedded system, microprocessor, data prefetching, adaptive system, floating-point unit