With the spread of mobile communication, entertainment and data processing devices, 3D rendering techniques are increasingly applied in embedded systems. Although simple software-based 3D rendering has been implemented, it still has shortcomings: it occupies too many CPU resources, increases CPU power consumption, and renders 3D graphics with low quality and at low speed, so rendering cannot be done in real time. To solve these problems, a high-performance hardware circuit for 3D graphics rendering is needed.

This research focuses on the design of a mobile vertex processor with low power, low cost and high performance that supports the OpenGL ES 1.X standards. The major achievements are listed below:

1. A programmable graphics vertex processor for mobile 3D rendering is presented in this thesis. It supports a self-defined 32-bit instruction set that includes branch instructions, and it adopts a fixed data path to reduce circuit complexity. It has four parallel SIMD operation units, a high-precision special function unit, and a seven-stage pipeline with data bypassing and hazard control. Experimental results show that the processor is fast and precise enough to support real-time 3D graphics rendering.

2. A high-performance, low-power fixed-point Special Function Unit (SFU) for mobile vertex processors is presented in this thesis. The unit supports the fixed-point format of OpenGL ES 1.X and implements faithfully rounded reciprocal, square root, reciprocal square root, logarithm and exponential functions with 16 bits of precision after the binary point. The functions are approximated using piecewise quadratic interpolation. A square root 2 circuit is used in the unit, and the lookup table size is reduced by 29% with respect to previously proposed techniques, without any loss in accuracy. Based on an analysis of the computation error and truncation error, the speed and area of the lookup table, squaring unit, multiplier and fused accumulation tree are optimized.
The SFU has been implemented in a 0.18 μm CMOS technology. The circuit operates at clock frequencies up to 300 MHz, with a power dissipation of 12.8 mW at 300 MHz and an area of only 0.112 mm². The results show that the fixed-point SFU is well suited to computing elementary functions in mobile vertex processors.

3. This thesis presents a programming language for designing signed multiplier circuits for SIMD designs. The key idea is to use instructions to express the encoding units, addition tree units and fast adder units of a multiplier, and to obtain a complete multiplier by connecting these instruction descriptions. A translator built with Lex and Yacc converts source code containing the connections into Verilog code. Seven typical structures of 32-bit signed multipliers are obtained from the instruction descriptions. Under a 200 MHz synthesis constraint in the GRACE 0.18 μm process, these multipliers go through logic synthesis, placement and routing, static timing analysis and power analysis. The experimental results suggest that all seven multipliers are faster than the one produced by Synopsys DesignWare, and that the multiplier composed of modified Booth radix-4 encoding, a redundant binary addition tree and a carry-skip adder outperforms the one produced by Synopsys DesignWare by 35%. This language can therefore be applied to high-performance multiplier design.

4. A high-performance clipping engine is described in this thesis, which combines back-face culling and view clipping of 3D graphics. The complexity of triangle clipping algorithms makes them difficult to implement in hardware. The proposed clipping algorithm shows 5 times higher performance than a conventional algorithm. Firstly, view clipping, which normally takes place before perspective division, is moved to the last stage of the 3D graphics pipeline. Perspective division and viewport mapping are performed in the vertex processor, which simplifies the clipping engine circuit and improves processor performance.
Secondly, the triangles produced by the vertex processor pass through back-face culling before they are output to the clipping block, so the number of triangles entering the clipping block is greatly reduced and the performance of the clipping engine increases. Finally, the clipping block handles only Z-plane clipping and intersection, while X- and Y-plane clipping is performed in the rasterizer stage. Eliminating these redundant calculations from the clipping block reduces the circuit area and increases the throughput of the clipping engine.

5. The mobile vertex processor has been verified in an SOPC system running 3D graphics rendering with OpenGL ES 1.X. The results show that the vertex processor achieves a fill rate of 4M vertices/s at an operating frequency of 60 MHz.

6. In this thesis, the mobile graphics vertex processor is integrated in a multimedia chip with an ARM multi-core architecture. The chip has been implemented in a 0.13 μm 1P7M CMOS technology. The circuit operates at clock frequencies up to 150 MHz with a power dissipation of 226 mW when all 3D functions are enabled, and the circuit area is 61.23 mm². The graphics vertex processor accounts for 10.83% of the area, and the fixed-pipeline graphics rendering engine for about 34.385%. The vertex processor reaches a fill rate of 10M vertices/s when all features (T&L, projection, division, viewport mapping) are enabled, with a power dissipation of 76 mW. The rendering engine has a peak graphics performance of 70M pixels/s and 300M texels/s. The chip shows that the graphics pipeline delivers 5M triangles/s. Experimental results suggest that the chip can render 3D graphics in real time, which confirms the validity of the proposed 3D graphics model and mechanism.
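
As an illustration of the piecewise quadratic interpolation mentioned for the SFU in item 2, the following is a minimal software sketch of a reciprocal approximation in a 16.16 fixed-point format. It is not the circuit of this thesis: the function names, the choice of 64 segments, the 24-bit coefficient width and the use of Taylor coefficients about each segment midpoint are all assumptions made for illustration, and the input is assumed to be already range-reduced to [1, 2).

```python
"""Piecewise quadratic approximation of 1/x in 16.16 fixed point (illustrative sketch)."""

FRAC_BITS = 16                   # 16 fractional bits, as in the OpenGL ES 1.X fixed type
SEG_BITS = 6                     # assumption: 64 segments over [1, 2)
COEF_BITS = 24                   # assumption: coefficients keep 24 fractional bits
DX_BITS = FRAC_BITS - SEG_BITS   # low-order bits locating x inside a segment

ONE = 1 << FRAC_BITS


def build_table():
    """Return one (c0, c1, c2) integer triple per segment.

    The coefficients are the Taylor terms of 1/x about the segment
    midpoint xm: 1/xm, -1/xm**2 and 1/xm**3, scaled by 2**COEF_BITS.
    """
    scale = 1 << COEF_BITS
    table = []
    for i in range(1 << SEG_BITS):
        xm = 1.0 + (i + 0.5) / (1 << SEG_BITS)
        table.append((round(scale / xm),
                      round(-scale / xm**2),
                      round(scale / xm**3)))
    return table


TABLE = build_table()


def recip_q16(x):
    """Approximate 1/x for a 16.16 fixed-point integer x with value in [1, 2)."""
    if not ONE <= x < 2 * ONE:
        raise ValueError("x must lie in [1, 2); range reduction is not covered here")
    frac = x - ONE
    idx = frac >> DX_BITS                                       # segment index (table address)
    dx = (frac & ((1 << DX_BITS) - 1)) - (1 << (DX_BITS - 1))   # signed offset from midpoint
    c0, c1, c2 = TABLE[idx]
    # Evaluate c0 + c1*dx + c2*dx^2 with all terms aligned to COEF_BITS fractional bits.
    acc = c0 + ((c1 * dx) >> FRAC_BITS) + ((c2 * dx * dx) >> (2 * FRAC_BITS))
    shift = COEF_BITS - FRAC_BITS
    return (acc + (1 << (shift - 1))) >> shift                  # round to 16 fractional bits


def max_error_ulp():
    """Worst-case absolute error over every input in [1, 2), in units of 2**-16."""
    worst = 0.0
    for x in range(ONE, 2 * ONE):
        exact = ONE * ONE / x
        worst = max(worst, abs(recip_q16(x) - exact))
    return worst
```

Calling `max_error_ulp()` sweeps every representable input in [1, 2) and reports the worst-case error in units of the last fractional bit. A hardware design would choose the segment count and coefficient widths from this kind of error analysis, trading lookup table size against multiplier width.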