Modern processors are adding more and more SIMD vector instructions to increase their data-parallel processing capability. Shenwei, a multi-core processor developed by Wuxi Jiangnan Computing Technology Institute, is also equipped with a SIMD processing unit. However, few applications exist for this custom-made processor, and 3D visualization applications are especially scarce. Since an efficient renderer is essential to 3D visualization, we developed one for this processor to promote the development of 3D applications on custom-made processors.

This paper presents in detail the architecture of a software renderer designed for multi-core processors with SIMD instructions. Because the traditional rendering pipeline is hard to parallelize, we designed a parallel rendering pipeline that efficiently exploits both thread-level and data-level parallelism, together with a rasterization method well suited to SIMD instructions. Based on these designs, we implemented a platform-independent rendering architecture and built renderers on it for both Windows on an Intel Core i7 and Linux on a Shenwei processor. The results show that the renderer achieves near-linear scalability on both platforms. On Windows with an Intel Core i7, our renderer is 20x faster than Mesa3D’s serial pipeline (Mesa+Gallium+Softpipe) and 2.1-3.3x faster than Mesa3D’s multithreaded pipeline (Mesa+Swrast).

The renderer uses binning to reduce memory bandwidth requirements and lock contention. It shades four fragments in parallel using x86 SSE instructions or Shenwei’s SIMD instructions, and performs all 3D transformations with math routines optimized for SIMD instructions. By exploiting both multiple cores and SIMD processing units, the renderer delivers much better performance than a serial implementation.
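Binning can be illustrated with a minimal sketch. It assumes a screen divided into fixed 64x64-pixel tiles; the tile size, the `bin_t`/`binner_t` layout and the function names are hypothetical, not taken from the renderer. Each triangle is appended to the bin of every tile that its screen-space bounding box overlaps, so that later a single worker can rasterize a whole tile from its bin while that tile's color and depth data stay in cache.

```c
#include <assert.h>

#define TILE_SIZE 64      /* hypothetical tile edge length in pixels */
#define MAX_TILES_X 32
#define MAX_TILES_Y 32
#define BIN_CAPACITY 256  /* hypothetical per-tile triangle limit */

typedef struct {
    int count;
    int tri_ids[BIN_CAPACITY];   /* indices into this frame's triangle array */
} bin_t;

typedef struct {
    int tiles_x, tiles_y;
    bin_t bins[MAX_TILES_Y][MAX_TILES_X];
} binner_t;

static float min3f(float a, float b, float c) { float m = a < b ? a : b; return m < c ? m : c; }
static float max3f(float a, float b, float c) { float m = a > b ? a : b; return m > c ? m : c; }

/* Index of the tile containing pixel coordinate v, rounding toward -infinity. */
static int tile_of(float v) {
    int t = (int)(v / TILE_SIZE);
    if ((float)t * TILE_SIZE > v) --t;
    return t;
}

/* Set up an empty tile grid covering a width x height render target. */
void binner_init(binner_t *b, int width, int height) {
    b->tiles_x = (width + TILE_SIZE - 1) / TILE_SIZE;
    b->tiles_y = (height + TILE_SIZE - 1) / TILE_SIZE;
    assert(b->tiles_x <= MAX_TILES_X && b->tiles_y <= MAX_TILES_Y);
    for (int ty = 0; ty < b->tiles_y; ++ty)
        for (int tx = 0; tx < b->tiles_x; ++tx)
            b->bins[ty][tx].count = 0;
}

/* Append triangle `id` (screen-space vertices xs[], ys[]) to every tile its
   bounding box overlaps. Returns the number of bins it was added to. */
int binner_add(binner_t *b, int id, const float xs[3], const float ys[3]) {
    int tx0 = tile_of(min3f(xs[0], xs[1], xs[2])), tx1 = tile_of(max3f(xs[0], xs[1], xs[2]));
    int ty0 = tile_of(min3f(ys[0], ys[1], ys[2])), ty1 = tile_of(max3f(ys[0], ys[1], ys[2]));
    if (tx1 < 0 || ty1 < 0 || tx0 >= b->tiles_x || ty0 >= b->tiles_y)
        return 0;                               /* entirely off screen */
    if (tx0 < 0) tx0 = 0;                       /* clamp to the tile grid */
    if (ty0 < 0) ty0 = 0;
    if (tx1 >= b->tiles_x) tx1 = b->tiles_x - 1;
    if (ty1 >= b->tiles_y) ty1 = b->tiles_y - 1;

    int added = 0;
    for (int ty = ty0; ty <= ty1; ++ty)
        for (int tx = tx0; tx <= tx1; ++tx) {
            bin_t *bin = &b->bins[ty][tx];
            if (bin->count < BIN_CAPACITY) {    /* a real binner would flush or grow here */
                bin->tri_ids[bin->count++] = id;
                ++added;
            }
        }
    return added;
}
```

This sketch bins sequentially. In a multithreaded pipeline, a common choice (assumed here, not stated by the source) is to give each thread private bins during geometry processing and then hand out whole tiles to workers, which is what lets tiles be rasterized without framebuffer locks.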
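The abstract does not spell out the SIMD rasterization method. One common approach that fits 4-wide SIMD, shown here purely as an assumption, evaluates a triangle's three edge functions for a 2x2 quad of pixels at once and produces a 4-bit coverage mask, so the four fragments can then be shaded together. The `quad4f`/`vec2f` types and function names are hypothetical; the lanes are written as plain C loops (which compilers can auto-vectorize) instead of SSE or Shenwei intrinsics so the sketch stays portable. Lane i sits at column `i & 1`, row `i >> 1` of the quad.

```c
#include <assert.h>

typedef struct { float v[4]; } quad4f;   /* one value per pixel (SIMD lane) of a 2x2 quad */
typedef struct { float x, y; } vec2f;

/* Edge function of the directed edge a->b at the pixel centers of the 2x2
   quad whose top-left pixel is (px, py). Non-negative on the inside of a
   triangle with positive signed area. */
static quad4f edge_quad(vec2f a, vec2f b, int px, int py) {
    static const float dx[4] = {0.5f, 1.5f, 0.5f, 1.5f};
    static const float dy[4] = {0.5f, 0.5f, 1.5f, 1.5f};
    quad4f e;
    for (int i = 0; i < 4; ++i) {        /* one SIMD lane per iteration */
        float x = (float)px + dx[i], y = (float)py + dy[i];
        e.v[i] = (b.x - a.x) * (y - a.y) - (b.y - a.y) * (x - a.x);
    }
    return e;
}

/* Returns a 4-bit mask; bit i is set if lane i lies inside triangle abc.
   No top-left fill rule: pixels exactly on an edge count as covered. */
unsigned quad_coverage(vec2f a, vec2f b, vec2f c, int px, int py) {
    /* Precondition: positive signed area (back faces already culled or flipped). */
    assert((b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x) > 0.0f);
    quad4f e0 = edge_quad(a, b, px, py);
    quad4f e1 = edge_quad(b, c, px, py);
    quad4f e2 = edge_quad(c, a, px, py);
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i)
        if (e0.v[i] >= 0.0f && e1.v[i] >= 0.0f && e2.v[i] >= 0.0f)
            mask |= 1u << i;
    return mask;
}
```

With real SIMD, the three comparisons become vector compares whose results are ANDed and packed into a mask (for SSE, `_mm_movemask_ps`); a quad with a zero mask is skipped before any shading work.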
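SIMD-friendly transformation routines typically process several vertices per call. The sketch below assumes a structure-of-arrays layout holding four vertices, so that each output component is one matrix row multiplied lane-wise with the x, y, z and w vectors. The `vert4_soa`, `mat4` and `transform4` names are hypothetical and do not come from the renderer.

```c
#include <assert.h>

/* Four vertices in structure-of-arrays (SoA) layout: lane i holds vertex i. */
typedef struct { float x[4], y[4], z[4], w[4]; } vert4_soa;

/* Row-major 4x4 matrix, applied to column vectors: out = M * v. */
typedef struct { float m[4][4]; } mat4;

/* Transform four vertices at once. Each statement is a multiply-add of one
   broadcast matrix row with the x/y/z/w lanes, which maps directly onto
   4-wide SIMD (e.g. _mm_mul_ps/_mm_add_ps with SSE). Safe in place
   (in == out) because lane i is fully read before it is written. */
void transform4(const mat4 *M, const vert4_soa *in, vert4_soa *out) {
    assert(M && in && out);
    for (int i = 0; i < 4; ++i) {
        float x = in->x[i], y = in->y[i], z = in->z[i], w = in->w[i];
        out->x[i] = M->m[0][0] * x + M->m[0][1] * y + M->m[0][2] * z + M->m[0][3] * w;
        out->y[i] = M->m[1][0] * x + M->m[1][1] * y + M->m[1][2] * z + M->m[1][3] * w;
        out->z[i] = M->m[2][0] * x + M->m[2][1] * y + M->m[2][2] * z + M->m[2][3] * w;
        out->w[i] = M->m[3][0] * x + M->m[3][1] * y + M->m[3][2] * z + M->m[3][3] * w;
    }
}
```

SoA avoids the horizontal shuffles that an array-of-structures layout would need, which is why it is a frequent choice for 4-wide units; whether the renderer uses this layout is not stated in the source.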
In terms of features, users can customize their own vertex and fragment shaders and render to different targets, and the renderer supports culling, polygon clipping, z-buffering, and perspective-correct texture mapping. Thanks to the flexibility of a software implementation, the rendering framework is platform-independent and highly portable: the renderer was initially developed on Windows with an Intel Core i7 and later ported to Linux on the Shenwei processor with little effort.
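Customizable shaders in a software renderer are often exposed as function pointers plus a pointer to user-defined uniform data. The sketch below assumes that design; the `vertex_shader_fn`/`fragment_shader_fn` typedefs, the `program_t` struct, the example shaders and the fixed varying limit are all hypothetical rather than the renderer's actual API.

```c
#include <assert.h>

#define MAX_VARYINGS 8   /* hypothetical limit on interpolated attributes */

typedef struct {
    float position[4];            /* clip-space position written by the vertex shader */
    float varyings[MAX_VARYINGS]; /* attributes to interpolate across the triangle */
} vs_output;

typedef void (*vertex_shader_fn)(const float *attribs, const void *uniforms, vs_output *out);
typedef void (*fragment_shader_fn)(const float *varyings, const void *uniforms, float rgba[4]);

typedef struct {
    vertex_shader_fn vs;
    fragment_shader_fn fs;
    const void *uniforms;   /* user data shared by both stages */
    int num_varyings;
} program_t;

/* Example user shaders: pass the position through, output a flat color. */
typedef struct { float color[4]; } flat_uniforms;

static void passthrough_vs(const float *attribs, const void *uniforms, vs_output *out) {
    (void)uniforms;
    for (int i = 0; i < 3; ++i) out->position[i] = attribs[i];
    out->position[3] = 1.0f;
}

static void flat_fs(const float *varyings, const void *uniforms, float rgba[4]) {
    (void)varyings;
    const flat_uniforms *u = (const flat_uniforms *)uniforms;
    for (int i = 0; i < 4; ++i) rgba[i] = u->color[i];
}

/* Pipeline-side entry points that invoke the user's shaders. */
void run_vertex(const program_t *p, const float *attribs, vs_output *out) {
    assert(p->vs && p->num_varyings <= MAX_VARYINGS);
    p->vs(attribs, p->uniforms, out);
}

void run_fragment(const program_t *p, const float *varyings, float rgba[4]) {
    assert(p->fs);
    p->fs(varyings, p->uniforms, rgba);
}
```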
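Back-face culling is usually decided from the sign of a triangle's screen-space area. This minimal sketch assumes front faces have positive signed area (counter-clockwise in a y-up frame; the sign flips with a y-down frame) and always rejects degenerate triangles; the `cull_mode` enum and `should_cull` function are hypothetical names.

```c
#include <assert.h>

typedef struct { float x, y; } screen_pt;
typedef enum { CULL_NONE, CULL_BACK, CULL_FRONT } cull_mode;

/* Twice the signed area of triangle abc; positive when counter-clockwise in a y-up frame. */
static float signed_area2(screen_pt a, screen_pt b, screen_pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Returns 1 if the triangle should be discarded before rasterization. */
int should_cull(cull_mode mode, screen_pt a, screen_pt b, screen_pt c) {
    float area2 = signed_area2(a, b, c);
    if (area2 == 0.0f) return 1;               /* degenerate: covers no pixels */
    switch (mode) {
    case CULL_NONE:  return 0;
    case CULL_BACK:  return area2 < 0.0f;      /* clockwise = back-facing */
    case CULL_FRONT: return area2 > 0.0f;
    }
    assert(0 && "unknown cull mode");
    return 0;
}
```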
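The source does not say which clipping algorithm the renderer uses. As an assumed example, the sketch below applies Sutherland-Hodgman clipping to a convex polygon against the near plane z >= -w in OpenGL-style homogeneous clip space; clipping against the other five planes follows the same pattern with a different signed distance. The `clip_vert` type and `clip_near` function are hypothetical, and a real vertex would also carry varyings, interpolated with the same `t`.

```c
#include <assert.h>

#define MAX_CLIP_VERTS 9   /* a triangle clipped by 6 planes has at most 3 + 6 vertices */

typedef struct { float x, y, z, w; } clip_vert;

static clip_vert lerp_vert(clip_vert a, clip_vert b, float t) {
    clip_vert r = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
                    a.z + t * (b.z - a.z), a.w + t * (b.w - a.w) };
    return r;
}

/* Sutherland-Hodgman clip of a convex polygon against the plane z + w >= 0.
   Writes the result to out[] (room for MAX_CLIP_VERTS) and returns its vertex
   count, or 0 if the polygon lies entirely behind the near plane. */
int clip_near(const clip_vert *in, int n, clip_vert *out) {
    assert(n >= 3 && n < MAX_CLIP_VERTS);      /* one plane adds at most one vertex */
    int m = 0;
    for (int i = 0; i < n; ++i) {
        clip_vert cur = in[i], nxt = in[(i + 1) % n];
        float dc = cur.z + cur.w, dn = nxt.z + nxt.w;   /* signed distances to the plane */
        if (dc >= 0.0f) out[m++] = cur;                 /* keep an inside vertex */
        if ((dc >= 0.0f) != (dn >= 0.0f))               /* edge crosses the plane */
            out[m++] = lerp_vert(cur, nxt, dc / (dc - dn));
    }
    return m;
}
```

Clipping against the near plane before the perspective divide also guarantees w > 0 for every surviving vertex, which the later 1/w computations depend on.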
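A z-buffer test that matches four-fragment shading can be written per 2x2 quad, combining the depth comparison with the rasterizer's coverage mask. This sketch assumes a float depth buffer cleared to the far value, a "less" comparison, quads aligned to even coordinates, and early depth testing (shaders neither discard fragments nor write depth); `depth_test_quad` and its lane order (lane i at column `i & 1`, row `i >> 1`) are hypothetical.

```c
#include <assert.h>

/* Depth-test the 2x2 quad whose top-left pixel is (px, py) against zbuf
   (row stride `pitch` floats). mask: coverage bits (bit i = lane i).
   Returns the mask of fragments that pass; their depths are written back. */
unsigned depth_test_quad(float *zbuf, int pitch, int px, int py,
                         const float z[4], unsigned mask) {
    assert((px & 1) == 0 && (py & 1) == 0);   /* quads are 2x2-aligned */
    unsigned pass = 0;
    for (int i = 0; i < 4; ++i) {              /* one SIMD lane per iteration */
        float *dst = &zbuf[(py + (i >> 1)) * pitch + px + (i & 1)];
        if ((mask & (1u << i)) && z[i] < *dst) {
            *dst = z[i];
            pass |= 1u << i;
        }
    }
    return pass;
}
```

A SIMD version would load the quad's four depths as one vector, compare all lanes at once, AND the result with the coverage mask, and blend the new depths in; only lanes left in the returned mask need to be shaded.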
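Perspective-correct texture mapping rests on the fact that a/w and 1/w, unlike the attribute a itself, vary linearly in screen space. With screen-space barycentric weights b_i and clip-space w_i, the corrected value is a = (sum of b_i * a_i / w_i) / (sum of b_i / w_i). The sketch below evaluates that formula for one attribute, such as a texture coordinate; `persp_interp` is a hypothetical name, and a real rasterizer would compute the 1/w terms once per vertex rather than per fragment.

```c
#include <assert.h>

/* Perspective-correct interpolation of one attribute.
   b[3]: screen-space barycentric weights of the fragment (summing to 1);
   w[3]: clip-space w of each vertex; a[3]: attribute values at each vertex. */
float persp_interp(const float b[3], const float w[3], const float a[3]) {
    float num = 0.0f, den = 0.0f;
    for (int i = 0; i < 3; ++i) {
        assert(w[i] > 0.0f);          /* holds after near-plane clipping */
        float inv_w = 1.0f / w[i];
        num += b[i] * a[i] * inv_w;   /* a/w is linear in screen space */
        den += b[i] * inv_w;          /* so is 1/w */
    }
    return num / den;
}
```

For example, halfway between a vertex with w = 1, u = 0 and a vertex with w = 3, u = 1, the corrected u is 0.25 rather than the affine 0.5; when all w are equal, the formula reduces to ordinary linear interpolation.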