Embedded GPU Hardware Accelerated Rendering Textures Transport Optimization And Prefetching Strategy Research

Posted on:2014-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:H Q Wang

Full Text:PDF

GTID:2268330425983750

Subject:Computer Science and Technology

Abstract/Summary:


Embedded 3D graphics rendering is used in many fields, including entertainment, medicine, mobile devices, and aerospace. The VxWorks embedded real-time operating system is widely used, but its WindML component supports only two-dimensional graphics development. Researchers have designed and implemented three-dimensional components on this basis with some success, but CPU usage remains too high. As increasingly powerful embedded graphics chips become available, porting the hardware-accelerated 3D techniques of general-purpose computers to embedded systems has become a trend. This thesis studies the development of a low-level 3D graphics driver for VxWorks. It describes the texture transfer process in detail and proposes optimizations in two areas, aiming to improve rendering efficiency without interfering with the normal execution of other tasks.

First, the thesis proposes an optimization of texture transfer. After studying the GPU architecture and the working principles of the OpenGL graphics driver, it takes texture transfer as its entry point and describes in detail the address spaces involved in texture mapping and the interaction model between the host and the graphics adapter. The original interaction mechanism is based on a wait queue. It depends on system clock ticks, which introduce large and unpredictable delays and cause the transfer rate to jitter. The thesis therefore proposes two optimized interaction methods. After a DMA transfer command is submitted, the first method waits for a fixed delay determined by the transfer granularity and then enters a busy wait; the second waits for the fixed delay and then waits for the end-of-transfer signal with the help of the auxiliary clock. Test results show that both methods overcome the jitter problem and that the transfer rate improves markedly when the transfer granularity is small.

Second, tests of the graphics rendering task in a multitasking environment show that I/O-intensive tasks are constrained while texture transfers are in progress. The thesis therefore proposes prefetching data sets to improve CPU utilization for I/O-intensive tasks, using the Linux read-around prefetching algorithm.

Finally, experiments verify that the texture transfer and prefetching optimizations are effective. On the one hand, the thesis measures the delay at which the transfer rate peaks under different transfer granularities and obtains the most suitable delay; the experiments show that choosing an appropriate delay improves the transfer rate. On the other hand, it uses the MiBench embedded benchmark suite to test I/O-intensive and CPU-intensive tasks separately; the experiments show that prefetching the data sets of I/O-intensive tasks improves CPU utilization to some extent.
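
The two waiting strategies described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions and not the thesis's driver code: the DMA engine is simulated by a thread, the fixed delay is assumed to be proportional to the transfer size, and the auxiliary clock is modeled as a short periodic check interval. All names (`wait_for_transfer`, `simulated_dma`, `run_demo`, `tick_s`) and all numeric values are hypothetical.

```python
import threading
import time


def wait_for_transfer(done, size_bytes, est_bytes_per_sec, tick_s=None, timeout_s=1.0):
    """Wait for a transfer: fixed delay first, then poll the completion flag.

    tick_s=None  -> busy-wait after the fixed delay (first method).
    tick_s=value -> check once per simulated auxiliary-clock tick (second method).
    Returns the number of times the completion flag was checked.
    """
    # Phase 1: fixed delay derived from the transfer size (assumed linear model).
    time.sleep(size_bytes / est_bytes_per_sec)

    # Phase 2: poll until the transfer signals completion or the timeout expires.
    deadline = time.monotonic() + timeout_s
    checks = 0
    while True:
        checks += 1
        if done.is_set():
            return checks
        if time.monotonic() > deadline:
            raise TimeoutError("transfer did not complete in time")
        if tick_s is not None:
            time.sleep(tick_s)  # stand-in for waiting on the next auxiliary-clock tick


def simulated_dma(done, duration_s):
    """Hypothetical stand-in for the hardware: finishes after duration_s seconds."""
    time.sleep(duration_s)
    done.set()


def run_demo(tick_s=None):
    """Start one simulated 64 KiB transfer and wait for it with the chosen method."""
    done = threading.Event()
    threading.Thread(target=simulated_dma, args=(done, 0.02), daemon=True).start()
    checks = wait_for_transfer(done, 64 * 1024, 4 * 1024 * 1024, tick_s=tick_s)
    return done.is_set(), checks


if __name__ == "__main__":
    print("busy wait:", run_demo())
    print("tick-based wait:", run_demo(tick_s=0.001))
```

In this sketch the fixed delay covers most of the expected transfer time, so the polling phase stays short. That matches the abstract's observation that the delay should be chosen to suit the transfer granularity.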
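
As a rough illustration of the read-around idea mentioned in the abstract: in Linux, read-around on a page-cache miss reads a window of pages placed around the missing page, not only the pages after it. The sketch below only computes such a window. The function name, the window size, and the clamping at the file boundaries are assumptions for illustration; the abstract does not say how the algorithm was adapted to VxWorks.

```python
def read_around_window(miss_page, window_pages, file_pages):
    """Return the page indices to prefetch around a missed page.

    The window starts half a window before the miss and is clamped
    to the valid page range [0, file_pages).
    """
    start = max(0, miss_page - window_pages // 2)
    end = min(file_pages, start + window_pages)
    return list(range(start, end))


if __name__ == "__main__":
    # A miss in the middle of a 100-page file with an 8-page window.
    print(read_around_window(10, 8, 100))
```

Near the start of the file the window cannot extend before page 0, and near the end it is cut off at the last page, so the returned list may be shorter than `window_pages`.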

Keywords/Search Tags:

3D hardware acceleration, prefetching, texture transmission


Related items

1	Research On The Hardware Acceleration Mechanism For SDN/NFV
2	Research On Geometric Texture Synthesis Algorithm Based On GPU Acceleration
3	Hardware Acceleration Design Technology For High Density Computing Many-core
4	Compiler-assisted hardware-based data prefetching for next generation processors
5	Research On Key Technologies Of RNN Algorithms Optimization And Hardware Acceleration
6	Web Prefetching And Caching The Integrated Model
7	Research On Web Prefetching Model Based On Double Dependency Graph
8	Study On Depth Image Based Rendering Algorithms
9	Research On Hardware Acceleration Algorithm For Target Recognition
10	Research Of Hardware Acceleration Technique For Critical Algorithms Of DSP Applications