Font Size: a A A

Conception de processeurs specialises pour le traitement video en temps reel par filtre local

Posted on:2011-12-20Degree:M.Sc.AType:Thesis
University:Ecole Polytechnique, Montreal (Canada)Candidate:Aubertin, PhilippeFull Text:PDF
GTID:2448390002969226Subject:Engineering
Abstract/Summary:
This master thesis explores the possibilities offered by Application-Specific Instruction-Set Processors (ASIP) for digital video applications, more specifically for a particular algorithm class used for video processing: local neighbourhood functions. For this algorithm class, an architectural exploration lead to the identification of a set of design techniques which, together, form a coherent and systematic approach for the design of high performance ASIPs usable for real-time video processing.;In order to demonstrate the validity of the proposed design approach experimentally, seven ASIPs have been designed by extending the instruction-set of a configurable and extensible processor. Three of the ASIPs implement intra-field deinterlacing algorithms, and four implement the 2D convolution with different kernel sizes.;The results show a significant improvement in performance. For the intra-field deinterlacing algorithms, speedup factors are between 95 and 1330, while the factors of improvement of the Area-Time (AT) product are between 29 and 243, all this compared to a pure software implementation running on a general-purpose processor. In the case of the two-dimensional convolution, speedup factors are between 36 and 80, while factors of improvement of the AT product are between 12 and 22. In all cases, real-time processing of high definition video in the 1080i (deinterlacing) or 1080p (convolution) format is possible given a 130 nm manufacturing process.;The proposed design approach aims at an efficient utilization of available bandwidth to memory, which constitutes the main performance bottleneck of the application. It is possible to approach the processing speed limit imposed by this bottleneck through an appropriate data reuse strategy and by exploiting the data parallelism inherent to the target algorithm class. The design approach comprises four steps: first, a Single Instruction Multiple Data (SIMD) instruction which calculates more than one pixel in parallel is created. Then, shift registers, which are used for intra-line input pixel reuse, are added. Next, a processing pipeline is created by the addition of application-specific registers. Finally, the custom load/store instructions are created. Some of these steps lead to possible hardware simplifications for some algorithms of the target class. The hardware structure thus obtained, together with the instruction-level parallelism made possible through the use of a Very Long Instruction Word (VLIW) architecture, mimics a pipelined systolic array.
Keywords/Search Tags:Video, Instruction, Possible
Related items