
Configurable accelerators for video analytics

Posted on: 2012-03-17
Degree: Ph.D.
Type: Dissertation
University: The Pennsylvania State University
Candidate: DeBole, Michael
Full Text: PDF
GTID: 1468390011958533
Subject: Engineering
Abstract/Summary:
Video analytics is the science of analyzing image sequences and video with the aim of gaining a cognitive understanding of a scene. The applications that can take advantage of video analytics are diverse, ranging from media measurement systems and surveillance to medical imaging and traffic systems. Unfortunately, many of these algorithms still cannot be deployed in embedded environments or achieve real-time performance because of the computational and size, weight, and power (SWaP) constraints of such systems. In particular, performing complex imaging tasks in real time is still beyond the capabilities of general-purpose CPUs and embedded microcontrollers alone. Conversely, systems that can perform video analytics in real time usually have a SWaP too high for use within an embedded system. The goal of this dissertation is to explore several areas that have the potential to enable low-SWaP accelerators to meet the performance goals of real-time systems. These areas include low-cost field programmable gate arrays (FPGAs), graphics processing units (GPUs), three-dimensional (3D) integrated circuits (ICs), and flexible, high-performance FPGA systems that enable algorithm exploration.
FPGAs have become a highly competitive platform for implementing low-power systems aimed at real-time applications. This dissertation describes the implementation of two popular machine learning algorithms, the artificial neural network (ANN) and the support vector machine (SVM), targeting embedded FPGA systems. These algorithms were chosen because they are used extensively in commercial applications, where the implementations can have a direct impact. Both implementations demonstrate the ability to run at the 30 frames per second necessary to support real-time operation and can be configured to meet the resource constraints of the system.
The second class of accelerator, the GPU, consists of tens to hundreds of functional units with a fixed underlying hardware architecture. This dissertation examines center-surround distribution distance, a key algorithm for recognizing salient features within an image. On a GPU platform, the algorithm was accelerated by up to 30 times over an optimized CPU implementation, enabling the algorithm's use in real-time applications.
The third area, the 3D IC, targets application-specific integrated circuit (ASIC) designs, which have historically been the most efficient choice for accelerating custom applications because they provide the highest performance at the lowest SWaP. This dissertation demonstrates the design and implementation of a custom accelerator chip using 3D technology, targeted toward a complete embedded camera accelerator platform. The chip implements a popular pre-processing algorithm that extracts skin regions from an image and can operate at 312 frames per second (10X real-time performance).
Lastly, this dissertation explores the Falcon framework, which allows high-performance FPGA systems to be composed automatically from an algorithmic specification. In particular, this dissertation addresses the difficulties that arise when composing multi-FPGA systems made up of several different types of IP cores. This level of configurability is enabled through the development of several key components, including an underlying hardware infrastructure, an intelligent mapping tool, and a graphical user interface (GUI), allowing a systems designer to build video analytics systems quickly and easily. The use of the Falcon framework is then demonstrated on a state-of-the-art, biologically inspired algorithm used for multi-class image recognition tasks, known as Hierarchical Model and X (HMAX). Compared to a CPU implementation, the FPGA system generated by the tool achieved a speedup of 16X.
These studies demonstrate the benefits and disadvantages of several prevailing technologies valuable for creating real-time systems capable of analyzing images and video. Understanding these tradeoffs is important as the use of video and images gains importance in many commercial applications. The architectures, methods, and tools described within this dissertation are intended to enable image analytics to be used in a wider variety of applications.
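The real-time figures quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not taken from the dissertation: the function names are hypothetical, and it assumes that "real-time" means the 30 frames-per-second target stated above.

```python
# Illustrative arithmetic only; names are hypothetical, not from the dissertation.
REAL_TIME_FPS = 30.0  # real-time target stated in the abstract


def frame_budget_ms(fps: float) -> float:
    """Return the time available to process one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def real_time_margin(fps: float, target_fps: float = REAL_TIME_FPS) -> float:
    """Return how many times faster than the real-time target a rate is."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return fps / target_fps


if __name__ == "__main__":
    # Per-frame budget at the 30 frames-per-second target.
    print(f"budget at 30 fps:  {frame_budget_ms(30.0):.1f} ms per frame")
    # Per-frame time and margin for the 312 frames-per-second 3D IC chip.
    print(f"budget at 312 fps: {frame_budget_ms(312.0):.1f} ms per frame")
    print(f"margin at 312 fps: {real_time_margin(312.0):.1f}x real time")
```

Under that assumption, a 30 frames-per-second system has about 33.3 ms to process each frame, and the 312 frames-per-second chip runs at about 10.4 times the real-time rate, consistent with the 10X figure in the abstract.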
Keywords/Search Tags:Analytics, Image, FPGA systems, Dissertation, Applications, Accelerator, Real-time