
GPU-Based Image Compression and Encoding

Posted on: 2010-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: M B Li    Full Text: PDF
GTID: 2178360272496237    Subject: Computational Mathematics
Abstract/Summary:
In recent years, the proliferation of computers and the rapid development of communication technologies, especially networking and multimedia, have caused the volume of image data that must be stored, transmitted, and processed to grow exponentially, and image compression has consequently attracted increasing attention. Because raw image data are so voluminous, their transmission and storage pose serious problems and greatly slow down access, exchange, and processing, so the data must be compressed while preserving image quality as far as possible. Although decades of progress in hardware technology have steadily increased CPU processing speed, many advanced applications still cannot be served at the required speed. Accelerating image processing therefore remains a major challenge, one rooted in the sheer size of image data and the complexity of image processing algorithms. The limits of single-processor technology make the move to parallel architectures inevitable, and parallel computing is set to become the mainstream model. High-performance parallel processing systems give parallel image processing considerable room to raise processing speed.

Parallel processing is an important branch of image processing and has drawn wide research interest, for two reasons. On the one hand, developing parallel image processing technology is difficult: it depends on the hardware and system architecture of the parallel processing system, and thus on computer and integrated-circuit technology, and it is constrained by the complexity of practical applications and by the system price that application users can bear. On the other hand, the efficiency gains are substantial: the achievable speedups are striking, and practical systems built on them yield significant economic and social benefits. Parallel image processing algorithms are therefore of great significance and value.

The programmable graphics processing unit (PGPU) is a dedicated device widely used for computer graphics and image processing. It offers single-instruction multiple-data (SIMD) parallelism, full instruction support for vector operations, and IEEE 32-bit floating-point arithmetic in both vertex and pixel processing, which make it a powerful parallel computing unit. Compared with the CPU, the GPU has two main advantages: strong parallel processing capability and efficient data transfer. Its parallelism operates at three levels: instruction level, data level, and task level. Its data-transfer efficiency shows in the bandwidth figures: roughly 16 GB/s between the GPU and its video memory, versus roughly 4 GB/s between system memory and video memory.

A modern GPU provides two programmable parallel stages: the vertex processor and the fragment processor. When the GPU is used for image processing or other general-purpose computing, the main work is to map the task onto the graphics rendering pipeline that the GPU supports. The usual approach is to express the input data as attributes of graphics primitives, such as vertex positions, colors, normal vectors, and texture maps; to decompose the processing algorithm into a series of steps, each rewritten as a vertex or fragment program; to invoke these programs through a 3D rendering API; and finally to read the rendering result from the framebuffer as the algorithm's output.
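As a concrete instance of this per-element mapping, the following sketch expresses the first stage of the JPEG pipeline, RGB-to-YCbCr color conversion, as a CUDA kernel (CUDA being the programming model of the implementation benchmarked below). It is a minimal illustration under assumed conventions, packed 8-bit RGB input and one thread per pixel, and is not code taken from the thesis:

    // Minimal sketch (assumptions: packed 8-bit RGB, one thread per pixel).
    // Each thread converts one pixel, mirroring the "one fragment per pixel"
    // mapping of the graphics pipeline described above.
    #include <cuda_runtime.h>
    #include <stdint.h>

    __global__ void rgb_to_ycbcr(const uint8_t* rgb, uint8_t* ycbcr,
                                 int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int i = 3 * (y * width + x);
        float r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];

        // ITU-R BT.601 conversion used by baseline JPEG.
        float Y  =  0.299f    * r + 0.587f    * g + 0.114f    * b;
        float Cb = -0.168736f * r - 0.331264f * g + 0.5f      * b + 128.0f;
        float Cr =  0.5f      * r - 0.418688f * g - 0.081312f * b + 128.0f;

        ycbcr[i]     = (uint8_t)fminf(fmaxf(Y,  0.0f), 255.0f);
        ycbcr[i + 1] = (uint8_t)fminf(fmaxf(Cb, 0.0f), 255.0f);
        ycbcr[i + 2] = (uint8_t)fminf(fmaxf(Cr, 0.0f), 255.0f);
    }

Launched over a two-dimensional grid of thread blocks covering the image, every pixel is converted independently; this is the data-level parallelism referred to above.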
This thesis presents a novel approach to JPEG image compression for remote visualization systems. Such systems must compress quickly and with as little visual quality loss as possible in order to provide a responsive remote interface for IS/VR applications, so we propose offloading the compression to the GPU. By parallelizing the well-known JPEG algorithm and adapting it to the GPU hardware, substantial performance increases and significant relief of CPU load were achieved. In addition, performing the compression on the GPU avoids reading large uncompressed frames back through the limited host interface, eliminating a further source of latency. Together these improvements serve the goal of a seamless graphical remote interface without restrictions on latency or quality.

The second part of the thesis describes the benchmarking and quality assessment of three JPEG implementations: LibJPEG, the proposed CUDA-based JPEG, and TurboJPEG. The performance comparison shows that the highly optimized TurboJPEG is currently the fastest way to produce JPEG-conformant images; it builds on the Intel Performance Primitives library and can therefore fully exploit the Intel dual-core processor of the test machine. The CUDA-based implementation introduced here still relies in part on unoptimized LibJPEG code; in particular, its sequential Huffman encoding appears to limit performance for high resolutions and multicolored frames. Nevertheless, the time the GPU spends on the compute-intensive steps (color conversion, downsampling, DCT, and quantization) is less than the total time the fast TurboJPEG implementation needs at resolutions of 1024x768 and above. This is a promising basis for further optimizing the CUDA-based implementation and for designing a CUDA-supported Huffman encoder that could outperform even the highly optimized TurboJPEG. Moreover, the GPU-based method strongly relieves the CPU and leaves it available for other tasks. LibJPEG is by far the slowest but also the most feature-rich and universal implementation; because its slow compression limits overall system performance at high resolutions, it is not well suited to the remote visualization of IS/VR applications.

The main finding of the quality measurement is that all three JPEG implementations produce compressed images of very high quality, even at low JPEG quality settings. To guarantee that the resulting frames consistently reach SSIM indices above 90%, however, the quality parameter q must be set to 75 or higher. It is planned to extend the Invire system with an automatic quality assessment component that dynamically measures the quality of the compressed frames and adapts q to the results of this measurement.
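To make the GPU-side steps concrete, the sketch below shows how the quantization stage, the last of the four steps named above, might look as a CUDA kernel. The data layout (one float per DCT coefficient, a 64-entry quantization table already scaled by the quality setting q on the host) is assumed for illustration; this is not the thesis's actual code. Every coefficient is handled by an independent thread, which is why this stage parallelizes so well, whereas the Huffman stage that follows is inherently sequential:

    // Assumed layout: coefficients of consecutive 8x8 blocks stored
    // contiguously; qtable holds the 64 divisors for one block, already
    // scaled on the host by the JPEG quality setting q.
    #include <cuda_runtime.h>
    #include <stdint.h>

    __global__ void quantize_blocks(const float* dct,    // DCT coefficients
                                    int16_t* out,        // quantized output
                                    const float* qtable, // 64 scaled divisors
                                    int num_blocks)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= num_blocks * 64) return;

        // The coefficient's position inside its 8x8 block selects the divisor.
        int pos = idx % 64;
        out[idx] = (int16_t)rintf(dct[idx] / qtable[pos]);
    }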
Depending on the operating point chosen, this either helps to save bandwidth or provides the users with optimal quality.
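A minimal sketch of how such a feedback component might adjust q is given below. The function and parameter names are hypothetical, since the abstract does not describe Invire's interfaces, but the 90% SSIM threshold and the q >= 75 floor come from the measurements reported above:

    // Hypothetical host-side feedback step: raise q when the measured SSIM of
    // a compressed frame falls below the target, lower it cautiously when
    // there is headroom, so bandwidth is reclaimed without visible loss.
    int adapt_quality(int q, float measured_ssim)
    {
        const float target = 0.90f;   // SSIM floor reported above
        if (measured_ssim < target)
            q += 5;                   // quality too low: raise q quickly
        else if (measured_ssim > target + 0.05f)
            q -= 1;                   // headroom: reclaim some bandwidth
        if (q < 75) q = 75;           // measurements above suggest q >= 75
        if (q > 95) q = 95;           // upper cap chosen for this sketch
        return q;
    }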
Keywords/Search Tags: GPU, image compression, DCT transform, parallel algorithm