
Research On Convolution Operation Acceleration And Its Application In Image Compression

Posted on: 2024-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X B Zhang
Full Text: PDF
GTID: 2568307091465844
Subject: Electronic information

Abstract/Summary:
Image compression algorithms compress large amounts of image data into smaller storage space and have wide-ranging applications. Traditional image compression algorithms, such as JPEG and PNG, are mainly based on mathematical transforms such as the discrete cosine transform and the discrete wavelet transform, but these methods have certain limitations in compression rate and image quality. In recent years, image compression algorithms based on convolutional neural networks (CNNs) have received much attention. By exploiting the feature-extraction ability of CNNs, these algorithms can achieve higher compression rates while obtaining better image quality. The importance of CNN-based image compression lies in its high compression rate, good image quality, and wide applicability. However, CNN-based image compression also has problems: high training cost, high model complexity, large data volume, and instability.

To address the large data volume, slow processing speed, and complexity of traditional image compression methods, this thesis proposes a large-scale image compression ASIC based on low-latency high-speed adders, low-latency high-speed multipliers, and depth-wise separable convolutions. The main contributions of this work are as follows:

(1) The basic principle of image compression using three layers of depth-wise separable convolution is studied and analyzed, and verified in Python with the Keras and TensorFlow frameworks. Compared with three standard convolution layers, three depth-wise separable convolution layers reduce the parameter count and
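As a rough sanity check on the reported ~85% reduction (this is not the thesis's own code; the kernel size and channel widths below are illustrative assumptions), the parameter counts of a standard convolution and a depth-wise separable convolution can be compared directly:

```python
# Compare parameter counts: standard conv vs. depth-wise separable conv.
# Kernel size and channel counts are illustrative assumptions, not the
# thesis's actual layer configuration. Bias terms are omitted.

def standard_conv_params(k, c_in, c_out):
    # One k x k kernel per (input channel, output channel) pair.
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # Depth-wise stage: one k x k kernel per input channel;
    # point-wise stage: a 1 x 1 convolution mixing channels.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 64
std = standard_conv_params(k, c_in, c_out)   # 36864
sep = separable_conv_params(k, c_in, c_out)  # 4672
print(f"reduction: {1 - sep / std:.1%}")     # prints "reduction: 87.3%"
```

For a 3 × 3 kernel the separable layer costs roughly 1/c_out + 1/9 of the standard layer, so for moderately wide layers the saving lands close to the ~85% figure quoted above.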
computational cost by approximately 85% while maintaining similar compression performance. The forward-inference (i.e., compression) process of the designed network is then implemented in plain Python, without relying on mature frameworks such as Keras and TensorFlow. Finally, a three-layer autoencoder based on depth-wise separable convolutions is designed for 256 × 256 image compression, achieving a compression ratio of 2:1.

(2) A 16-bit low-latency high-speed Sklansky adder and a 16-bit low-latency high-speed Wallace-tree multiplier built on Sklansky adders are designed.

(3) The hardware design of the forward-inference process of the depth-wise separable convolutional neural network is completed using a top-down approach combining row-column serialization, convolution-kernel parallelism, inter-channel parallelism, and inter-layer pipelining. This simplifies the hardware implementation of depth-wise separable convolution, reduces resource usage and power consumption, and realizes a channel-parallel design based on cached input data together with an inter-layer cache design based on unidirectional shift registers.

(4) The front-end design, back-end design, simulation verification, and FPGA board-level verification of the digital chip are completed. The correctness and accuracy of compression and restoration are first verified on the test platform and in actual tests. The designed low-latency adder and multiplier reduce processing time by 28.99% and 29.82%, respectively, compared with Synopsys's IP. For the ASIC based on the three-layer depth-wise separable convolutional autoencoder for large-scale image compression, a CPU takes 2.1222 seconds to perform forward inference on one color image, whereas the hardware accelerator completes it in 196,608 clock cycles, corresponding to 508 images per second (508 FPS) at a 100 MHz clock.
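The quoted throughput figures are internally consistent and easy to check; note in particular that 196,608 = 256 × 256 × 3, i.e., one clock cycle per pixel per channel. A small sketch using only numbers quoted in the abstract:

```python
# Sanity-check the quoted accelerator throughput against the clock rate.
# All constants are taken from the abstract above.
clock_hz = 100e6           # 100 MHz clock
cycles_per_image = 196608  # quoted latency; equals 256 * 256 * 3
cpu_seconds = 2.1222       # quoted CPU forward-inference time

hw_seconds = cycles_per_image / clock_hz  # ~1.97 ms per image
fps = clock_hz / cycles_per_image         # ~508.6, reported as 508 FPS
speedup = cpu_seconds / hw_seconds        # ~1079x over the CPU baseline

print(f"{int(fps)} FPS, {int(speedup)}x speedup")  # prints "508 FPS, 1079x speedup"
```

The implied ~1079x speedup over the CPU baseline is not stated explicitly in the abstract but follows directly from the two quoted timings.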
Keywords/Search Tags: depth-wise separable convolution, neural network, adder, multiplier, hardware accelerator