
Design And Implementation Of Low-Power Hardware Accelerators Based On CORDIC

Posted on: 2017-12-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Zhang
GTID: 1318330536967127
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of integrated circuits (ICs), digital communication technology is evolving quickly, which drives demand for high-performance communication devices. At the same time, devices must support multiple coexisting communication standards to meet different user needs, so design cycles grow longer and costs rise. Digital signal processors (DSPs), the typical signal processing devices in communication systems, face increasingly strict real-time requirements for algorithms such as QR decomposition, the Discrete Cosine Transform (DCT), the Inverse Discrete Cosine Transform (IDCT), the Direct Digital Frequency Synthesizer (DDS) and the Fast Fourier Transform (FFT). To run these algorithms in real time, designers prefer hardware accelerators embedded in the DSP. Moreover, power increasingly limits device performance, especially in portable communication devices. Because the Coordinate Rotation Digital Computer (CORDIC) algorithm needs only shift and add operations to compute multiplication, division, logarithms, trigonometric functions, square roots and similar functions, it has been widely applied. Hence, CORDIC-based hardware acceleration and low-power design for these algorithms in DSPs has become an active research topic in communications. To improve accuracy, reduce hardware overhead, speed up processing and lower power, this dissertation studies hardware acceleration and low-power techniques for CORDIC, QR decomposition, DCT, IDCT, DDS and FFT. The major contributions and innovations are summarized as follows:

1) The conventional CORDIC has several drawbacks: a limited convergence range for the rotation angle, an excessive number of iterations, the need for scale factor compensation, the angle data path, and poor precision. To address them, this dissertation proposes Adaptive Recoding CORDIC (ARC), based on Scaling-Free (SF) CORDIC. ARC uses an adaptive recoding method so that every two bits in the 10 least significant bits (LSBs) of the rotation angle have at most one bit set to 1, which reduces the iterations required for every two bits from two to one. To further accelerate the process, this dissertation proposes Enhanced Adaptive Recoding CORDIC (EARC), based on ARC, which uses an enhanced adaptive recoding method so that the 2 LSBs of the 3 most significant bits (MSBs) of the rotation angle have at most one bit set to 1. In addition, this dissertation proposes a pipeline-balancing enhanced vectoring CORDIC, which uses a trigonometric transformation to eliminate the first iteration and the adaptive recoding method to accelerate scale factor compensation. The results show that, compared with the latest Hybrid CORDIC, EARC improves the bit error position (BEP) by four bits, reduces area by 17.7%, provides a 1.23-fold speed-up and consumes 17% less power. Compared with ARC, EARC reduces hardware resources by 8.56%, achieves an 11.1% lower processing time and dissipates 12.3% less power. Compared with a commercial implementation of vectoring CORDIC, the enhanced vectoring CORDIC uses 66.3% less hardware resources, provides a 1.8-fold speed-up and dissipates 71.6% less power while maintaining the same computation accuracy. The proposed ARC, EARC and enhanced vectoring CORDIC therefore perform better, especially in speed and power.

2) To improve the performance of the architecture for QR decomposition, this dissertation proposes the Zhang architecture, which adds one more rotation CORDIC to the second row of the Chen architecture and thereby reduces the setup time of the pipeline. To further shorten the updating time, this dissertation proposes the Enhanced architecture, which adds one more rotation CORDIC to the third row of the Zhang architecture. In addition, an efficient control scheme lets the rotation CORDICs work in step with the vectoring CORDICs, eliminating the time spent waiting for the angle calculated by the vectoring CORDICs. Compared with the Chen architecture, the Zhang architecture based on the enhanced vectoring CORDIC saves 5% in hardware and improves throughput 2.28-fold with no accuracy penalty. Compared with the Zhang architecture, the Enhanced architecture achieves a 1.5-fold speed-up in throughput and a 1.45-fold improvement in hardware efficiency, and dissipates 24.5% less power, at the expense of 3.65% more hardware resources. Notably, even with the worst-case 100% toggle rate for the Enhanced architecture and the best-case 12.5% toggle rate for the Zhang architecture, the Enhanced architecture still consumes less power. The results show that the throughput and hardware efficiency of the Zhang and Enhanced architectures are improved significantly, and that both proposed architectures are suitable for low-power systems.

3) To reduce the computational complexity of DCT and IDCT, this dissertation uses a partitioning scheme to obtain the First unified DCT/IDCT Architecture, which can be realized by vector rotations. For applications with limited resources, this dissertation derives the Second unified DCT/IDCT Architecture from the First by applying a trigonometric transformation. Moreover, ARC and the conventional CORDIC are combined to design the rotation elements, which improves accuracy and decouples the data dependence. An efficient adder- and shifter-based radix-2 scale factor approximation is also introduced to reduce the truncation error. In DCT mode, compared with the Lee architecture, both the First and the Second proposed architectures improve the Peak Signal-to-Noise Ratio (PSNR) by over 3.8 dB, save over 10.8% in area, reduce the critical path delay by over 5.06% and dissipate over 9.18% less power. In DCT/IDCT mode, compared with the Huang architecture, the First and the Second proposed architectures improve the PSNR by 1 dB and 2 dB, use 38.3% and 42.8% less hardware resources, and dissipate 35.4% and 34.4% less power, respectively. In addition, the First proposed architecture improves the critical path delay and the throughput. The two proposed unified DCT/IDCT architectures therefore perform better in both modes.

4) The phase-to-amplitude function in DDS must calculate trigonometric values in real time, and FFT must execute the complex multiplications in its butterfly elements as quickly as possible. This dissertation therefore proposes efficient DDSs and FFTs based on ARC and EARC, which eliminate the RAMs that store the trigonometric values and twiddle factor values, with a slight accuracy loss in terms of SFDR and SNR, respectively. Compared with a commercial DDS, the ARC DDS and the EARC DDS consume 17.7% and 27.2% less power, respectively. Compared with a commercial FFT, the FFT based on ARC provides a 1.23-fold speed-up and consumes 60.1% less power. The FFT based on EARC performs better than the ARC FFT. Hence, the proposed DDSs and FFTs based on ARC and EARC use fewer hardware resources and consume less power.
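
For readers unfamiliar with the baseline, the following is a minimal sketch of the conventional CORDIC that the abstract takes as its starting point, not the proposed ARC or EARC. It shows the shift-and-add iteration in both rotation mode (the kind of computation behind DDS phase-to-amplitude conversion and FFT twiddle multiplication) and vectoring mode (the kind used to find the angle in QR decomposition). The 16-bit fixed-point format, the 16 iterations, and all names (`FRAC`, `N_ITER`, `cordic_sin_cos`, `cordic_vector`) are assumptions chosen for illustration.

```python
import math

FRAC = 16                      # assumed number of fractional bits
ONE = 1 << FRAC                # fixed-point representation of 1.0
N_ITER = 16                    # assumed iteration count

# Elementary angles atan(2^-i) in fixed point (the "angle data path").
ATAN = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Scale factor K = prod sqrt(1 + 2^-2i); conventional CORDIC must compensate it.
K = 1.0
for i in range(N_ITER):
    K *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(theta):
    """Rotation mode: return (cos, sin) of theta, valid for |theta| <= ~1.74 rad."""
    x = round(ONE / K)         # pre-scaled input absorbs the gain K
    y = 0
    z = round(theta * ONE)     # residual angle still to be rotated
    for i in range(N_ITER):
        if z >= 0:             # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
        else:                  # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
    return x / ONE, y / ONE


def cordic_vector(x0, y0):
    """Vectoring mode: return (magnitude, angle) of (x0, y0); requires x0 > 0."""
    x = round(x0 * ONE)
    y = round(y0 * ONE)
    z = 0                      # accumulated rotation angle
    for i in range(N_ITER):
        if y >= 0:             # rotate clockwise to drive y toward zero
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN[i]
        else:                  # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN[i]
    return (x / ONE) / K, z / ONE


if __name__ == "__main__":
    print(cordic_sin_cos(math.pi / 6))
    print(cordic_vector(3.0, 4.0))
```

Each iteration uses only shifts, additions and a table lookup, which is what makes CORDIC attractive in hardware. The sketch also makes the listed drawbacks visible: one iteration is spent per angle bit, the input range is limited to roughly ±1.74 rad, and the gain `K` has to be compensated separately.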
Keywords/Search Tags: Low-power, QR Decomposition, DCT/IDCT, DDS, FFT, CORDIC