The Circuits Design And Optimization Of Large-Scale Matrix Inversion

Posted on:2020-09-16

Degree:Master

Type:Thesis

Country:China

Candidate:Z Z Chen

GTID:2428330611954740

Subject:Integrated circuit engineering

Abstract/Summary:

Matrix computation is a basic problem in scientific and engineering computing and is widely used in signal processing. Among matrix operations, matrix inversion occupies an important position. In massive MIMO systems, array signal processing, and image signal processing, the speed of matrix inversion often becomes the key factor limiting system performance as the data size grows geometrically. This thesis therefore focuses on the large-scale matrix inversion problem and designs a hardware acceleration circuit for inverting high-order real matrices in single-precision floating point, with the goal of improving matrix inversion throughput.

The matrix inversion algorithm based on Cholesky decomposition is selected after comparing the applicability, computational complexity, and hardware implementation difficulty of various algorithms. To achieve high throughput, a pipelined parallel structure is adopted as the basic architecture. The Cholesky-based algorithm divides matrix inversion into three steps: Cholesky decomposition, lower triangular matrix inversion, and triangular matrix multiplication. By analyzing the data dependencies of these three steps, each step is partitioned into fine-grained parallel tasks on a linear array of processing elements (PEs). On this basis the PE is designed, and the use of hardware resources is greatly reduced by optimizing the array structure. For the inversion of the lower triangular matrix, a calculation method that assigns tasks by column is proposed. The lower triangular matrix inversion is executed in parallel by reordering the calculations, and the throughput is further improved by the design of a floating-point multiplier-accumulator. The three modules are integrated into a large-scale matrix inversion system that supports matrices up to order 5120, with an error in the range of 10^-7 to 10^-4, which satisfies the basic requirements of signal processing.

Finally, a performance verification model of the large-scale matrix inversion system is established from the algorithm execution process and the circuit delay parameters. The model allows the performance of the matrix inversion circuit to be analyzed theoretically, which speeds up performance analysis. The Nexys Video platform is used for verification. The results show that the maximum clock frequency and throughput of the system reach 156 MHz and 8.2 GFLOPS when 32 PEs are integrated on the FPGA chip. The system supports matrix orders from 32 to 5120. Compared with a similar single-precision floating-point high-order matrix inversion circuit, the throughput is improved by 4% while hardware resources are reduced by more than 20%. The circuit designed in this thesis satisfies the requirements of large-scale matrix inversion and has practical value for engineering applications.
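The three steps named in the abstract can be sketched in software as follows. This is a minimal sequential illustration, assuming a symmetric positive-definite input and using Python's double-precision floats rather than the single-precision arithmetic of the circuit. The function names (`cholesky`, `invert_lower`, `multiply_lt_l`, `invert_spd`) are hypothetical and chosen for this example; the sketch does not reproduce the thesis's PE array, pipelining, or task scheduling.

```python
import math


def cholesky(a):
    """Step 1: return lower-triangular L with A = L * L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the squares of earlier entries in row j.
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def invert_lower(L):
    """Step 2: invert lower-triangular L column by column.

    Column j of the inverse depends only on L and on earlier entries of
    the same column, so different columns can be computed independently.
    """
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):
        X[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            s = sum(L[i][k] * X[k][j] for k in range(j, i))
            X[i][j] = -s / L[i][i]
    return X


def multiply_lt_l(X):
    """Step 3: return X^T * X for lower-triangular X, exploiting symmetry."""
    n = len(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # X[k][j] is zero for k < j, so the sum starts at k = j.
            s = sum(X[k][i] * X[k][j] for k in range(j, n))
            B[i][j] = s
            B[j][i] = s
    return B


def invert_spd(a):
    """A^-1 = (L L^T)^-1 = (L^-1)^T * L^-1."""
    return multiply_lt_l(invert_lower(cholesky(a)))


if __name__ == "__main__":
    A = [[4.0, 2.0], [2.0, 3.0]]
    for row in invert_spd(A):
        print(row)
```

The column-wise loop in `invert_lower` shows why assigning tasks by column suits a parallel array: no column of the inverse reads another column's results. How the thesis maps those columns onto its PEs is not specified in the abstract.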

Keywords/Search Tags:

signal processing, matrix inversion, Cholesky decomposition, data dependency, PE, FPGA

Related items

1	Research On Hardware Acceleration Technology For The Matrix
2	FPGA-based Matrix Inversion IP Core Design Technology And Related Experiment Platform Design
3	The FPGA Implementation Of Matrix Calculation In Signal Processing
4	Designs Of A ADBF Accelerator Based On FPGA
5	Design And Implementation Of Hardware Accelerator For I-vector Voiceprint Recognition Algorithm
6	A Study Of The Matrix Operation Harden Implementation On FPGA
7	Application specific precision analysis of Cholesky decomposition in MIMO receiver systems
8	Matrix Pen Decomposition Algorithm Research And Application In Communication Signal Processing
9	Hardware Implementation Of Sample Matrix Inversion Algorithm Based On FPGA
10	Optimized Implementation Of Signal Processing Module Based On CUDA