With the continuing development of large-scale integrated circuits and microelectronics, vector DSPs that combine VLIW, SIMD, multi-core, and efficient storage technologies to provide parallelism have become typical representatives of high-performance DSPs and are favored in the field of high-performance computing. Matrix computation has always been a core problem in high-performance computing, and solving systems of linear equations is one of its basic problems. Whether an efficient library for solving linear systems is available directly affects how well a processor can be applied to high-performance computing. Therefore, designing and implementing efficient algorithms for solving symmetric positive definite linear systems that match the architecture and hardware resources of a vector DSP is of great practical engineering value. In this thesis, we analyze in depth the two core algorithms for solving symmetric positive definite linear systems, Cholesky decomposition and the solution of triangular systems, and study their vectorized implementation on the vector DSP represented by FT-M7002. The main contents and innovations of this thesis are summarized as follows:

Direct Cholesky decomposition and block Cholesky decomposition algorithms are designed and implemented on a single core of the vector DSP. By generating the upper triangular matrix instead of the lower triangular matrix, the problems of non-contiguous matrix access (column access) and heavy communication among SIMD units (reduction summation) are solved. Through vector shuffling, a single element of a vector variable is copied to all VPEs to take part in the computation, which avoids repeatedly accessing the same element across multiple loops. By analyzing the data access pattern between two adjacent iterations of the outer loop, the loop that updates the diagonal elements in direct Cholesky decomposition is eliminated. By analyzing how the size of the sub-matrices involved in the computation changes, a blocking method is designed for medium-sized matrices. By analyzing how large matrices are updated, a DMA double-buffer transfer strategy is designed that overlaps computation with data transfer. The experimental results show that, compared with the corresponding library functions of TI's TMS320C6678 processor, the block Cholesky decomposition achieves a speedup of 0.71 to 3.11 and the direct Cholesky decomposition achieves a speedup of 2.98 to 7.53.

Taking the lower triangular system as an example, an algorithm for solving triangular systems is designed and implemented on a single core of the vector DSP. Based on an analysis of the basic principles of solving a lower triangular system, two serial algorithms are given. Dispatching the computation of multiple outer-loop subtasks to different VPEs for parallel execution eliminates the non-contiguous matrix access of the two serial algorithms. Through vector shuffling, a single element of a vector variable is copied to all VPEs to take part in the computation, which avoids repeatedly accessing the same element across multiple loops. By analyzing how these two algorithms update the matrix, a DMA double-buffer transfer method is designed that overlaps computation with data transfer. The experimental results show that, compared with the C code compiled and optimized by CCS 5.5.0 on the TI platform, the vector-optimized C code compiled by the FTM7002 IDE achieves a speedup of 24.7 to 29.36.