| LU factorization is a classical compute-intensive algorithm that has long been of central importance because of its wide application and significant value. However, when LU factorization is implemented on a SIMD DSP hardware platform, it still faces problems such as underuse of the hardware computing units and the data transmission bus and memory access conflicts, all of which can degrade the performance of the algorithm on the platform. Further work is therefore needed to improve LU factorization performance on such hardware, and software optimization is one of the most effective ways to do so. To address these problems, this thesis studies an LU factorization algorithm for the BWDSP1042, a domestically developed digital signal processing chip with a SIMD architecture. The results show that the precision and real-time performance of the algorithm are improved significantly. The thesis first introduces the core structure, pipeline, memory space allocation, and instruction system of the BWDSP1042 processor. Second, it describes the design of the C-language version of the LU factorization function and constructs the main framework and operating environment of the algorithm. Finally, it studies the assembly version of the LU factorization algorithm for the BWDSP1042, which eliminates discontinuous memory access during matrix multiplication, makes full use of the hardware computing resources and the data transmission bus, and uses software optimization to accelerate communication between tasks within the loop. This reduces the memory access delays and memory access conflicts caused by communication and further improves the performance of the LU factorization algorithm. The thesis presents the development of the algorithm in detail and compares its cycle count and running time with those of the built-in function library of the mainstream high-performance DSP chip TMS320C6678.
Both versions of the function are tested for reliability and accuracy on a relatively comprehensive set of test cases to ensure their correctness. The simulation and experimental results show that the LU factorization function implemented on the BWDSP1042 platform fully exploits the parallelism of the SIMD architecture. Compared with the serial C version, the assembly version improves performance dramatically: by factors of 26.75, 34.61, and 42.95 for matrix sizes of 32 × 32, 64 × 64, and 128 × 128, respectively. In addition, for a matrix size of 128 × 128, the ratio of the running time to that of the TMS320C6678 is close to the ratio of the core frequencies. The error between the results of the C version and the assembly version is on the order of 10^-7 or less, well below the order of 10^-4 required by the function library design specifications. The function satisfies the engineering requirements for high stability, high precision, and high performance of function libraries in real-time radar signal processing. |
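
As a point of reference for the serial baseline and the accuracy check mentioned above, the following is a minimal C sketch of an in-place Doolittle LU factorization together with a reconstruction-error measure. It is not the thesis implementation: the function names `lu_factor` and `lu_max_error` are hypothetical, and the sketch assumes single-precision, row-major storage and no pivoting, none of which the abstract specifies.

```c
#include <assert.h>
#include <stddef.h>

/* In-place Doolittle LU factorization of a row-major n x n matrix.
 * Assumption of this sketch: no pivoting is performed.
 * On return, the strict lower triangle of a holds L (unit diagonal implied)
 * and the upper triangle, including the diagonal, holds U.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_factor(float *a, size_t n)
{
    assert(a != NULL && n > 0);
    for (size_t k = 0; k < n; ++k) {
        float pivot = a[k * n + k];
        if (pivot == 0.0f)
            return -1;
        for (size_t i = k + 1; i < n; ++i) {
            float m = a[i * n + k] / pivot;        /* multiplier l_ik */
            a[i * n + k] = m;
            for (size_t j = k + 1; j < n; ++j)     /* contiguous row update */
                a[i * n + j] -= m * a[k * n + j];
        }
    }
    return 0;
}

/* Largest absolute entry of (L*U - orig), where lu is the packed output of
 * lu_factor and orig is the matrix before factorization. */
float lu_max_error(const float *lu, const float *orig, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            size_t kmax = (i < j) ? i : j;
            float sum = 0.0f;
            for (size_t k = 0; k <= kmax; ++k) {
                float l = (k == i) ? 1.0f : lu[i * n + k];  /* unit diagonal of L */
                sum += l * lu[k * n + j];
            }
            float d = sum - orig[i * n + j];
            if (d < 0.0f)
                d = -d;
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

A caller would copy the input matrix, run `lu_factor` on the copy, and compare the value returned by `lu_max_error` against a tolerance such as the 10^-4 design requirement quoted above. The innermost update walks each row contiguously, which is the access pattern a SIMD implementation would typically vectorize.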