
Design And Implementation Of QR Decomposition Based On Vector Processor

Posted on: 2016-08-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Q N Lu
GTID: 2348330509460897 | Subject: Software engineering
Abstract/Summary:
QR decomposition, a key tool in digital signal processing, plays a significant role in high-performance computing and serves as an important benchmark for measuring processor performance. It is also a standard way to solve least-squares problems. Studying the QR decomposition algorithm is therefore valuable for exploiting the parallel processing capability of multi-core vector processors, and analyzing how to design and implement it efficiently on the vector architecture of the Matrix processor is significant in both theory and practice. This report analyzes in depth three methods for vectorizing QR decomposition. Targeting fused-instruction optimization on the Matrix vector architecture, it designs and implements single-core assembly programs for large-scale data for three algorithms: Givens rotation, Gram-Schmidt orthogonalization, and Householder transformation. The work covers the following aspects:

(1) Design and implementation of a Givens rotation program on a single Matrix core. Vector shared registers are used to reduce data transfers from DDR to SRAM; software pipelining is used to optimize the hand-written assembly and to determine the minimum iteration interval; the data layout requirements are analyzed in detail and the initial storage positions of the data are offset to effectively reduce AM_Bsy; and a double-buffering DMA strategy overlaps computation with data transfer to improve performance. Compared with a C implementation on the TI TMS320C6713, the Givens program achieves an average speedup of 74.33 across matrix sizes in double precision. In addition, performance on the 2048-order matrix increases to 74.77%.

(2) Design and implementation of a Gram-Schmidt orthogonalization program, also on a single Matrix core. The traditional Gram-Schmidt method is improved to better suit the Matrix vector architecture. Software pipelining is used to optimize the hand-written assembly, with a detailed analysis of the data layout requirements and of the minimum iteration interval, and a double-buffering DMA strategy overlaps computation with data transfer to improve performance. Compared with the C implementation on the TMS320C6713, the Gram-Schmidt program achieves an average speedup of 83.26 across matrix sizes in double precision. In addition, performance on the 2048-order matrix increases to 46.31%.

(3) Design and implementation of a Householder transformation program, also on a single Matrix core. The basic principle and algorithmic flow of the Householder transformation for large-scale data are analyzed in detail, and two forms of matrix multiplication are compared to select the one better suited to the Matrix vector processor. The single-core program is optimized around double-buffered DMA transfers overlapped with computation, and software pipelining is used to optimize the hand-written assembly, again with a detailed analysis of the data layout requirements and of the minimum iteration interval. Compared with the C implementation on the TMS320C6713, the Householder program achieves an average speedup of 95.76 across matrix sizes in double precision. In addition, performance on the 1920-order matrix increases to 83.64%.
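To make the Givens rotation method described above concrete, here is a minimal scalar reference sketch in Python. It is not the thesis's vectorized Matrix assembly: the function name `givens_qr`, the bottom-up order of rotations on adjacent rows, and the dense list-of-lists layout are all assumptions chosen for readability. Each rotation zeroes one subdiagonal entry while Q is accumulated so that A = QR holds throughout.

```python
import math


def givens_qr(a):
    """Factor an m x n matrix (m >= n) as A = Q R using Givens rotations.

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular).
    Hypothetical scalar reference sketch, not the vectorized Matrix code.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j from the bottom up, rotating adjacent rows (i-1, i).
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # R <- G R, with G acting on rows i-1 and i as [[c, s], [-s, c]].
            # Columns before j are already zero in both rows, so start at j.
            for k in range(j, n):
                t1, t2 = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * t1 + s * t2
                r[i][k] = -s * t1 + c * t2
            # Q <- Q G^T keeps the product Q R equal to A.
            for k in range(m):
                t1, t2 = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * t1 + s * t2
                q[k][i] = -s * t1 + c * t2
    return q, r
```

Each rotation touches only two rows, which is why rotations on disjoint row pairs are natural candidates for spreading across vector lanes; the abstract does not say how the Matrix implementation schedules them.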
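The abstract says the traditional Gram-Schmidt method was improved to suit the Matrix vector architecture but does not say how. One common reformulation is modified Gram-Schmidt (MGS), sketched below as an assumption, not as the author's method. The function name `mgs_qr` is hypothetical, and the input is assumed to have full column rank.

```python
import math


def mgs_qr(a):
    """Thin QR of an m x n matrix with full column rank via modified Gram-Schmidt.

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular).
    Hypothetical sketch; the thesis's improved variant may differ.
    """
    m, n = len(a), len(a[0])
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # working columns
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        qj = [x / r[j][j] for x in cols[j]]
        q_cols.append(qj)
        # Modified GS: remove the q_j component from every remaining column now,
        # rather than subtracting all earlier projections from column j later
        # (classical GS). This is an AXPY over a whole column per step.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * cols[k][i] for i in range(m))
            cols[k] = [cols[k][i] - r[j][k] * qj[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The column-wise dot products and AXPY updates are long, regular vector operations, which is one plausible reason a reformulated Gram-Schmidt maps well onto a vector processor.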
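For the Householder transformation, a minimal sketch follows, together with the least-squares solve that the abstract cites as a use of QR decomposition. Both function names are hypothetical. Each reflector H = I - 2vv^T/(v^Tv) is applied as a rank-1 update without forming H explicitly; the abstract compares two kinds of matrix multiplication for this step, and which of them (if either) matches this sketch is not stated.

```python
import math


def householder_qr(a):
    """Return (Q, R) with A = Q R using Householder reflections.

    Q is m x m orthogonal, R is m x n upper triangular (up to rounding).
    Hypothetical scalar sketch, not the thesis's Matrix implementation.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, n)):
        x = [r[i][j] for i in range(j, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        # Sign choice avoids cancellation: v = x + sign(x0) * ||x|| * e1.
        alpha = -norm_x if x[0] >= 0.0 else norm_x
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # R <- H R, one column at a time: r_k -= (2 v.r_k / v.v) v.
        for k in range(j, n):
            f = 2.0 * sum(v[i - j] * r[i][k] for i in range(j, m)) / vtv
            for i in range(j, m):
                r[i][k] -= f * v[i - j]
        # Q <- Q H (H is symmetric), one row at a time.
        for row in q:
            f = 2.0 * sum(row[i] * v[i - j] for i in range(j, m)) / vtv
            for i in range(j, m):
                row[i] -= f * v[i - j]
    return q, r


def least_squares(a, b):
    """Minimize ||A x - b||_2 by solving R[:n, :n] x = (Q^T b)[:n]."""
    q, r = householder_qr(a)
    m, n = len(a), len(a[0])
    qtb = [sum(q[i][k] * b[i] for i in range(m)) for k in range(n)]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution on upper-triangular R
        x[k] = (qtb[k] - sum(r[k][j] * x[j] for j in range(k + 1, n))) / r[k][k]
    return x
```

For example, fitting a line to the points (0, 1), (1, 3), (2, 5) with `least_squares([[1, 0], [1, 1], [1, 2]], [1, 3, 5])` recovers an intercept close to 1 and a slope close to 2.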
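All three programs use a double-buffering DMA strategy that computes on one block of data while the next block is being transferred. The abstract does not describe the Matrix DMA interface, so the sketch below only models the control flow: `dma_load` is a hypothetical stand-in for a DDR-to-on-chip transfer, and a single background worker thread plays the role of the DMA engine. At most two tiles are live at once: the one being computed and the one in flight.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_load(src, start, size):
    """Hypothetical stand-in for a DMA transfer of one tile into on-chip memory."""
    return list(src[start:start + size])


def process_double_buffered(data, tile, compute):
    """Apply compute() to each tile while the next tile is loaded in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:  # one "DMA engine"
        pending = dma.submit(dma_load, data, 0, tile)  # prefetch the first tile
        for start in range(0, len(data), tile):
            current = pending.result()  # wait until this tile's transfer is done
            nxt = start + tile
            if nxt < len(data):
                # Start the next transfer into the other buffer before computing.
                pending = dma.submit(dma_load, data, nxt, tile)
            results.append(compute(current))  # overlaps with the in-flight transfer
    return results
```

For instance, `process_double_buffered(list(range(10)), 4, sum)` processes the tiles [0..3], [4..7] and [8, 9]. Python threads only illustrate the ordering here; on a DSP the overlap comes from the DMA engine running independently of the vector units.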
Keywords/Search Tags: QR decomposition, vectorization, Givens rotation, Gram-Schmidt orthogonalization, Householder transformation, software pipelining, double-buffering DMA strategy