Design And Implementation Of Singular Value Decomposition Acceleration Scheme

Posted on: 2018-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Ma
Full Text: PDF
GTID: 2348330512976872
Subject: Signal and Information Processing
Abstract/Summary:
Singular value decomposition (SVD) is an essential component of numerical computation and plays an important role in many disciplines: large-scale MIMO in wireless communication, feature extraction and PCA in image processing, data compression and semantic indexing in machine learning, and correlation analysis in big data. SVD is a matrix decomposition with relatively high computational complexity. As the scale of data processing continues to grow, research and applications in wireless communication, image processing, and machine learning involve ever larger matrices and demand ever faster SVD computation. Research on implementing acceleration schemes for matrix SVD is therefore of considerable value.

This thesis focuses on the one-sided Jacobi algorithm, which offers high relative accuracy and high speed and is well suited to parallelization and to large-scale matrix decomposition. In the Jacobi algorithm, the rotation transformation and the ordering of the rotations are decisive for the speed of the decomposition. Several column-pair indexing methods are studied, and two orderings, cyclic ordering and ring ordering, are applied to the hardware design. Ring ordering not only facilitates parallelization but also yields sorted singular values and improves the convergence speed of the algorithm. To meet real-time and low-latency requirements, this thesis proposes an architecture for the one-sided Jacobi algorithm based on on-chip storage, which shows a clear speedup and high numerical precision compared with MATLAB and GPU implementations of the same algorithm. On this basis, the thesis also presents a parallel hardware architecture based on the ring-ordering sequence and on-chip memory; compared with the cyclic-ordering design, the measured speedup is 2.95 times. To overcome the limited capacity of on-chip storage and the complexity of the hardware scheme in image processing and machine learning applications, a parallel hardware acceleration scheme based on off-chip storage and the ring-ordering sequence is designed. Finally, based on the relationship between run time and resource consumption observed in simulation, the thesis proposes a strategy for balancing performance and resources.
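
The one-sided Jacobi iteration named in the abstract can be illustrated with a minimal software sketch. This is not the thesis's hardware design: it is the textbook Hestenes formulation with the sequential cyclic ordering of column pairs, written in plain Python. The function name `one_sided_jacobi_svd` and the parameters `tol` and `max_sweeps` are assumptions made for this illustration, and the ring ordering is not reproduced here because the abstract does not define it.

```python
import math


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """Illustrative one-sided (Hestenes) Jacobi SVD of an m x n matrix, m >= n.

    Returns (u, sigma, v) with a[i][j] ~= sum_k u[i][k] * sigma[k] * v[j][k].
    The name and parameters are hypothetical, not taken from the thesis.
    """
    m, n = len(a), len(a[0])
    w = [list(map(float, row)) for row in a]   # working copy, invariant w = a * v
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        # Cyclic ordering: visit column pairs (0,1), (0,2), ..., (n-2,n-1).
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(row[p] * row[p] for row in w)   # squared norm of column p
                beta = sum(row[q] * row[q] for row in w)    # squared norm of column q
                gamma = sum(row[p] * row[q] for row in w)   # inner product of p and q
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                                # pair already orthogonal
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same column rotation to w and to v.
                for mat in (w, v):
                    for row in mat:
                        xp, xq = row[p], row[q]
                        row[p] = c * xp - s * xq
                        row[q] = s * xp + c * xq
        if not rotated:
            break                                           # a full sweep changed nothing

    # Column norms of w are the singular values; normalized columns form u.
    norms = [math.sqrt(sum(row[j] * row[j] for row in w)) for j in range(n)]
    order = sorted(range(n), key=lambda j: -norms[j])       # sort in software
    sigma = [norms[j] for j in order]
    u = [[row[j] / norms[j] if norms[j] > 0.0 else 0.0 for j in order] for row in w]
    v = [[row[j] for j in order] for row in v]
    return u, sigma, v
```

For the matrix [[4, 0], [3, -5]], whose singular values are sqrt(40) and sqrt(10), calling `one_sided_jacobi_svd([[4, 0], [3, -5]])` should return those two values in `sigma`. Within a sweep, rotations on disjoint column pairs do not touch the same columns, which is why the choice of pair ordering matters so much for a parallel implementation.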
Keywords/Search Tags: Singular value decomposition, FPGA, Jacobi algorithm, Rotation transformation, Heterogeneous computation