
Research On Optimization Of Trilinear Decomposition Algorithm In Embedded Environment

Posted on: 2013-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: K Feng
Full Text: PDF
GTID: 2248330395984847
Subject: Computer Science and Technology
Abstract/Summary:
Because it can simultaneously analyze substances of complex composition, the trilinear decomposition algorithm is widely used in a variety of fields. As the algorithm moves toward deployment in embedded applications, low utilization of hardware resources and unsatisfactory performance have become the core problems. Trilinear decomposition is a complicated procedure whose operations are mainly on matrices, so finding an optimization strategy for the algorithm on embedded platforms and improving its performance has become an urgent problem.

The instruction scheduler and TLB replacement policy of today's embedded platforms are simplified compared with those of desktop platforms. In addition, conventional optimizations for embedded platforms were carried out under tight resource constraints, whereas today's embedded platforms offer far more resources. However, little comparable work has been done on these platforms. To improve the performance of the trilinear decomposition algorithm on today's embedded systems, this thesis carries out optimizations aimed specifically at shortening the algorithm's execution time. The specific work is as follows:

After profiling the trilinear decomposition algorithm and studying the architectural characteristics of the platforms, matrix multiplication is identified as the main target of the overall optimization. The maximum achievable speedup is calculated to assess the optimization work.

Based on the characteristics of the ARMv7 architecture, especially the differences in instruction scheduler and TLB replacement policy between ARMv7 and desktop architectures, the blocking algorithm for matrix multiplication in GotoBLAS is optimized to improve baseline performance.

Building on the previous step, the matrix multiplication kernel is optimized for the vector computation capability of NEON. The memory accesses in the partial copy step of blocked matrix multiplication are accelerated by exploiting the memory bandwidth advantage of NEON. Matrix multiplication is thus optimized in both arithmetic computation and memory access.

To verify the effectiveness of the optimization work, the optimized matrix multiplication is implemented and its performance is assessed on a variety of ARMv7-based platforms. The overall speedup is then measured by replacing the traditional matrix multiplication in the trilinear decomposition algorithm with the optimized one. The experiments show that the optimized implementation outperforms other open-source libraries. It achieves a speedup of about 7 to 30 times. The performance of trilinear decomposition improves by about 2.8 times after the optimization.
Keywords/Search Tags: Optimization on embedded platform, Trilinear decomposition algorithm, Matrix multiplication, ARMv7
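The abstract names two techniques that a short sketch may make more concrete: estimating the maximum overall speedup from profiling data, and blocked matrix multiplication with a partial copy of each tile. The C code below is an illustration only, not the thesis's implementation and not GotoBLAS code. It assumes the speedup bound is computed with Amdahl's law (the abstract does not say which formula is used), and the names `overall_speedup`, `matmul_naive`, `matmul_blocked`, and the tile size `BLOCK` are all hypothetical. The innermost loop over the packed tile is the kind of contiguous, unit-stride loop that a NEON kernel would vectorize; here it is left as plain scalar C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; a real value would be tuned to the cache and TLB. */
#define BLOCK 4

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Amdahl's law (an assumption here): overall speedup when a fraction
 * `fraction` of the run time is accelerated by `kernel_speedup`. */
double overall_speedup(double fraction, double kernel_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && kernel_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup);
}

/* Reference triple loop: C = A * B, row-major, A is m x k, B is k x n. */
void matmul_naive(size_t m, size_t n, size_t k,
                  const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}

/* Blocked variant: B is processed in BLOCK x BLOCK tiles, and each tile is
 * first copied into a small contiguous buffer before it is reused for
 * every row of A. */
void matmul_blocked(size_t m, size_t n, size_t k,
                    const float *A, const float *B, float *C)
{
    float Bpack[BLOCK * BLOCK];

    for (size_t i = 0; i < m * n; ++i)
        C[i] = 0.0f;

    for (size_t pp = 0; pp < k; pp += BLOCK) {
        size_t kb = min_sz(BLOCK, k - pp);       /* tile height (edge-safe) */
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            size_t nb = min_sz(BLOCK, n - jj);   /* tile width (edge-safe) */

            /* Partial copy: pack the current tile of B contiguously. */
            for (size_t p = 0; p < kb; ++p)
                for (size_t j = 0; j < nb; ++j)
                    Bpack[p * nb + j] = B[(pp + p) * n + jj + j];

            /* Accumulate the tile's contribution into C. */
            for (size_t i = 0; i < m; ++i)
                for (size_t p = 0; p < kb; ++p) {
                    float a = A[i * k + pp + p];
                    for (size_t j = 0; j < nb; ++j)
                        C[i * n + jj + j] += a * Bpack[p * nb + j];
                }
        }
    }
}
```

Calling `matmul_blocked(m, n, k, A, B, C)` on row-major arrays should give the same result as `matmul_naive`, including when the dimensions are not multiples of `BLOCK`. `overall_speedup` shows why a large kernel speedup can translate into a much smaller whole-algorithm speedup: the part of the run time outside matrix multiplication is not accelerated.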