Font Size: a A A

Research Of Parallel 3D-FFT On Multi-core Processor

Posted on:2018-01-20Degree:MasterType:Thesis
Country:ChinaCandidate:Q Z ZengFull Text:PDF
GTID:2348330533469805Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the increase in the amount of computing tasks,single-core processor has been unable to meet user needs,the main reason is that power consumption issues limit the way single-core processors continue to improve performance.The advent of multi-core processors,provide a solution for multi-tasking,large data problems.3D-FFT is one of the most important algorithms in science and engineering research,and plays a vital role in computer vision and pattern recognition,video telephony and nuclear magnetic resonance imaging algorithms.How to do 3D-FFT computing more quickly on multi-core processors is the primary challenge in scientific research.In this paper,the 3D-FFT algorithm is studied from three aspects: data preprocessing,data calculation and data transposition.In the data preprocessing phase,in order to avoid some nuclear full load,and some nuclear idling phenomenon,the use of load balancing algorithm for each core average processing tasks.At the same time,in order to prevent the data in the calculation phase of the data between the core unnecessary migration,the thread and CPU affinity algorithm,so that the data in the designated CPU core.In order to solve the pseudo-sharing problem of shared memory multi-core processor,the cache line padding algorithm is proposed,so that the data belonging to different core is divided into independent cache lines.We test algorithm cache hit rate in the 2 cores and 8 cores processor respectively.The experiment shows that the cache line padding and thread CPU affinity algorithm,in the 2 cores processor,L1 data cache hit rate and L3 cache hit rate increased by 0.2% and 0.2 %,8 cores processor were increased by 0.2% and 0.1% respectively.In the data calculation phase,based on multi-column FFT algorithm,uses multithreading to parallel processing.For FFT each calculation,the use of classical six-step FFT algorithm to divide the data into smaller data size to speed up processing.The bit reversal process,using the optimizational inversion strategy,once processed four data points,so that the time complexity of the algorithm is half of the original.In the data transposition phase,a global transpose data communication optimization algorithm is proposed to reduce the total number of data points involved in inter-core communication,thus speeding up the global transpose speed.Compared with the conventional global transpose algorithm,the communication time is reduced by an average of 0.05 seconds,about 9.5% increase.Uses the multi-core simulator Sniper Simulator,to calculate its power consumption.Under the8 cores processor,the communication optimization algorithm and the conventional communication algorithm,the inter-core power consumption,the core and memory power consumption and the first-order data cache power consumption are 20 W and22W,7.2W and 7.5W and 16.9W and 17.1W respectively.In order to further reflect the superiority of the algorithm performance,compared with the FFTW and Open MP multithreading.In the 8 core,compared with the FFTW and Open MP cache miss rate,communication optimization 3D-FFT algorithm cache miss rate is almost 0.Compared with the library function of FFTW computational three-dimensional discrete Fourier transform,the performance of communication-optimized 3D-FFT algorithm is 1.59 times under 2 cores and 4.93 times under 8 cores.If the FFTW library function uses Open MP multithreading to speed up,the performance of communication-optimized 3D-FFT algorithm in 2 cores and 8 cores,compared with Open MP multithreading,is 1.07 times and 1.48 times respectively.
Keywords/Search Tags:Multi-core Processor, 3D-FFT, Cache Miss Rate, Algorithm Performance
PDF Full Text Request
Related items