Studies On Parallel Implementation And Algorithm Optimization Of Numerical Method For Inclusion Problems

Posted on: 2020-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: D H Luo
GTID: 2370330599952770
Subject: engineering
Abstract/Summary:
At the micron scale, machine components inevitably exhibit material heterogeneity. Analyzing such problems with the Eshelby inclusion model is usually difficult, because impurities and imperfections in materials are not only arbitrarily shaped but also randomly distributed. In particular, analytical studies of irregularly shaped inclusions often encounter intractable complexity. To handle arbitrarily shaped inclusions, the computational domain is commonly discretized into a very large number of cuboidal elements, and the solution is obtained by summing the contributions of all elements. For irregular inclusions with a high degree of rugosity, a finer discretization usually yields better accuracy; however, a finer mesh can make the computation extremely expensive. Over the past decade, numerical techniques combining discrete convolution/correlation with the fast Fourier transform (DC-FFT or DCR-FFT) have been developed to speed up the computation.

In recent years, the rapid development of high-performance computing has facilitated the analysis of large-scale data. To take advantage of these advances, further refinement of the numerical algorithms for inclusion problems is well worth exploring. When a program contains multiple nested loops and independent task branches, a parallel computing strategy may be considered in addition to purely algorithmic enhancement. The number of cores in central processing units (CPUs) has increased significantly, and the computational power of graphics processing units (GPUs) has improved similarly. In addition, a growing number of open, royalty-free standards for parallel programming are available. These hardware and software conditions greatly facilitate parallel computing.

The current work is concerned with refined numerical 
algorithms for arbitrarily shaped three-dimensional (3D) inclusions. The algorithms have been adapted for parallel computing, and their structure has been optimized in several respects. Together, these improvements increase the computational efficiency substantially.

First, the effects of different transform types and control parameters on computational efficiency are studied. In contrast to previous studies, which applied complex Fourier transforms to a real array, the current work uses real transforms of both in-place and out-of-place types. Complex Fourier transforms on two real arrays are also studied. The memory usage and time consumption of all transform types are compared and charted. Furthermore, the computational performance is benchmarked for different numbers of uses and array sizes of an fftw_plan.

Next, parallel computing is implemented for the inclusion model. Computational time and memory usage are tested under four CPU parallel schemes. The comparison shows that all of the schemes improve the efficiency of the numerical algorithm. However, the two schemes that parallelize over the rows or columns of the discrete convolution/correlation matrix suffer from markedly unbalanced use of the CPU cores. The other two show no such waste, but differ in speedup and memory usage, especially when the number of threads is large. Subsequently, a GPU parallel scheme is implemented using OpenACC, yielding a twofold performance improvement over the CPU schemes.

Moreover, the structure of the numerical algorithm is refined. After the repeated computations are identified, a de-duplication optimization is performed, which increases the efficiency by a factor of four. Both the CPU and GPU parallel schemes are then tested for further improvement of the efficiency. Notably, a 
nearly sixtyfold overall speedup is observed for the half-space case solved with GPU parallelization.

The singularity issue (numerical instability) that may occur when the two meshes are set independently is remedied by a self-adaptive scheme. The numerical algorithm can then reset the size of the target mesh in certain special cases. For example, when the target mesh degenerates into a strip or a planar mesh, the computational time is reduced to one quarter or one half of the original, respectively. For the full-space problem, the matrix of influence coefficients (ICs) is symmetric. Accordingly, two methods are applied that store only half of the symmetric information, reducing the number of operations. Comparisons show that both methods increase the computational efficiency significantly. Because the key task in constructing the IC matrix is evaluating the primitive functions, which depend only on the shapes of, and relative distance between, the target and field meshes, it is useful to save the primitive functions for subsequent use, even when the source domain contains a different arrangement of inclusions.
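The DC-FFT idea summarized in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the thesis implementation: the thesis works in 3D and benchmarks FFTW plans, whereas this example uses a small pure-Python radix-2 FFT, and the names `example_kernel`, `direct_sum`, and `dc_fft` are hypothetical. The sketch assumes the influence coefficient depends only on the offset between target and source elements, so only 2N-1 kernel values are needed instead of a full N-by-N matrix, and it assumes N is a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def example_kernel(d):
    """Hypothetical influence coefficient for offset d = target - source."""
    return 1.0 / (1.0 + d * d) + 0.01 * d


def direct_sum(kernel, source):
    """Reference result: sum the contribution of every source element, O(N^2)."""
    n = len(source)
    return [sum(kernel(i - j) * source[j] for j in range(n)) for i in range(n)]


def dc_fft(kernel, source):
    """Same result via zero-padding, wrap-around ordering, and FFT, O(N log N)."""
    n = len(source)  # assumed to be a power of two
    m = 2 * n        # doubling the length avoids circular-convolution aliasing
    padded = list(source) + [0.0] * n
    wrapped = [0.0] * m
    for d in range(-(n - 1), n):
        wrapped[d % m] = kernel(d)  # negative offsets wrap to the array tail
    product = [a * b for a, b in zip(fft(wrapped), fft(padded))]
    result = fft(product, inverse=True)
    return [v.real / m for v in result[:n]]  # normalize and drop the padding


if __name__ == "__main__":
    src = [1.0, 0.0, 2.0, -1.0, 0.5, 3.0, 0.0, 1.5]
    ref = direct_sum(example_kernel, src)
    fast = dc_fft(example_kernel, src)
    print(max(abs(a - b) for a, b in zip(ref, fast)))
```

The two routines should agree to within floating-point round-off. Because the kernel values depend only on the mesh geometry, the transformed `wrapped` array could in principle be computed once and reused for different source distributions, which is the same reasoning the abstract gives for saving the primitive functions.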
Keywords/Search Tags: Inclusion Problem, Numerical Algorithm, Parallel Computing, Algorithm Optimization