
Parallel Global Illumination Algorithm For High Performance Computer

Posted on: 2022-03-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Xu
Full Text: PDF
GTID: 1488306608477294
Subject: Computer Software and Application of Computer
Abstract/Summary:
Realistic rendering is one of the key technologies in digital creative industries such as film and animation, virtual reality, games, and simulation. As the core of realistic rendering, global illumination algorithms generate realistic images by simulating light transport in the real world. As the demand for rendering quality grows, global illumination requires more accurate simulation of more complex geometry, which poses a massive challenge to a computer's computing and storage capabilities. An effective remedy for slow global illumination computation is parallelization: a parallel global illumination algorithm exploits the parallel computing capability of the hardware by dividing the computation into multiple subtasks and executing them in parallel. When a large-scale scene does not fit in the memory of a single machine, distributed parallelism can be used. However, in global illumination a ray propagating through the scene, especially after multiple bounces, may intersect geometry anywhere in the scene. For large-scale scenes that cannot be stored on a single node, this scattered, scene-wide access pattern causes frequent data transfers and reduces data utilization. The difficulty in distributed parallel global illumination rendering is therefore how to divide tasks evenly while keeping data utilization high. In such problems, the division of rendering tasks, the organization of data, the design of scheduling algorithms, and the system architecture of the computer are closely related.

High-performance computers, with their high computing power and large storage capacity, have been used in weather prediction, molecular modeling, physics simulation, cryptanalysis, and other fields. China's high-performance computers have progressed from imitation to surpassing, and are now at the forefront of the
world in terms of computing power. Commercial rendering software is difficult to deploy on domestic high-performance computers, which restricts the development of China's digital content creative industry. There is therefore an urgent need to study highly realistic rendering algorithms suited to the architecture of domestic high-performance computers, which guarantee rendering quality while enabling fast rendering at very large scale; this is important for expanding the application of domestic high-performance computers.

This paper addresses the speed and storage problems in global illumination rendering and proposes parallel rendering algorithms for high-performance computer architectures: 1) a vectorized point-based global illumination algorithm that improves the speed of global illumination computation on a single node of a high-performance computer; 2) an asynchronous distributed ray tracing algorithm based on temporal coherence that renders large-scale scenes of a terabyte or more; 3) a load-balanced distributed photon mapping algorithm that achieves efficient gathering of large numbers of photons and improves caustics. In summary, the contributions and innovations of this paper are as follows.

1) The point-based global illumination (PBGI) algorithm can quickly generate noise-free global illumination effects. To speed it up, this paper proposes a vectorized point-based global illumination method that improves global illumination computation speed on a single node of a high-performance computer. The algorithm consists of two parts: point cloud tree traversal vectorization and micro-buffer projection vectorization. For the point cloud tree traversal part, this paper proposes three vectorized traversal algorithms, Packet, Single, and Hybrid, for shading points with different spatial coherence; for the micro-buffer projection part, this paper proposes different
vectorization algorithms for near, medium, and far point cloud tree nodes, classified by the distance between the node and the shading point. Experimental results show that, compared with the non-vectorized PBGI algorithm, this algorithm achieves a speedup of 7 to 9 times in the point cloud tree traversal phase, a speedup of 2 times in the micro-buffer projection phase, and an overall speedup of 5 to 7 times.

2) Distributed ray tracing is a common way to render large-scale scenes; it reduces the storage pressure on a single computing node by dividing the scene into multiple scene chunks and storing them in a distributed manner. However, it is difficult to balance load across nodes while also reducing inter-node communication. This paper proposes an asynchronous distributed ray tracing algorithm based on temporal coherence for rendering large-scale (terabyte) scenes. The algorithm exploits the similarity of ray propagation between consecutive frames: it records ray transport information from the previous frame and uses it to pre-allocate scene chunks for the next frame and to schedule scene chunks within each node at runtime. Within each node, data transfer, task scheduling, and rendering run in parallel, and asynchronous scheduling hides the data scheduling overhead. Moreover, we estimate the radiance of the current ray from the previous frame and send rays with low contribution to a precomputed simplified model for further tracing, reducing traversal complexity and network data transfer overhead. Experimental results show that the algorithm achieves up to a 75% speedup compared with existing asynchronous distributed algorithms, and the difference
between the generated image and the original image, measured as mean squared error (MSE), is less than e-4.

3) Photon mapping is well suited to computing global illumination effects such as caustics. It needs to emit a large number of photons to produce high-quality caustics, but this photon data cannot be held in the memory of a single machine. This paper proposes a distributed photon mapping algorithm that tightly couples data and tasks; it balances both data and tasks across the distributed nodes, supports efficient gathering of more than 100 million photons, and improves caustics. In the data organization stage, parallel distributed photon tree construction based on Morton codes and a shading point tree construction algorithm are proposed; in the task scheduling stage, an efficient photon tree and shading point tree retrieval algorithm is proposed to improve the correlation between shading point tasks and photon data and to balance load across the distributed nodes. Experimental results show that, compared with existing distributed photon mapping algorithms, the proposed algorithm still maintains high parallel rendering efficiency on 256 nodes without any loss of quality.

The algorithms in this paper have been integrated into "RWing", a rendering system developed in-house at Shandong University, and deployed on domestic high-performance computers such as "TaihuLight", "Tianhe ?" and "Kunlun" to provide rendering services for the film and animation industry. As a next step, to achieve highly portable parallel rendering across different high-performance computer architectures, we will explore single-machine heterogeneous parallel rendering algorithms applicable to different hardware architectures, as well as parallel rendering algorithms for heterogeneous distributed architectures.
Keywords/Search Tags: Rendering, Parallel computing, Supercomputer, Distributed rendering