
The Research And Implementation Of CUDA-based Loop Mesh Subdivision Algorithm

Posted on: 2011-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: H F Lv
Full Text: PDF
GTID: 2178360308464117
Subject: Software engineering
Abstract/Summary:
As an important part of surface modeling, mesh subdivision repeatedly refines and smooths an initial control mesh according to a fixed rule until the required degree of smoothness is reached. The Loop scheme is a typical representative of mesh subdivision and has been widely applied to model building because it is simple to implement and allows limit points and tangent planes to be computed exactly.

The shortcomings of CPU-based Loop subdivision become increasingly obvious as models grow more complex. A mesh of more than 40,000 vertices and 80,000 triangular faces takes 7.69 hours to subdivide, and real models are much more complex, so this efficiency is unacceptable. In view of this, we propose a CUDA-based Loop subdivision scheme that runs on the GPU to improve subdivision efficiency.

Compared with the CPU, the GPU is designed for compute-intensive, highly parallel workloads. CUDA, a scalable parallel computing model launched by NVIDIA, takes full advantage of the GPU's parallel capability. Its single-instruction, multiple-thread (SIMT) execution model is well suited to applying the same operation to large-scale data in parallel. CUDA-based Loop subdivision can therefore subdivide simple models in real time and significantly accelerate subdivision of complex models.

Based on the CUDA programming model and the pinned-memory storage model, we first implemented a Loop scheme that runs entirely on the GPU. A model of 19,012 triangular faces takes only 1.571778 seconds, whereas the CPU-based version takes 70.245313 seconds, a speedup of about 44 times. However, the kernel function runs into access conflicts in the three preparation steps, in which multiple threads concurrently access the same GPU memory. The optimization chapter therefore tries three approaches. First, the face array is stored in texture memory, which improves efficiency but does not resolve the conflicts. Second, the preparation steps are moved to the CPU, which resolves the conflicts but reduces efficiency. Finally, face neighbors are built from local information, which both resolves the conflicts and improves efficiency.

Experimental results show that the optimized algorithm significantly accelerates the subdivision of complex models. The model mentioned above, which needs 7.69 hours on the CPU, needs only 58.23 seconds on the GPU, a speedup of about 475 times.
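For readers unfamiliar with the Loop scheme, the following is a minimal CPU-side sketch in Python of the standard per-vertex rules from Loop's original formulation. It is not the thesis's CUDA code, which is not reproduced in this abstract; the function names `loop_beta`, `edge_point`, and `vertex_point` are hypothetical, and only interior (non-boundary) vertices and edges are handled. Each output point depends only on a small, fixed neighborhood of the input mesh, which is the property that makes a one-thread-per-point SIMT mapping plausible.

```python
import math


def loop_beta(n):
    """Loop's original neighbour weight for an interior vertex of valence n."""
    return (5.0 / 8.0 - (3.0 / 8.0 + 0.25 * math.cos(2.0 * math.pi / n)) ** 2) / n


def edge_point(a, b, c, d):
    """New (odd) vertex inserted on interior edge a-b.

    c and d are the vertices opposite the edge in its two adjacent triangles.
    Weights: 3/8 for each edge endpoint, 1/8 for each opposite vertex.
    """
    return tuple(0.375 * (a[i] + b[i]) + 0.125 * (c[i] + d[i]) for i in range(3))


def vertex_point(v, ring):
    """Repositioned (even) interior vertex v, given its one-ring neighbours."""
    n = len(ring)
    beta = loop_beta(n)
    return tuple(
        (1.0 - n * beta) * v[i] + beta * sum(p[i] for p in ring)
        for i in range(3)
    )
```

As a sanity check on the weights, a regular vertex (valence 6) gets a neighbour weight of 1/16, and the weights in each rule sum to 1, so a point of a flat mesh stays in the plane of the mesh.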
Keywords/Search Tags: Mesh Subdivision, Loop Scheme, CUDA, GPU, Multi-Threads