Many research applications in fields such as Abstract Algebra and Number Theory require computing the exact product of integers whose size greatly exceeds the capability of a single machine multiply instruction. We require efficient parallel algorithms to multiply these integers. We present a non-recursive FFT-based ({dollar}omega{dollar} = {dollar}esp{lcub}(sqrt{lcub}-1{rcub})2pi/n{rcub}){dollar} parallel multiplication algorithm for which we give a time complexity bounded by O(log n log log n), and parallel versions of the multiplication technique commonly taught to children and the classic Schonhage-Strassen ring-of-integers FFT-based algorithm for which we give time complexities bounded by O(log n). The algorithms are presented under the assumption of an exclusive read, exclusive write global shared memory model. Detailed analyses of these algorithms are performed in three models--two global shared memory models with different data widths and one interconnection network model using a barrel shifter. Performance under varying model parameters is computed, discussed, and compared within and between models. The results indicate a general superiority, in terms of (time {dollar}times{dollar} processors), of the Schonhage-Strassen algorithm. |