
Research On High Performance Parallel Computing And Architecture For Soft-Baseband Processing

Posted on: 2015-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Chen
Full Text: PDF
GTID: 1108330509461082
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of multimedia and wireless communication, mobile devices are required to deliver more services and higher data rates. At the same time, multiple wireless protocols coexist because of regional differences and the evolution of standards. Seamless switching between different wireless air interfaces is now necessary to provide different services, which requires high-performance and flexible baseband signal processing schemes. The traditional ASIC approach can achieve an optimal performance-to-power ratio; however, its long design time and weak programmability and reconfigurability make it difficult to incorporate multiple wireless standards and to accommodate future evolution. The software baseband processor, which can implement different protocols on the same hardware platform by reprogramming, is considered a promising approach for future wireless communication systems. As the protocols evolve, the system becomes more and more complicated, and the baseband processor faces multiple challenges in throughput, flexibility and power. For mobile devices in particular, since battery capacity is limited, the baseband processor has much more stringent power consumption requirements. Therefore, studying high-performance, low-power and flexible computer architecture for wireless communication has important practical and theoretical significance.
Based on an analysis of various wireless protocols, this dissertation chooses the third generation (3G) standard WCDMA and the upcoming fourth generation (4G) standard 3GPP-LTE as the targets for study and implementation. The research covers four points: novel and high-efficiency architecture, application-specific design methodology, instructions for reconfigurable parallel computing, and multiprocessor architecture. Some of these points are treated together in a single chapter, and a detailed evaluation and analysis is given for each of them.
The major contributions and innovations can be summarized as follows.
1. This dissertation proposes a scalable, parallel butterfly computation architecture with a fixed shuffling mode. The FFT signal flow graph (SFG) is decomposed into different Epochs, and each Epoch includes many independent butterfly computation groups (BCP). Memory is accessed only at the beginning and end of a BCP, and intermediate data within the BCP is stored in the local register file, which saves memory access power. In this architecture, a conflict-free memory access mechanism is proposed for parallel data access in the vector memory, so that eight radix-2 butterflies are computed simultaneously. In addition, computational complexity is reduced by using constant multipliers in the BCP computation. Experimental results show that the architecture achieves higher energy efficiency for FFT computation than previous works, and that its area is small as well.
2. This dissertation presents a high-throughput, parallel MIMO equalization architecture. According to the analysis, MIMO equalization is performed tone by tone. We propose an 8-lane SIMD processor architecture that performs the MIMO equalization of one tone per lane, which minimizes data communication between SIMD lanes. In addition, a reconfigurable register file is proposed for parallel data processing. Each register has two write ports and two read ports, yet a 2 x 2 matrix in the register file can be accessed in both row and column mode. Combined with a vector processing unit that supports two complex multiplications and additions, the system delivers a matrix inversion throughput of up to 95 MInversion/s when executing MIMO equalization with a 4T4R antenna configuration and 64QAM modulation. The throughput and area efficiency of the architecture are about twice those of previous works, and the data rate reaches 300 Mbps, which satisfies the requirement of the 3GPP-LTE standard.
3. For parallel computing of other baseband algorithms, this dissertation presents a reconfigurable parallel data processing architecture, USCA (Unified Soft-baseband Computer Architecture). USCA exploits both instruction-level parallelism and data parallelism by using VLIW (Very Long Instruction Word) and SIMD technology. It adopts a hybrid scalar and vector processing unit which can run in three modes: solo scalar mode, solo vector mode and hybrid mode. Since different standards and algorithms require different data types, USCA can be configured to support byte, half-word and complex vector operations. Resource sharing is used extensively to reduce hardware area and improve hardware efficiency. Experimental results show that USCA provides 130 Gops and 323 Mops/mW, and that its energy efficiency is much higher than that of previous works. The vector processing units can be configured to implement different kinds of operations, which increases flexibility while maintaining the high efficiency of the architecture. USCA is therefore better suited to the flexibility requirement of software defined radio.
4. A multiprocessor architecture is proposed for evolving standards and systems. We present a highly efficient synchronization and communication mechanism based on a distributed memory system. The synchronization mechanism is implemented using shared scratch-pad memory (SPM). Each SPM works with a semaphore mechanism, which guarantees data consistency and simplifies the transition between read and write modes. The SPM and semaphore mechanism supports one-to-one, one-to-many and many-to-one synchronization. For data exchange in the multiprocessor system, a high-speed, bidirectional inter-core communication mechanism called CoDMA is proposed. CoDMA does not need to allocate new memory space for the data being exchanged; the exchange is performed in the original memory locations. Compared with traditional methods, CoDMA achieves a performance improvement of up to 76% and saves 43% of memory usage, while its area is only 0.59% of the system. At the same time, CoDMA reduces the number of memory accesses and thus also saves system power.
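The in-place exchange idea behind CoDMA can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism described in the dissertation: the two Python lists stand in for the local memories of two cores, the function names `exchange_staged` and `exchange_in_place` are hypothetical, and equal block sizes are assumed. It contrasts a baseline that copies each block into a newly allocated staging buffer with a word-by-word swap that needs no extra buffer.

```python
def exchange_staged(mem_a, mem_b):
    """Baseline (assumed): copy both blocks into new staging buffers, then write back.

    Returns the number of extra words allocated for the exchange.
    """
    stage_a = list(mem_a)  # extra allocation: len(mem_a) words
    stage_b = list(mem_b)  # extra allocation: len(mem_b) words
    mem_a[:] = stage_b
    mem_b[:] = stage_a
    return len(stage_a) + len(stage_b)


def exchange_in_place(mem_a, mem_b):
    """In-place bidirectional exchange: swap the blocks word by word.

    Only the words currently in flight are held outside the two memories,
    so no staging buffer is allocated. Returns the extra words allocated (0).
    """
    if len(mem_a) != len(mem_b):
        raise ValueError("this sketch assumes equal-sized blocks")
    for i in range(len(mem_a)):
        mem_a[i], mem_b[i] = mem_b[i], mem_a[i]
    return 0


if __name__ == "__main__":
    core0 = [1, 2, 3, 4]
    core1 = [10, 20, 30, 40]
    extra = exchange_in_place(core0, core1)
    print(core0, core1, extra)
```

Both functions leave the two memories with swapped contents; the difference is the staging storage, which is the kind of overhead the abstract says CoDMA avoids. The figures reported in the abstract (76%, 43%, 0.59%) come from the dissertation's own evaluation and are not reproduced by this model.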
Keywords/Search Tags: Soft-Baseband Processor, Parallel Processing Architecture, Multiprocessor, MIMO+OFDM system