Font Size: a A A

Efficient Algorithms And Low-latency Implementations Of Polar Codes For 5G

Posted on:2019-04-20Degree:MasterType:Thesis
Country:ChinaCandidate:Y F ShenFull Text:PDF
GTID:2428330590960047Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
Channel coding is an essential part of communication system.Since C.Shannon proposed Information Theory,numerous scholars have explored the channel coding which could achieve the Shannon capacity.In 2008,polar codes were first proposed by E.Ar?kan which are the first provably capacity-achieving codes for binary-input discrete memoryless channels?B-DSMCs?.In 2016,polar codes were elected as the standard for 5G enhance mobile broadband?eMBB?control channels.Po-lar codes show the great potentials in wireless communications and storage systems.They possess promising error-correcting performance and low complexities of encoding and decoding.Currently,the encoding chain utilizes puncturing or shortening to match the transmitting rate and the decoding algorithm is successive cancellation list?SCL?decoding in 5G standard.Under puncturing or short-ening mode,several bits in codewords are unnecessary to be transmitted,which could support any code lengths and code rates.Researches on related polar construction methods and polar encoding implementations are valuable.With the help of outer cyclic redundancy check?CRC?codes,polar codes with SCL decoding could outperform low-density parity-check?LDPC?codes and Turbo codes in terms of error-correcting performance.However,it suffers from high decoding latency.When the list size equals to L,SCL is L times the complexity of successive cancellation?SC?decoding,and the list communication consumes more extra time.Therefore,it is meaningful to study the high-efficiency decoding algorithm and low-latency implementation of polar code.In this paper,software implementation based on general purpose processor is studied.Compared with hardware implementation,software implementation has more flexibility and portability.So it has lower cost and shorter cycle to be developed and updated.Furthermore,it conforms to cloud-radio access networks?C-RAN?in 5G system,which could allocate resource dynamically,improve the network utilization and meet the future communication demand.Aiming at the polar encoding architecture,the general polar construction method which is suitable to any binary-input symmetric channels is proposed.Based on this,a series of low-latency encoding architectures are designed.Aiming at the high-efficiency decoding algorithm of polar codes,three novel algorithms are proposed,including the general segmented CRC-aided list pruning algorithm,quick distributed sorting algorithm,and stage-located copy algorithm.Moreover,the CRC distribution method is optimized further to meet the generality and the traditional distributed sorting algorithm is adjusted to reduce the complexity.Aiming at the low-latency implementation,the software polar encoder and decoder based on general purpose processor are designed and imple-mented.Furthermore,the multi-core and multithreading programming for software polar decoder is also implemented.Based on the traditional Tal-Vardy construction method,this paper proposes a general construc-tion algorithm for polar codes when the coordinated channels are independent but not necessarily identical channels.Rigorous theoretical derivation for the symmetry and degradation is provided for the proposed general construction.Numerical results over binary erasure channels,binary symmetric channels and additive white Gaussian noise channels show the better performance under puncturing or shortening mode.According to the property of polarization and rate matching,the low-latency polar encoding architectures are proposed,which could reduce the encoding latency by decreasing the number of input bits.The equation for auto-generator is summarized and the auto-generator based low-latency folded architecture is designed.The corresponding hardware implementation show 31%reduction of latency and 28%improvement of throughput for 768-bit length and 1/2-rate polar codes.In addition,other four architectures are proposed,which are forward-input,backward-input,shortening mode and puncturing mode architectures.In the case of 16-bit length,the folding methods are introduced and the corresponding circuits are given.The four architectures could reduce the latency and the delay elements are not larger than the traditional folded architecture.The general segmented CRC-aided list pruning algorithm combines segmented decoding with list pruning.In this paper,the segmentation method based on channel polarization is first proposed.According to the polarized property,the list size for continuous distributed information block could be declined.Corresponding segmentation method and parameter configuration are discussed.For the division of CRC,the general method to allocate CRC is proposed.Based on the Gaussian approximation method,the equivalent coderate and equivalent signal-to-noise ratio?SNR?of each segment could be calculated,then the error probability of each segment could be obtained.By the joint optimization on all segments,the best allocation is produced as the parameter setting of segmented decoding algorithm.This algorithm has the flexibility and generality,which is more suitable to software implementation.The configuration table is run off-line,and the algorithm could reduce the online decoding latency without performance loss.For the list sorting of SCL decoding,the distributed sorting is adjusted and the quick distributed sorting is proposed.As a kind of relaxed sorting,distributed sorting could reduce the sorting complexity effectively utilizing the relationship between the two expanded lists.In this paper,the steps of distributed sorting are adjusted based on the polarized property,where the complexity is positively related to the round of comparisons.Therefore,the adjusted distributed sorting shows lower latency implemented on software.For special nodes in decoding,the distributed sorting is not suitable,since the lack of relationship among the expanded lists.In this paper,the quick distributed sorting is proposed to reduce the sorting complexity.Meanwhile,it keeps most lists unchanged to reduce the times of list updating.For the list updating of SCL decoding,the stage-located copy algorithm is proposed.To set up a reference matrix,this algorithm could record the index of first different bits for each list pair.Furthermore,the stages with different data could be located by upper bound and lower bound.It could reduce the copy latency by only copying the stage with different data,which avoids the redundant copy operation.Compared with the current index copy,stage-located copy is more suitable to software implementation,since the read-write of index matrix consumes more extra time.Under the same test environment,the software decoder with stage-located copy is 30%faster than the software decoder with index copy.Based on the above high-efficiency algorithms,the software polar encoder and decoder platform based on the general purpose processor is designed and implemented.Expect for the innovative algorithms,there are other optimizations to accelerate the platform.In the algorithm level,the optimization for special nodes is employed.In the data level,the single instruction and multiple data?SIMD?technique is used to do parallel computing for intra-frame data.Moreover,there are also some optimization in compiler level and code level.The implementation results show the latency of 325?s for?2048,1723?polar codes with L=32 and SNR equaling to 4.0 dB.Compared with the state-of-the-art works after CPU normalization,our platform is 37%faster than the state-of-the-art flexible software decoder and 10%faster than the state-of-the-art unrolled software decoder.Implementation result for long-codelength scenario show the well balance between decoding latency and error-correcting performance.For?8192,6144?polar codes with L=16,the decoding latency achieves 759?s under the frame error rate requirement of 10-7,which shows the great potentials for the application in scenarios with long code lengths.To further enhance the decoding throughput,the adaptive SC-SCL hybrid decoding algorithm is employed.The latency for the worst case is determined by SCL decoding and the throughput mainly depends on SC decoding.In this paper,the multi-core and multithreading programming is used to compute inter-frame data in parallel.The multi-core and multithreading programmings based on both OpenMP and the new tools in C++11 are implemented.The results for 4-thread show that the decoding throughput on 4-core CPU is 3 times as fast as that on single core.The results based on new tools in C++11 are more stable than that based on OpenMP.Implemented on the hybrid decoder,for?2048,1723?polar codes with L=32 and SNR equaling to 4.5 dB,the decoding throughput could reach at 333.87 Mbps which is 40%fast than the state-of-the-art software hybrid polar decoder.
Keywords/Search Tags:5G, channel coding, polar codes, general purpose processor, low-latency, rate matching, folding, segmented decoding, Gaussian approximation, distributed sorting, stage-located copy, single instruction multiple data, parallel computing
PDF Full Text Request
Related items