Design And Implementation Of Multi-core DSP Synchronization Mechanism

Posted on: 2021-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: H L Wang
Full Text: PDF
GTID: 2518306050468574
Subject: Master of Engineering
Abstract/Summary:
To exploit the parallelism of parallel applications and achieve higher performance on a many-core DSP, an efficient inter-DSP synchronization mechanism is needed. In the traditional semaphore synchronization mechanism, a spin lock based on busy-waiting guarantees mutual exclusion by repeatedly requesting the synchronization variable. This causes serious communication delay between processor nodes and generates heavy network traffic, which in turn leads to severe network contention. Barrier synchronization is global by nature: it requires the participation of multiple processor cores. A global barrier, however, easily leads to serious serialization, which degrades system performance. How to provide an efficient synchronization mechanism that fully exploits the parallel performance of many-core processors has therefore become an important topic in many-core architecture design. In this work, a hierarchical barrier synchronization scheme for a many-core DSP is proposed, and its hardware design and implementation are carried out. The research work of this thesis is as follows.

Firstly, the development trends and challenges of current processors are analyzed, and several synchronization mechanisms are introduced. Based on the architectural characteristics of the X DSP, a hierarchical barrier synchronization scheme is designed. It consists of a barrier synchronization unit within each supernode and a barrier synchronization unit between supernodes, so that the cores of the X DSP can synchronize quickly.

Secondly, the hierarchical barrier synchronization scheme is implemented, that is, the intra-supernode and inter-supernode barrier synchronization units are designed. The intra-supernode unit is mainly responsible for the fast synchronization of the 2-4 DSP cores in a supernode. According to the barrier number, the synchronization requests of the 4 DSPs are distributed to different barriers for processing, the barrier requests are synchronized, and DSP pause signals and exception signals are generated. The inter-supernode unit is mainly responsible for the fast synchronization of 6 supernodes and comprises three modules: address decoding, logical judgment, and the synchronization variable operation center. The address decoding module decodes the address and generates the corresponding barrier operation type and synchronization operation data. The logical judgment module determines whether the barrier operation is valid and, if so, sends it to the synchronization variable operation center. The synchronization variable operation center accesses the control registers of the synchronization unit or the barrier instance registers to generate the final barrier release signal. On this basis, the RTL code of the hierarchical barrier synchronization unit is completed.

Then, verification function points are extracted for each unit of the design, and Verilog test stimuli are used to complete module-level verification of the internal functions of the inter-supernode synchronization unit. Because the intra-supernode synchronization unit exchanges signals with external modules, it is verified at the system level to ensure the correctness and completeness of the design.

Finally, the performance of the synchronization unit is evaluated. Using comprehensive stimuli, the effectiveness and performance of the many-core DSP synchronization technique based on the hierarchical barrier structure are analyzed, verified, and evaluated both qualitatively and quantitatively. Under a 40 nm process from a certain manufacturer, with the operating condition set to Worst, an input delay of 0.05 ns, an output delay of 0.1 ns, and a clock constraint of 0.35 ns, the synthesized area of the intra-supernode barrier synchronization unit is approximately 4487.62 μm² and its power consumption is about 3.12 mW; the area of the inter-supernode barrier synchronization unit is about 2556.62 μm² and its power consumption is about 1.42 mW.
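
The two-level structure described in the abstract can be illustrated with a small behavioral model. This is only a software sketch under stated assumptions, not the thesis's RTL: the class names `SupernodeBarrier` and `HierarchicalBarrier`, the `arrive` method, the `run_episode` helper, and the `global_notifications` counter are all hypothetical, and the model ignores barrier numbers, address decoding, and exception signals. It shows one idea: each supernode collects its own cores locally, and only one notification per supernode reaches the inter-supernode level.

```python
from dataclasses import dataclass, field


@dataclass
class SupernodeBarrier:
    """Hypothetical model of the intra-supernode unit for one supernode."""
    num_cores: int
    arrived: set = field(default_factory=set)

    def arrive(self, core: int) -> bool:
        """Record one core's arrival; return True once all local cores are in."""
        if not 0 <= core < self.num_cores:
            raise ValueError(f"core {core} is outside this supernode")
        self.arrived.add(core)
        return len(self.arrived) == self.num_cores


class HierarchicalBarrier:
    """Hypothetical two-level barrier: local units feed one global unit."""

    def __init__(self, cores_per_supernode):
        self.local = [SupernodeBarrier(n) for n in cores_per_supernode]
        self.done_supernodes = set()   # supernodes whose cores have all arrived
        self.paused = set()            # (supernode, core) pairs waiting for release
        self.global_notifications = 0  # messages seen by the inter-supernode level

    def arrive(self, supernode: int, core: int) -> bool:
        """A core reaches the barrier; return True if this arrival releases it."""
        self.paused.add((supernode, core))
        local_done = self.local[supernode].arrive(core)
        if local_done and supernode not in self.done_supernodes:
            # Only the completed supernode, not each core, notifies the global level.
            self.done_supernodes.add(supernode)
            self.global_notifications += 1
        if len(self.done_supernodes) < len(self.local):
            return False
        # Every supernode has reported: release all cores and reset for reuse.
        for unit in self.local:
            unit.arrived.clear()
        self.done_supernodes.clear()
        self.paused.clear()
        return True


def run_episode(barrier: HierarchicalBarrier):
    """Drive every core to the barrier once; return the per-arrival release flags."""
    flags = []
    for s, unit in enumerate(barrier.local):
        for c in range(unit.num_cores):
            flags.append(barrier.arrive(s, c))
    return flags
```

With the configuration mentioned in the abstract (6 supernodes of 4 cores each), `run_episode(HierarchicalBarrier([4] * 6))` issues 24 arrivals, but the global level in this model receives only 6 notifications, and only the final arrival returns `True`. A flat barrier would instead have all 24 cores contend for one shared synchronization variable.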
Keywords/Search Tags:Many-core processor, Super-Node, layering, Barrier Synchronization