| Advances in chip design and manufacturing have driven rapid progress in many-core processors: the number of computing cores integrated on a single chip has grown from about a dozen to hundreds or even thousands. To exploit the full performance of a many-core processor, the synchronization mechanism that coordinates the parallel work of its cores is particularly important. Barrier synchronization is commonly used on many-core processors, and its efficiency affects the processor's actual performance. Frequent use of software barriers increases communication delay and network traffic between cores, creating hot spots in the network and degrading processor performance. An efficient barrier synchronization mechanism is therefore essential to improving the parallel efficiency of many-core processors. Starting from the software barrier scheme, this paper implements three hardware barrier synchronization methods that address its excessive number of synchronization packets in the network, its high synchronization delay (up to thousands of clock cycles), and its serialization problem. The methods are as follows. Centralized hardware barrier mechanism: a single barrier synchronization processing unit is integrated in the processor to handle all synchronization requests. Because there is only one synchronization unit, every synchronization request is sent to it, which may cause network congestion and hot spots when barriers are executed frequently or simultaneously. Hierarchical hardware barrier mechanism: to address the problems of the centralized hardware barrier as the number of cores increases, this paper proposes a hierarchical hardware barrier suited to large-scale multi-core or many-core processors. This method exploits the different 
communication costs inside and outside a supernode, and sets up local and global barrier synchronization units inside and outside the supernodes, respectively. Cores within a supernode synchronize quickly inside that supernode, and each supernode merges its synchronization packets before synchronizing with other supernodes, which reduces network traffic. Mesh-based distributed hardware barrier mechanism: when a many-core processor integrates hundreds or even thousands of cores, the hierarchical barrier exhibits the same drawbacks as a centralized design, so this paper proposes a distributed hardware barrier synchronization scheme for many-core processors based on mesh networks. A barrier synchronization unit is placed in each core, and each barrier can select the nearest synchronization unit, which reduces the average synchronization delay. Finally, the three hardware barrier schemes are designed and implemented in a hardware description language, and function points are verified according to the characteristics of each method to ensure the functional correctness and completeness of the design. The designs are tested with a synthetic traffic model to obtain detailed performance data. The data for the three schemes are then compared, and their advantages and disadvantages are analyzed in terms of network traffic, delay, and hardware overhead. The results show that when the processor has no more than eight cores, this paper recommends the centralized design; when the number of cores is larger but does not exceed forty, this paper recommends the hierarchical hardware barrier, which achieves better performance with lower network traffic and latency; when the number of cores is larger still, it is recommended 
to use mesh-based distributed hardware barriers to obtain lower average latency. |
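
As a rough illustration of the recommendations and mechanisms summarized above, the following Python sketch encodes the core-count thresholds stated in the abstract and a toy model of how packet merging lowers the load on the busiest synchronization unit. The function names, the default supernode size of 4, the packet-counting model, and the Manhattan-distance rule used to pick the "nearest" unit are all assumptions made for illustration; they are not taken from the paper's hardware design.

```python
import math


def recommend_barrier(num_cores: int) -> str:
    """Return the scheme the abstract recommends for a given core count."""
    if num_cores < 1:
        raise ValueError("num_cores must be positive")
    if num_cores <= 8:       # "not more than eight cores"
        return "centralized"
    if num_cores <= 40:      # "not more than forty cores"
        return "hierarchical"
    return "distributed"     # larger core counts


def packets_at_busiest_unit(num_cores: int, scheme: str,
                            supernode_size: int = 4) -> int:
    """Toy model (assumption): arrival packets received by the most heavily
    loaded synchronization unit during one barrier over all cores."""
    if num_cores < 1 or supernode_size < 1:
        raise ValueError("num_cores and supernode_size must be positive")
    if scheme == "centralized":
        # Every core sends its request to the single unit.
        return num_cores
    if scheme == "hierarchical":
        # Each local unit hears from the cores of its supernode; the global
        # unit hears one merged packet per supernode.
        local_load = min(supernode_size, num_cores)
        global_load = math.ceil(num_cores / supernode_size)
        return max(local_load, global_load)
    raise ValueError(f"unsupported scheme: {scheme}")


def pick_sync_unit(participants: list[tuple[int, int]]) -> tuple[int, int]:
    """Assumed 'nearest unit' rule for a mesh: among the participating cores
    (each of which hosts a synchronization unit), choose the one with the
    smallest total Manhattan hop distance to all participants."""
    def total_hops(unit: tuple[int, int]) -> int:
        return sum(abs(unit[0] - x) + abs(unit[1] - y) for x, y in participants)
    return min(participants, key=total_hops)


if __name__ == "__main__":
    for n in (8, 32, 256):
        print(n, recommend_barrier(n))
    print(packets_at_busiest_unit(32, "centralized"))
    print(packets_at_busiest_unit(32, "hierarchical", supernode_size=4))
    print(pick_sync_unit([(0, 0), (0, 1), (0, 2)]))
```

In this toy model a 32-core barrier delivers 32 packets to the single centralized unit but at most 8 to any unit in the hierarchical scheme with four-core supernodes, which mirrors the traffic reduction the abstract attributes to merging synchronization packets. Real latency and traffic depend on the network and hardware details evaluated in the paper.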