| Driven by the rapid development of semiconductor process technology and computer architecture, chip multiprocessors (CMPs) are now widely used in many computing fields. As the number of processors, hardware cores, and memories on a single CMP chip continues to increase, scalable, low-latency, high-bandwidth communication structures become critical, and traditional bus-based and point-to-point communication mechanisms can no longer meet these requirements. To improve the interconnection efficiency of CMPs, the network-on-chip (NoC), a communication-centric structure for data transmission, has emerged. NoC provides a high-speed, high-throughput, and highly scalable data exchange solution for CMPs by means of routing and switching. At present, NoC has become the main method of on-chip communication for multi-core processors. However, as system integration scale and data communication density increase, the power consumption of NoC keeps rising, and high power consumption brings a series of problems, such as degraded chip reliability. As semiconductor process technology shrinks, static power consumption has become the main part of NoC power consumption, and its proportion is constantly increasing. The buffers of NoC routers play an important role in flow control and quality of service, but they are also an important contributor to the static power consumption of the on-chip network and occupy a major part of the chip area. Area and power consumption limit the development of future NoCs, and low-overhead buffer design plays a crucial role in building high-performance, energy-efficient NoCs. Compared with traditional Static Random Access Memory (SRAM), Non-volatile Memory (NVM) has great advantages in area and leakage power. Therefore, this research studies how to reduce the power consumption overhead of router buffers using NVM. The main research contents and results of this paper are summarized as follows: 1. A congestion-aware buffer design for
network-on-chip routers is proposed. Buffers are an important component that allows NoC routers to process and forward data with high performance. When the network traffic load exceeds the buffer capacity, congestion occurs frequently, which seriously degrades network latency and throughput. The virtual channel buffer structure suffers from head-of-line blocking, and the virtual output queue buffer suffers from unbalanced buffer utilization. In addition, static buffer allocation is prone to low buffer utilization, while dynamic buffer allocation carries a risk of deadlock and queue starvation. A well-chosen buffer structure and an efficient buffer management mechanism are the keys to preventing unbalanced resource allocation and low buffer utilization. This paper proposes the Congestion-aware Buffer with Mixed Queues (CBMQ), a buffer structure that integrates virtual channels and virtual output queues and adopts a buffer management mechanism combining reserved buffers and shared buffers. When the network load is low, CBMQ reduces network latency by increasing the depth of virtual channels; when congestion occurs, it uses shared buffers to isolate congested traffic, improving network throughput across different network states. Performance tests show that CBMQ slows the formation and propagation of congestion more effectively, improves buffer utilization under different traffic loads, and achieves lower latency and higher saturation throughput than static and dynamic buffer allocation structures, by more than 11.2% and 3.3%. 2. A hybrid SRAM and STT-RAM buffer design for network-on-chip routers is proposed. Traditional SRAM-based NoC router buffers are increasingly unable to meet NoC power requirements owing to SRAM's high static power consumption. In this paper, a congestion-aware buffer structure (Hybrid Memory Buffer with Mixed Queue, HMMQ) based on SRAM and Spin-Torque Transfer
RAM (STT-RAM) is proposed to reduce NoC router power consumption and relieve congestion. HMMQ adopts the CBMQ buffer structure. Through congestion detection and congestion management, congested traffic is isolated from non-congested traffic to free the bandwidth occupied by the congested traffic. The reserved buffer in CBMQ uses SRAM to store and forward data efficiently, while the shared buffer uses STT-RAM, which offers high integration density and extremely low leakage power, to store congested data. The dual-bank structure in HMMQ overcomes the high latency of STT-RAM write operations and logically implements read and write operations with the same timing sequence as SRAM. Simulation results show that HMMQ achieves average improvements of 14.2% in saturation throughput and 36% in power over SRAM-based designs, while increasing area by only 9.1%. 3. Area- and energy-efficient buffer designs for NoC based on Domain Wall Memory are proposed. Domain Wall Memory (DWM) is an emerging spin-based non-volatile memory technology with higher integration density and better energy efficiency than the commonly used SRAM and STT-RAM, but its inherent shift behavior introduces variable shift distances and delays. This paper first analyzes the influence of DWM device structure on area, power consumption, and performance, then analyzes the performance of buffer queue structures built on different DWM device structures, and designs different DWM buffers based on the NoC router pipeline and the structural characteristics of DWM devices. Simulation results show that, compared with traditional SRAM- or STT-RAM-based buffer designs, the multi-read-port DWM-based design achieves a 50.1% or 30% power reduction and a 36.1% or 24.2% area reduction, respectively. The single-read-port DWM buffer designs suffer considerable latency increases and saturation throughput degradation due to shifting, but the multi-read-port DWM design achieves the same performance as the SRAM-based
buffer design and the best energy efficiency. |
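
The buffer management idea summarized above, reserved buffers combined with shared buffers, can be illustrated with a minimal sketch. This is not the CBMQ implementation: the class name `ReservedSharedBuffer`, its fields, and the admission policy (fill a queue's reserved slots first, then fall back to the shared pool) are assumptions made only for illustration, and congestion detection, virtual output queues, and the SRAM/STT-RAM split are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ReservedSharedBuffer:
    """Toy model (hypothetical): each queue owns reserved slots and may borrow from a shared pool."""
    num_queues: int
    reserved_per_queue: int
    shared_slots: int
    occupancy: list = field(default_factory=list)  # flits held per queue
    shared_used: int = 0                           # shared slots currently borrowed

    def __post_init__(self):
        self.occupancy = [0] * self.num_queues

    def try_enqueue(self, q):
        """Admit one flit to queue q; return 'reserved', 'shared', or None if no slot is free."""
        if self.occupancy[q] < self.reserved_per_queue:
            self.occupancy[q] += 1
            return "reserved"
        if self.shared_used < self.shared_slots:
            self.occupancy[q] += 1
            self.shared_used += 1
            return "shared"
        return None  # caller must apply backpressure

    def dequeue(self, q):
        """Remove one flit from queue q; return False if the queue is empty."""
        if self.occupancy[q] == 0:
            return False
        # A queue above its reservation is holding shared slots; release those first.
        if self.occupancy[q] > self.reserved_per_queue:
            self.shared_used -= 1
        self.occupancy[q] -= 1
        return True


buf = ReservedSharedBuffer(num_queues=2, reserved_per_queue=2, shared_slots=1)
results = [buf.try_enqueue(0) for _ in range(4)]  # queue 0 becomes "congested"
other = buf.try_enqueue(1)                        # queue 1 still has its reservation
```

In this toy run, queue 0 fills its two reserved slots, borrows the single shared slot, and is then refused, while queue 1 can still enqueue into its own reservation. That is the property a reserved-plus-shared scheme is meant to provide: a congested queue cannot starve the others.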