Research Of Manycore And On-Chip Memory Integration Based On 2.5D Technology

Posted on: 2019-01-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Xu
GTID: 1488306512455194
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Highly integrated many-core processors are widely used to meet the data-throughput and transmission energy-efficiency requirements of modern data centers. Because of limits on port count and transmission loss, PCB-based 2D interconnection and integration between many-core processors and memories can no longer meet the requirements for communication bandwidth and transmission energy consumption. 3D integration using through-silicon via (TSV) technology and 2.5D integration using through-silicon interposer (TSI) technology provide better solutions for integrating many-core processors with memory. The performance of a many-core integrated system depends closely on its structural thermal reliability, on the energy efficiency and bit error rate (BER) of I/O transmission, and on high-bandwidth communication efficiency. Investigating the heat distribution, I/O transmission energy efficiency, and communication efficiency of many-core systems under different integration architectures is therefore critical to improving system performance. This dissertation studies how to improve the data processing efficiency and power consumption of 2.5D many-core integrated systems, through the following work.

1. Based on the structural characteristics of many-core and memory integrated systems, functional simulation and power analysis models of 3D and 2.5D many-core and memory integrated systems are constructed to analyze the relationship between system performance and thermal distribution. Benchmarks are executed on the integrated systems to analyze the power distribution under different system configurations. The power consumption data are then fed into the thermal model, and a thermal runaway analysis is performed. The results show that the 2.5D integrated system has better thermal stability than the 3D integrated system when the system frequency is greater than 1 GHz and the stack has more than 5 layers.

2. To improve communication efficiency and bandwidth in the 2.5D integrated system, a reconfigurable space-time multiplexed I/O management scheme is explored by studying memory-access data patterns. The cores are classified in space according to the magnitude of their memory access demands, and each class is connected to the corresponding port of the memory controller. At each port, cores are assigned priorities, and I/O channels are allocated according to those priorities in a time-multiplexed manner. This space-time multiplexing (STM) scheme is deployed inside the memory controller. The proposed reconfigurable, data-pattern-aware memory controller with space-time multiplexed 2.5D TSI I/O is verified with a system-level simulator running benchmark workloads. The results show that the 16-core system achieves 41.67% bandwidth balancing and 24.52% quality-of-service (QoS) improvement, while the 64-core system achieves 51.85% bandwidth balancing and 25.16% QoS improvement.

3. To balance transmission power consumption against bit error rate, a circuit with adjustable output voltage swing and a configurable receiver compensation circuit are designed for the 2.5D integration interconnect I/O. Q-learning based I/O management is deployed to adaptively tune the I/O output voltage swing and enable receiver compensation under both power and BER constraints. A circuit-level 2.5D many-core and memory integration platform is built in GlobalFoundries 65nm CMOS technology, and the effects of I/O transmission power and transmission voltage on the BER are analyzed. The Q-learning algorithm is implemented in Matlab, trained off-line on power consumption samples, and used to verify the tuning results. The 2.5D integrated system consists of 8 MIPS processors, 8 SRAM blocks, and a TSI transmission line (T-line) set to 3 mm in length and 10 in width. Experimental results show that the adaptive 2.5D I/O achieves a 12.95% reduction in communication power with Q-learning tuning alone, and a 15.61% reduction with Q-learning tuning combined with receiver compensation.

4. A 2.5D integrated near-data computing FPGA prototype system is designed for big data processing. The data flow of the MapReduce computation framework is analyzed on the Hadoop platform, and kernel computations that are highly repetitive and simple are selected as the basic acceleration units. According to the performance requirements of near-data computing for big data, the system acceleration structure is defined, and a hardware/software co-designed prototype system is built. The host has a 6-core, 12-thread Intel Xeon processor and 16 GB of memory. The hardware accelerator is an Alpha Data development board that integrates a Xilinx Virtex-7 690T FPGA and a PCIe 3.0 x8 interface. The TeraSort benchmark is used to test the performance of the prototype system. Execution speed and power consumption are compared across various configurations, and the results show that Map task execution time is reduced by an average of 14% and execution energy consumption is reduced by 42%.

In summary, first, power analysis models of 3D and 2.5D many-core and memory integrated systems are constructed to analyze system power distribution and thermal failure, providing a theoretical basis for system integration. Second, I/O management is developed using a reconfigurable switch network under the control of the STM algorithm, and it effectively improves memory access QoS in the 2.5D many-core and memory system. Third, Q-learning based I/O management with voltage-swing tuning and receiver compensation is developed to balance the trade-off between the power budget and BER constraints, and it achieves significant power reduction for energy-efficient 2.5D memory-logic integration. Finally, the near-data computing prototype designed for the 2.5D integrated system improves big data processing efficiency and reduces power consumption.
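As an illustration only, the space-time multiplexing (STM) idea in item 2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the dissertation's memory controller: the function names `classify_cores` and `schedule_port`, the equal-size grouping of cores, and the use of remaining demand as the priority are all hypothetical choices made for the example.

```python
def classify_cores(demand, num_ports):
    """Space step (assumed policy): rank cores by memory-access demand
    and split the ranking into num_ports contiguous groups, one per port."""
    ranked = sorted(demand, key=demand.get, reverse=True)
    size = -(-len(ranked) // num_ports)  # ceiling division
    return [ranked[i * size:(i + 1) * size] for i in range(num_ports)]


def schedule_port(cores, demand, channels, slots):
    """Time step (assumed policy): in every time slot, grant the port's
    I/O channels to the cores with the most outstanding requests."""
    pending = {core: demand[core] for core in cores}
    timeline = []
    for _ in range(slots):
        # Priority is the remaining demand; ties keep the group order.
        ready = sorted(
            (core for core in cores if pending[core] > 0),
            key=lambda core: pending[core],
            reverse=True,
        )
        granted = ready[:channels]
        for core in granted:
            pending[core] -= 1  # one request served per granted channel
        timeline.append(granted)
    return timeline, pending


if __name__ == "__main__":
    # Hypothetical per-core request counts, not data from the dissertation.
    demand = {"c0": 4, "c1": 1, "c2": 3, "c3": 2}
    groups = classify_cores(demand, num_ports=2)
    for port, cores in enumerate(groups):
        timeline, pending = schedule_port(cores, demand, channels=1, slots=4)
        print(port, cores, timeline, pending)
```

Grouping cores with similar demand on the same port keeps heavy and light requesters from competing for the same channels, and the per-slot priority pass is what shares a port's channels over time. A real controller would derive both the classes and the priorities from observed memory-access patterns.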
Keywords/Search Tags:through-silicon via, through-silicon interposer, many-core processor and memory integration, space-time multiplexing, Q-learning, big data processing, near-data computing