
Research On High-performance And Low-power Edge Computing Based On Non-volatile Memory

Posted on: 2021-04-24 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: K Liu
GTID: 1368330605969593 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet of Things (IoT), billions to tens of billions of IoT edge devices will be connected to the network, generating data on an enormous scale. To handle these massive amounts of data, the cloud computing model, which relies on centralized computing, storage, and transmission, faces problems such as insufficient real-time performance, insufficient bandwidth, high energy consumption, and security and privacy risks. By offloading computing and storage capabilities from the cloud to the edge of the network, edge computing has emerged to meet requirements such as real-time processing, data optimization, intelligent services, and security and privacy. With lower latency, less transmission overhead, and higher security, edge computing has drawn extensive attention from academia and industry. A number of companies and organizations have initiated edge computing alliances, and through in-depth cooperation with industrial application alliances such as the Industrial Internet Alliance and the SDNFV Industry Alliance, edge computing has been promoted in many fields, including smart cities, smart homes, online live streaming, autonomous driving, and manufacturing. Compared with server clusters in cloud data centers, however, the edge side has limited space, energy, and computing capacity, and it faces serious challenges from growing data volumes and from processing tasks, such as artificial intelligence, that sharply increase the demand for computing power. It is therefore an urgent problem to build low-power, high-efficiency edge nodes that complete data processing tasks efficiently in real time and that store and access diverse data quickly and conveniently.

Compared with traditional static random-access memory (SRAM) and dynamic random-access memory (DRAM), non-volatile memories (NVMs) offer high storage density, non-volatility, good scalability, and low static power consumption. These characteristics give edge nodes an opportunity to improve both storage capacity and computing performance. On the one hand, NVMs serve a wide range of storage systems, including non-volatile on-chip caches, non-volatile main memory, non-volatile external storage, and hybrid storage, where they increase storage density and reduce the growing overheads of leakage power and DRAM refresh power. On the other hand, NVMs are also applied to in-memory computing and non-volatile computing, which can reduce data movement during processing and further improve computing performance and power efficiency. However, NVMs also have drawbacks, such as read-write asymmetry, long write latency, and limited endurance. This dissertation focuses on applying NVM technologies to edge computing, in order to improve both the performance of edge storage and the efficiency of edge data processing, and to promote the widespread application and sustainable development of edge computing.

First, this dissertation studies high-performance storage systems based on NVMs. To cope with the diverse storage requirements of edge computing applications and to study the characteristics of NVMs in depth, it designs an FPGA-based, array-structured NVM verification architecture. Thanks to the reconfigurability of FPGAs, this architecture supports multi-level evaluation of performance and power, from the underlying device level up to the system level, as well as verification of hybrid storage solutions. Based on this architecture, the dissertation implements an array hardware prototype composed of SoC-FPGAs and MRAMs and builds a multi-level memory system. To interconnect the multi-level FPGAs in this system, a chip-level and board-level integrated bus is designed, which not only provides low-latency, high-bandwidth data transmission but also supports flexible expansion of both storage levels and storage capacity. Evaluations on IOZone-like benchmarks show that the proposed multi-level memory architecture achieves high read and write speeds as well as good scalability, supporting the construction of fast and reliable storage systems for edge computing.

Second, Key-Value Stores (KVSs), which store and access structured and unstructured data rapidly, can serve as a cache layer that improves the efficiency of data access in edge systems and effectively reduces energy consumption. Hashing is one of the core operations in the KVS streaming framework, and its throughput plays an important role in how quickly stored content can be queried. This dissertation proposes a high-performance, scalable architecture for MurmurHash2 that improves data locality and reduces latency, and it deploys a partial reconfiguration scheme to maximize the performance/area ratio. First, to reduce latency and increase bandwidth, the characteristics of the various logic and arithmetic units in FPGAs are studied in detail, and an optimized implementation of the mathematical operations in MurmurHash2 is proposed. Then, based on the MurmurHash2 operation flow, an architecture combining pipelining and parallel computing is proposed, with one core designed for performance and another designed for resource efficiency. Finally, partial reconfiguration is introduced to switch dynamically between the two cores according to the load, further improving the performance/energy ratio.

In addition, in FPGA-accelerated KVS systems, the FPGA's on-chip memories (Block RAMs, BRAMs) are used not only to cache hot data from the hash table but also as buffers for the packet processing unit in the KVS communication path. NVM brings high density and low power consumption to on-chip BRAM, greatly increasing cache capacity and reducing the energy spent on extra data exchange. However, the endurance limitations of NVMs cannot be ignored. This dissertation analyzes the word-level write distribution within on-chip memory blocks and proposes a fine-grained, performance-aware wear-leveling algorithm. Whereas traditional wear-leveling algorithms incur large performance overheads, the proposed algorithm expands the search space of the simulated annealing algorithm used in the placement stage, improving the endurance of non-volatile on-chip memories without greatly sacrificing performance. Wear leveling is performed at the level of physical words within each BRAM by mapping logically hot words to physically cold words and logically cold words to physically hot words. To achieve flexible word-level mapping, an additional crossbar is placed in front of the BRAM address lines; reconfiguring this crossbar changes how the logical address lines in the netlist connect to the BRAM's decoder pins. Based on a crossbar-based address-line remapping model, a performance-aware address-line remapping algorithm is designed to realize the required physical word-level mapping. Compared with traditional wear-leveling algorithms and with BRAM-block-level wear leveling, the proposed algorithm makes the placement stage more flexible, which increases the chance of finding shorter critical paths and reduces the performance degradation incurred in exchange for longer lifetime.

Beyond using NVMs to improve the performance of edge-side memory and cache systems, this dissertation further explores building high-performance computing cores with NVM technologies. Heterogeneous computing (HC) is a typical computing architecture at the edge, and FPGAs, with their high parallelism, good data locality, and reconfigurability, are key members of HC systems. As FPGA capacity grows, current SRAM-based FPGAs face high leakage power, volatility, and limited scalability. Introducing NVM technology into FPGAs reduces FPGA power consumption and improves logic density. Besides low leakage power, large capacity, and non-volatility, NVMs also support storing multiple bits per memory cell (Multi-level Cell, MLC). MLC can significantly increase storage capacity, but it also brings longer read and write latency and higher write power. This dissertation studies how to introduce MLC into the logic units of non-volatile FPGAs to improve system energy efficiency. On the one hand, MLC increases logic capacity and reduces both area and interconnect length, which shortens routing delay. On the other hand, reading the hard bit of an MLC cell takes longer, which increases the delay of the logic units. Considering both aspects, the dissertation first replaces the Single-level Cell (SLC) storage units in the look-up tables (LUTs) of the Configurable Logic Block (CLB) with MLC units and analyzes the structural and operating characteristics of the result. It then designs four additional CLB structures from several perspectives, including inputs, outputs, and operating modes, and evaluates the proposed structures in terms of critical path delay, area overhead, and leakage power, providing a reference for building low-latency, high-performance MLC-based FPGAs.

To map applications onto MLC-based FPGAs more effectively, this dissertation also optimizes the synthesis flow. Taking into account both the higher logic capacity provided by MLC and the higher latency of hard bits, it proposes an MLC-aware, performance-driven synthesis algorithm. A criticality-based feasibility check keeps hard bits off the critical path, yielding higher performance, while a dynamic weighting method improves logic density by reducing area overhead. Compared with SRAM-based FPGAs and SLC-based non-volatile FPGAs, the proposed architectures, combined with the modified CAD flow, deliver higher performance, lower leakage power, and smaller area.
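For reference, the hash function targeted by the KVS accelerator can be modeled in software. The following is a minimal Python sketch of Austin Appleby's public 32-bit MurmurHash2, not the dissertation's FPGA design; it assumes 4-byte blocks are read little-endian, as the reference C code does on x86. It illustrates why a pipelined and parallel structure fits this function: mixing each block k (multiply, shift-xor, multiply) does not depend on the running state h, so several blocks can be mixed concurrently, and only the short h = (h * m) ^ k chain is inherently serial.

```python
def murmurhash2(key: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash2 (Austin Appleby) with little-endian block reads."""
    m = 0x5BD1E995
    r = 24
    mask32 = 0xFFFFFFFF
    length = len(key)
    h = (seed ^ length) & mask32

    nblocks = length // 4
    for i in range(nblocks):
        # Block mixing uses only the input block, never h, so in hardware
        # these stages can be replicated and pipelined independently.
        k = int.from_bytes(key[4 * i : 4 * i + 4], "little")
        k = (k * m) & mask32
        k ^= k >> r
        k = (k * m) & mask32
        # Serial part: the running state absorbs one mixed block at a time.
        h = (h * m) & mask32
        h ^= k

    # Fold in the 1-3 trailing bytes (mirrors the C switch fall-through).
    tail = key[4 * nblocks :]
    rem = length & 3
    if rem == 3:
        h ^= tail[2] << 16
    if rem >= 2:
        h ^= tail[1] << 8
    if rem >= 1:
        h ^= tail[0]
        h = (h * m) & mask32

    # Final avalanche so that every input bit affects every output bit.
    h ^= h >> 13
    h = (h * m) & mask32
    h ^= h >> 15
    return h
```

A software model like this is mainly useful as a golden reference: the outputs of a hardware hash core can be compared against it key by key.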
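The word-level remapping through an address-line crossbar can be illustrated with a small brute-force model. In this hypothetical sketch, a BRAM has 2^n words, and a mapping routes logical address bit i to decoder pin perm[i]. As an extra assumption not stated in the abstract, the crossbar may also invert lines (an XOR mask). Given profiled write counts per logical word and accumulated wear per physical word, the sketch picks the mapping that minimizes peak physical-word wear, which in effect sends logically hot words to physically cold ones. The names map_address and best_address_line_mapping are illustrative, and the timing (critical-path) side of the performance-aware algorithm and the simulated-annealing placer are not modeled; exhaustive search is only practical for small n.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple


def map_address(addr: int, perm: Sequence[int], mask: int = 0) -> int:
    """Hypothetical crossbar: logical address bit i drives decoder pin perm[i];
    'mask' optionally inverts pins (an assumed capability)."""
    phys = 0
    for i, pin in enumerate(perm):
        if (addr >> i) & 1:
            phys |= 1 << pin
    return phys ^ mask


def best_address_line_mapping(
    writes: Sequence[int],  # profiled writes per logical word
    wear: Sequence[int],    # accumulated writes per physical word
    n_bits: int,
    allow_inversion: bool = True,
) -> Tuple[Tuple[int, ...], int, int]:
    """Exhaustively choose (perm, mask) that minimizes peak physical-word wear."""
    size = 1 << n_bits
    if len(writes) != size or len(wear) != size:
        raise ValueError("writes and wear must each have 2**n_bits entries")
    masks = range(size) if allow_inversion else (0,)
    best: Optional[Tuple[Tuple[int, ...], int, int]] = None
    for perm in permutations(range(n_bits)):
        for mask in masks:
            new_wear = list(wear)
            for logical, count in enumerate(writes):
                new_wear[map_address(logical, perm, mask)] += count
            peak = max(new_wear)
            if best is None or peak < best[2]:
                best = (perm, mask, peak)
    assert best is not None
    return best
```

A pure line permutation preserves the number of set bits in every address, so words 0 and 2^n - 1 can never move under it; the optional inversion mask in this sketch is one way to lift that restriction.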
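The criticality-based feasibility check in the MLC-aware synthesis flow can be sketched under simplifying assumptions that are not taken from the dissertation: each MLC LUT holds two functions of the same inputs, one in the fast soft bits and one in the slower hard bits; every LUT node already has a timing slack from static timing analysis; node names are unique; and input-sharing constraints are ignored. A node may occupy hard bits only if its slack covers the extra hard-bit read delay, so critical nodes always land on soft bits. The greedy packer and the names LutNode, hard_bit_feasible, and pack_mlc_luts are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LutNode:
    name: str        # assumed unique
    slack_ns: float  # timing slack from static timing analysis (assumed given)


def hard_bit_feasible(node: LutNode, hard_bit_penalty_ns: float) -> bool:
    """A node may use hard bits only if its slack absorbs the extra read delay,
    so the move cannot create a timing violation."""
    return node.slack_ns >= hard_bit_penalty_ns


def pack_mlc_luts(
    nodes: Sequence[LutNode], hard_bit_penalty_ns: float
) -> List[Tuple[LutNode, Optional[LutNode]]]:
    """Greedily pack nodes into MLC LUTs as (soft-bit node, hard-bit node or None)."""
    max_pairs = len(nodes) // 2
    # The highest-slack nodes that pass the check fill the hard bits.
    by_slack_desc = sorted(nodes, key=lambda n: n.slack_ns, reverse=True)
    hard = [n for n in by_slack_desc if hard_bit_feasible(n, hard_bit_penalty_ns)]
    hard = hard[:max_pairs]
    hard_names = {n.name for n in hard}
    # Every other node goes to soft bits, most critical (least slack) first.
    soft = sorted((n for n in nodes if n.name not in hard_names), key=lambda n: n.slack_ns)
    # len(soft) >= len(hard) always holds, so every hard-bit node gets a partner.
    return [(s, hard[i] if i < len(hard) else None) for i, s in enumerate(soft)]
```

Each node appears exactly once, and the LUT count falls from N to N minus the number of hard-bit occupants; that reduction is the logic-density gain, while the check keeps timing-critical functions off the slower hard bits.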
Keywords/Search Tags: Non-volatile Memory, High-performance and Low-power Edge Computing, Multi-level Cell, Non-volatile FPGA, Dynamic Partial Reconfiguration