| With the rapid development of 5G, the Internet of Things, cloud computing, and industry digitalization, Internet traffic has been growing exponentially, and network applications have become increasingly complex and variable. Network devices are required to deliver both high performance and excellent flexibility so that they can provide customization and optimization capabilities for new network services. However, the network processor, the core processing unit of network equipment, is built from either general-purpose or application-specific cores, and neither choice offers both high performance and excellent flexibility. General-purpose CPU-based network processors are highly flexible, but they have limited throughput and high latency. Application-specific CPU-based network processors achieve high performance by adding customized instructions, but they cannot rapidly deploy new protocols or actions, because doing so requires redesigning the application-specific instruction processor. FPGAs offer good reconfigurability and high performance, which makes them suitable for accelerating the parallel logic in network functions; they are already widely used in communication areas such as data centers and 5G. Compared with an FPGA, an ASIC is fixed in function but has higher performance, so it is suitable for implementing the common part of network functions. Therefore, we propose a high-performance, reconfigurable network processor model that integrates ASIC, FPGA, and CPU to match different types of packet processing, achieving both high performance and excellent flexibility. In this paper, we study in depth the design of a reconfigurable network processor and its key techniques. The main work and innovations are as follows. 1. We propose a reconfigurable network processor architecture based on ASIC-FPGA-CPU co-design, namely PicoArch. According to the characteristics of packet processing, PicoArch splits the data plane into three sub-planes: fast forwarding, NF (Network Function) acceleration, and deep packet processing. The fast forwarding sub-plane is
implemented on an ASIC and abstracts packet processing into protocol-independent "match-action" processing, which can realize various kinds of stateless packet processing, such as classification, forwarding, and scheduling, with high throughput and low latency. The acceleration sub-plane is based on an FPGA and leverages the FPGA's reconfigurability to accelerate customized processing logic, such as new protocols or actions, with high performance and flexibility. The deep packet processing sub-plane is based on general-purpose CPUs integrated with large memory, and is suitable for implementing stateful packet processing, e.g., parsing application-layer protocols and deep packet inspection, with high flexibility. 2. To avoid redesigning the packet parser when deploying new protocols on the ASIC, we propose a protocol-independent programmable parser, namely P5. P5 abstracts protocol parsing into protocol-independent "match-extract" processing by leveraging the fact that protocols are parsed layer by layer. "Match" identifies the protocol type, while "extract" extracts fields from the identified packet header. Thus, developers can customize the protocol types and the fields to be extracted by configuring the "match-extract" table. Because protocol parsing is independent between any two packets, P5 can parse multiple packets in parallel to improve performance. Moreover, P5 applies several optimization techniques, such as SRAM-based protocol matching algorithms and forecast parsing, to reduce hardware resource consumption and improve performance. 3. To reduce the difficulty of developing hardware network functions on the FPGA, we propose a reconfigurable packet processing model, namely DrawerPipe. DrawerPipe adopts a modular framework and abstracts the hardware pipeline into a series of "drawers" with the same interface. DrawerPipe allows developers to load existing or customized hardware modules into the "drawers" and combine them to construct new network functions. Furthermore, DrawerPipe provides five basic modules to avoid redundant
development of the same function, which helps reduce the difficulty of network function development. Finally, we design a programmable module indexing mechanism, i.e., PMI, which allows users to customize the processing sequence of DrawerPipe modules to construct any required network function service chain. 4. To evaluate the performance and flexibility of PicoArch, we implemented a PicoArch-based network processor prototype, namely NP40 S, using Xilinx FPGA and general-purpose CPUs. Moreover, we provide a network processor application development service, i.e., NPAS, for NP40 S. NPAS not only provides network function design, development, and debugging tools and abundant application programming interfaces to reduce the difficulty of network function development, but also integrates a software NP emulator that supports chip-level functional simulation and performance evaluation. Based on NP40 S, we implemented a simplified L3 routing function (composed of stateless packet processing) and a unified security gateway that integrates various kinds of stateful packet processing. We then simulated their processing performance and resource consumption. The results show that PicoArch has excellent processing performance and flexibility and can meet the needs of network functions for throughput, latency, and flexibility. |
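
To make the "match-extract" abstraction of P5 more concrete, the following is a minimal software sketch, not the hardware design described above. It assumes a table keyed by header name, fixed-length headers (IPv4 without options), and hypothetical names (`Entry`, `TABLE`, `parse`) that do not come from the paper. "Match" reads the field that selects the next layer, and "extract" copies the configured fields out of the current header.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    """One hypothetical match-extract table entry for a single header type."""
    header_len: int                      # bytes in this header (fixed length assumed)
    fields: Dict[str, Tuple[int, int]]   # field name -> (offset, length) in the header
    key: Tuple[int, int]                 # (offset, length) of the next-layer selector
    next_layer: Dict[int, str]           # selector value -> next header name


# Illustrative table; adding a protocol means adding an entry, not new parser code.
TABLE: Dict[str, Entry] = {
    "ethernet": Entry(14, {"dst": (0, 6), "src": (6, 6)}, (12, 2), {0x0800: "ipv4"}),
    "ipv4": Entry(20, {"src_ip": (12, 4), "dst_ip": (16, 4)}, (9, 1), {17: "udp"}),
    "udp": Entry(8, {"sport": (0, 2), "dport": (2, 2)}, (0, 0), {}),
}


def parse(packet: bytes, table: Dict[str, Entry] = TABLE,
          first: str = "ethernet") -> Dict[str, bytes]:
    """Walk the packet layer by layer, extracting the configured fields."""
    extracted: Dict[str, bytes] = {}
    offset = 0
    layer: Optional[str] = first
    while layer is not None:
        entry = table[layer]
        header = packet[offset:offset + entry.header_len]
        if len(header) < entry.header_len:
            break  # truncated packet: stop parsing
        # "extract": copy each configured field out of the identified header
        for name, (off, length) in entry.fields.items():
            extracted[f"{layer}.{name}"] = header[off:off + length]
        # "match": look up the selector value to identify the next protocol
        key_off, key_len = entry.key
        selector = int.from_bytes(header[key_off:key_off + key_len], "big")
        layer = entry.next_layer.get(selector)  # None ends the walk
        offset += entry.header_len
    return extracted
```

Because `parse` keeps no state between calls, each packet can be handled independently, which is the property that lets multiple packets be parsed in parallel.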
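
The DrawerPipe idea of uniform "drawers" whose order is chosen by PMI can likewise be illustrated in software. The sketch below is only an analogy under stated assumptions: packets are modeled as dictionaries, every module shares one interface (take a packet, return a packet or `None` to drop it), and the three modules shown (`acl`, `decrement_ttl`, `count`) are hypothetical examples, not the five basic modules mentioned above, which the abstract does not name.

```python
from typing import Callable, Dict, Optional, Sequence

Packet = dict
# The shared "drawer" interface: packet in, packet out (None means drop).
Module = Callable[[Packet], Optional[Packet]]


def acl(pkt: Packet) -> Optional[Packet]:
    """Hypothetical filter: drop packets addressed to port 23."""
    return None if pkt["dport"] == 23 else pkt


def decrement_ttl(pkt: Packet) -> Optional[Packet]:
    """Hypothetical forwarding step: drop expired packets, else decrement TTL."""
    if pkt["ttl"] <= 1:
        return None
    return {**pkt, "ttl": pkt["ttl"] - 1}


def count(pkt: Packet) -> Optional[Packet]:
    """Hypothetical bookkeeping step: record how many times the packet was counted."""
    return {**pkt, "seen": pkt.get("seen", 0) + 1}


# Modules loaded into named drawers.
DRAWERS: Dict[str, Module] = {"acl": acl, "ttl": decrement_ttl, "count": count}


def build_chain(index: Sequence[str],
                drawers: Dict[str, Module] = DRAWERS) -> Module:
    """Build a service chain from a module index (the PMI-like configuration)."""
    stages = [drawers[name] for name in index]  # KeyError on an unknown module

    def run(pkt: Packet) -> Optional[Packet]:
        for stage in stages:
            result = stage(pkt)
            if result is None:
                return None  # a stage dropped the packet
            pkt = result
        return pkt

    return run
```

For example, `build_chain(["acl", "ttl", "count"])` yields a chain that filters, forwards, and then counts; reordering or repeating names in the index produces a different service chain without changing any module.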