
Research On Architecture Of Network Processor For Deep Packet Processing

Posted on: 2014-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Yuan
Full Text: PDF
GTID: 1268330422960305
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, network applications have become increasingly complicated. Applications such as Deep Packet Inspection (DPI) require deep packet processing in network devices. Network processors (NPs), which are widely used as the core processors of network devices, must meet this requirement. Complicated applications need to manipulate the packet payload as well as the packet header. The common practice in network processors is to split incoming packets into fixed-length segments (e.g., 64 bytes) and store them in external memory. In deep packet processing, processing engines must therefore fetch packet data from external memory segment by segment, which causes high delay and low performance. Hardware multithreading is typically used to hide the memory-access delay, but the delay itself cannot be avoided or reduced this way, and multithreading introduces context-switch overhead. To solve this problem, we propose a Push model for the NP's architectural design that increases throughput and decreases processing delay. A hardware unit prefetches and pushes the segments of a packet into a core's local memory at the right time, guaranteeing that the packet is processed without thread switching. Both theoretical analysis and experimental results indicate that the Push model not only improves system throughput but also reduces delay.

Beyond deep packet processing, complicated applications must also consider the dependencies among packets of the same network flow. The Push model can process a flow without interruption, which requires packet-assignment information from the scheduling mechanism. If all packets of a flow are assigned to the same core, i.e., data locality within a flow is maintained, the cost of loading and saving intermediate results is reduced.
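The segment-by-segment handling described above can be illustrated with a minimal sketch. The names (`split_into_segments`, `PushModelCore`, `receive_push`) are illustrative only and do not come from the dissertation; the sketch merely models fixed 64-byte segmentation and push-style delivery of all segments into a core's local memory before processing begins.

```python
SEGMENT_SIZE = 64  # fixed segment length, as described above


def split_into_segments(packet: bytes, seg_size: int = SEGMENT_SIZE):
    """Split a packet into fixed-length segments (the last may be shorter)."""
    return [packet[i:i + seg_size] for i in range(0, len(packet), seg_size)]


class PushModelCore:
    """Toy model of the Push idea: a hardware unit pushes segments into the
    core's local memory ahead of use, so the processing engine never stalls
    on external memory and needs no thread switch to hide latency."""

    def __init__(self):
        self.local_memory = []  # segments already pushed into this core

    def receive_push(self, segment: bytes):
        # In hardware this push is performed by a dedicated prefetch unit.
        self.local_memory.append(segment)

    def process_packet(self) -> bytes:
        # All segments are already local, so the whole payload can be
        # inspected without any per-segment external-memory access.
        return b"".join(self.local_memory)


packet = bytes(200)            # a 200-byte packet splits into 4 segments
core = PushModelCore()
for seg in split_into_segments(packet):
    core.receive_push(seg)
assert core.process_packet() == packet
```

This is only a software analogy; in the actual architecture the push is timed by hardware so that segments arrive before the engine needs them.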
Traditional scheduling algorithms dispatch at the granularity of individual packets, which achieves good load balance but ignores flow information. Flow-based scheduling preserves both traffic locality and packet order; however, existing flow-based algorithms do not satisfy the load-balance requirement. We propose a Parallel Lookup Based Flow-aware (PLBF) scheduling algorithm that handles the trade-off among load balance, reordering cost, and flow locality. Furthermore, since the algorithm is designed for practical applications, hardware parallel-search technology can be adopted to reduce its time complexity.

While improving NP performance, the Push model also changes the internal communication mechanism: processing engines receive communication messages passively instead of issuing requests and waiting for responses, as in traditional NPs. To avoid register-file read/write conflicts between the processing engine and incoming communication messages, we propose an improved architecture that adopts a separate message register file. Experimental results show that the proposed design improves throughput and significantly reduces the number of thread switches.
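The flow-locality/load-balance trade-off can be sketched in a few lines. This is not the PLBF algorithm (whose details are not given in this abstract), only a minimal flow-aware dispatcher under assumed semantics: packets of a known flow stick to their assigned core (preserving flow locality and packet order), while each new flow is assigned to the currently least-loaded core (approximating load balance). All names here are hypothetical.

```python
class FlowAwareDispatcher:
    """Toy flow-aware dispatcher: existing flows keep their core,
    new flows go to the least-loaded core. Illustrative only; the
    actual PLBF algorithm additionally exploits parallel lookup."""

    def __init__(self, num_cores: int):
        self.flow_table = {}          # flow_id -> assigned core
        self.load = [0] * num_cores   # packets dispatched per core

    def dispatch(self, flow_id) -> int:
        core = self.flow_table.get(flow_id)
        if core is None:
            # New flow: pick the least-loaded core for balance.
            core = self.load.index(min(self.load))
            self.flow_table[flow_id] = core
        self.load[core] += 1
        return core
```

In hardware, the `flow_table` lookup is the step that parallel-search technology would accelerate, which is why the abstract emphasizes reducing the lookup's time complexity.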
Keywords/Search Tags: network processor, memory, deep packet processing, scheduling algorithm, internal communication