Multicore NPU Based TCP Data Transmission Offload

Posted on: 2016-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2348330536467413
Subject: Computer Science and Technology
Abstract/Summary:
TCP (Transmission Control Protocol) is one of the most important and most widely used network protocols. Improving TCP performance can reduce the scale and power consumption of server clusters, which brings both commercial and environmental benefits. Ethernet technology is currently developing much faster than storage and CPU technologies, so memory access and CPU processing of the network stack have become the bottleneck of TCP performance on end systems. Constantly increasing network bandwidth places a heavy burden on the CPU; optimizing the TCP processing mechanism can relieve the CPU of this burden and improve end-system TCP performance. Traditional TCP acceleration techniques focus on host-side optimization, with protocol processing still done by the host CPU. TOE (TCP Offload Engine) offloads the entire TCP protocol processing job from the host CPU to the NIC and can greatly improve end-system TCP performance, but its implementation is extremely complex and it can cause security and compatibility issues. LRO (Large Receive Offload) aggregates consecutive TCP packets into a single packet and decreases CPU workload by reducing the number of packets processed by the network stack. However, LRO works at the NIC driver level and packet aggregation is still done by the host CPU, so it cannot reduce the CPU workload by much.

The main work of this thesis is as follows. (1) We propose, for the first time, the idea of using a multicore NPU as the NIC to accelerate TCP processing. The multicore NPU offloads TCP packet reordering and checksum calculation, and aggregates small data packets into far fewer large ones. This reduces both the number of packets processed by the network stack and the number of interrupts generated by the NIC, and ultimately improves TCP performance on the end system. (2) We design the system architecture and functions, and propose system optimization techniques including multiple receive packet descriptor rings, checksum optimization for aggregated packets, DMA load balancing across the threads that process received packets, and a spontaneous ACK mechanism. (3) We implement the system on the XLS416 platform and test its performance. Experimental results show that a TCP receive throughput of 4.9 Gbps is achieved in a 10 Gbps network environment.
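
The reordering, aggregation, and checksum steps named in the abstract can be illustrated with a minimal sketch. This is not the thesis's NPU implementation, which the abstract does not detail. The names `Segment`, `aggregate`, and `internet_checksum` are hypothetical, the 65535-byte cap is an assumed limit, sequence-number wraparound is ignored, and the checksum is the standard 16-bit one's-complement sum computed over the payload only (no TCP pseudo-header).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a received TCP segment."""
    seq: int        # sequence number of the first payload byte
    payload: bytes


def aggregate(segments, max_size=65535):
    """Reorder segments by sequence number and merge contiguous ones.

    Assumptions: one flow only, no sequence wraparound, no overlap handling.
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s.seq):
        last = merged[-1] if merged else None
        contiguous = last is not None and last.seq + len(last.payload) == seg.seq
        if contiguous and len(last.payload) + len(seg.payload) <= max_size:
            # Extend the previous aggregate instead of emitting a new packet.
            merged[-1] = Segment(last.seq, last.payload + seg.payload)
        else:
            merged.append(Segment(seg.seq, seg.payload))
    return merged


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of `data`, in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    arrived = [Segment(2000, b"cc"), Segment(1004, b"bb"), Segment(1000, b"aaaa")]
    for seg in aggregate(arrived):
        print(seg.seq, len(seg.payload), hex(internet_checksum(seg.payload)))
```

In this sketch three out-of-order segments collapse into two packets, so the host stack would handle two packets instead of three. The real system performs the equivalent work on the NPU before the data reaches the host.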
Keywords/Search Tags: TCP packet reordering, TOE, LRO, Multicore NPU