Research On BGP Parallelism Technologies For Multicore And Multi-threading

Posted on:2010-03-26

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L Gao

Full Text:PDF

GTID:1118360305973666

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

To satisfy the requirement for the ultra large-scale and high-speed information network, the next generation Internet should possess the characteristics of security, real time, manageability, availability and scalability. For the limitations of hardware and intricate software architecture, the core routers which are the most important infrastructure on Internet cannot satisfy the requirements above. Especially, with the scale growth of Internet and the emergence of a large number of applications, the critical BGP (Border Gateway Protocol) in the interdomain routing system, gradually faces the challenges of performance and multi-dimensional scalability. Multicore processors have the advantages of powerful computing ability, multi-level parallelism, higher bandwidth for memory and I/O, and they are easy to scale and consume less energy, thus providing a broad research platform as the basis of the next generation Internet. Therefore, it is of great significance for practice to research how BGP utilizes the advantages of architecture and performance on multicores to meet the demands of Internet in the future, so as to provide the theoretical basis and technical support for the construction of the next generation Internet infrastructure.Towards multicore platform, the performance optimization of BGP is focused and the research of thread-level parallelism for BGP is carried out in this dissertation. Firstly, by analyzing the parallelism underlying in the behavior of BGP, two parallel architectures of Threaded BGP (T-BGP) including multi-instance T-BGP and multi-task T-BGP are presented along with their architecture designs. This dissertation emphasizes to address the issues of the parallel access to routing table and route propagation among threads in the first architecture and then the speculative thread partition in the second one. At last, the T-BGP prototype system is designed and implemented. The contributions of this dissertation are as follows:1. Two thread-level parallel architectures including multi-instance T-BGP and multi-task T-BGP are proposed on multicores. Based on the neighbor session division, the multi-instance T-BGP is composed of a manager thread and a group of BGP instance threads. With the idea of data parallelism, this architecture distributes the neighbor sessions on different BGP instance threads to achieve the parallel processing. Multi-instance T-BGP has the advantages of good optimization effects for performance, strong multi-dimensional scalability, high real-time for route selection, etc., and it is easy to implement the load balance and has low design complexity. But the key issues of parallel access to routing table and route announcement among threads both of which decide the performance of the protocol are still needed to be resolved. On the other hand, aiming at releasing the performance bottleneck of BGP, the multi-task T-BGP employs the way of task parallelism, and divides the update packet processing which is the core procedure of BGP into three parallel function threads involving parsing thread, route updating thread and packing thread, leaving the remaining functions in the master thread. This architecture could explore the parallelism inherent in the protocol, but the critical issues of thread partition and thread scheduling are still to be addressed to further improve its performance.2. Coarse-grain task parallelism limits the performance improvement of the multi-task T-BGP, and exploiting its finer-grain parallelism becomes the major method to yield better parallel efficiency. Thus, with the analysis for potential speculation of the function threads in multi-task T-BGP, a thread partition technology based on local speculation is proposed. It presents the min-cut heuristic algorithm to divide the speculative threads locally in each function thread, and also put forwards the static speculative strategy to achieve the thread speculation, making threads work in parallel. Additionally, the ordinal commit mechanism and the dependence detection strategy ensure the validity of thread speculation and program results. At last, the multi-task T-BGP based on local speculation is preliminarily implemented, and the functional verification and performance evaluation are carried out. The experimental results show that our method has the characteristics of good partition effects and low computational complexity, and greatly reduces the runtime of all fuction threads by using local speculation, thereby yielding good performance improvement.3. A mass of contentions by racing to access the routing table are brought in the multi-instance T-BGP, if still using the traditional routing table. Thus, this dissertation proposes a highly-efficient parallel access approach, in which a novel dynamic reconfigurable routing table is presented to achieve the fast parallel table access. In this approach, a heuristic-based divide-and-recombine algorithm is devised to partition routing table into subtries according to the real-time access frequency of table nodes, and then reorganize the subtries into several sets. And the algorithm is periodically performed to dynamically regulate the table structure. The main advantages of the dynamic reconfigurable routing table are concentrated on three aspects as follows. Firstly, the parallelization of route update by multi-threading is achieved through two-phase trie access by locking subtrie, so as to efficiently break the restriction of sequential table access. Secondly, the proposed divide-and-recombine algorithm significantly reduces the parallel access contentions by balancing accesses to subtrie sets. Thirdly, the table operations are effectively optimized to maintain the consistency with the behaviors of traditional routing table, attaining the higher rate of route update. The experimental results show that, when compared to traditional routing table, the maximal update time per thread is decreased by nearly 48.7%, 50.5% and 71.1% with the dynamic reconfigurable routing table under three thread configurations respectively, efficiently releasing the performance bottleneck of route update.4. A non-blocking route announcement method is put forward for the multi-instance T-BGP, thereby eliminating a great number of contentions which are incurred by the common method with the lock-based shared queues. In our scheme, each thread maintains an advertising queue for each neighbor with announcing the locally updated routes, and then gets the routes from the advertising queues of all threads with matching its local peer, thus avoiding the conflicts of accessing the same advertising queue by different threads. Additionally, each advertising queue is designed as the SCLF to significantly enhance the rate of operating the shared queues, and achieves the non-blocking route propagation among threads. At last, the route announcing process with cache ping pong mitigation is also presented to reduce the miss overhead by cache thrashing. The experimental results show that the runtime of route announcement among threads are reduced by 79.4% on average and the SCLF decreases the operation time by nearly 56.5% compared to Lamport lock-free queue.5. Based on the multi-instance T-BGP architecture, the T-BGP prototype system is implemented focusing on the details of the master thread and the slave one. With the key technologies of parallel access to routing table and non-blocking route announce- ment among threads, T-BGP could provide high-efficiency parallelism in comparison with multi-instance T-BGP. Finally, the experimental results for the key metrics of route learning time, the switch rate of eBGP neighbor, the CPU utilization and the system speedup show that T-BGP yields good performance improvement compared to BGP.To sum up, we present well-evaluated solutions in this dissertation for some key issues of BGP performance optimization. We believe that our contributions make a nice groundwork for future research and engineering on the interdomain routing protocol optimization both in theory and practice.

Keywords/Search Tags:

Multicore Processor, BGP, Performance Optimization, Thread Level Parallelism, Parallel Access to Routing Table, Route Announcement among Threads

PDF Full Text Request

Related items

1	Research On Parallel Model And Compiler Optimization Technique Based On Multi-core
2	Performance Optimization Study For Stencil On Domestic Multi-Core Processor Platform
3	Parallel Algorithm Design And Optimization For H.264 Video Encoding
4	Research Of Parallel Computing In Polymorphic Array Prcessor
5	The Research And Implementation Of Parallel Computing Method On MPCore Multicore Processor
6	The Research And Implementation Of Key Techniques On Block Cipher ASIP
7	Study On Parallel Optimizations For Transactional Memory
8	Performance Optimization On Multicore Transactional Memory Architecture Supporting Speculative Parallelization
9	The Research And Implementation Of The Key Techniques On Single Chip Multiprocessors
10	Research On Integral Optimization Techniques For Scientific Programs