Font Size: a A A

Research On Task Scheduling Algorithms For Multiple Processor Interconnect Architecture

Posted on:2012-12-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y P ZhangFull Text:PDF
GTID:1228330467967546Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In order to solve the problems in the unique processor systems, such as power and cooling challenges, the advanced architectures based on multiple processors systems on the chip (MPSoCs) have become the dominant technologies for improving the future systems performance. MPSoCs have drawn extensive attentions both in industries and academics. Technology scales in VLSI and manufactures have made an increasing number of processors integrated on a single chip. However, as the number of processors increases to dozens and hundreds, or even thousands, the interconnect architecture will become more and more important for the chip. It is the responsibility of the on-chip interconnect architecture to ensure that the multiple, co-existing data streams on the chip are correctly and reliably transferred from source processors to their intended destinations. Especially in the deep sub-micro technologies, the processor speed is no longer the limitation to the rapid development of the systems. Instead, the on-chip interconnect becomes the constraint for designing more massive and more complicated multiple processors systems. Therefore, it is the key performance bottleneck of the overall system. Moreover, more processors integrated on the chip will lead to a tremendous on-chip interconnect architecture. It will be a major contribution to the area of the chip, and the interconnect power consumptions will grow superlinearly as the number of processors integrated on the chip increases. Therefore, it has become more important and challenging to design fast yet compact and energy efficient interconnect architecture.Task scheduling has always been the hot topic in the parallel systems, and it is NP-complete in most cases. The heuristic algorithms are usually employed to achieve the sub-optimal solutions to these problems. The progresses of the approximate algorithms are also very difficult because there are many factors need to be considered. Task scheduling is the classical problem in the scheduling theory and the key problem in parallel computation. It has got extensive and depth researches. As the advances in semiconductor technology have placed more processors onto a single chip, how to make full use of the powerful computation resources supported by the system become the hot and the difficult problems in the parallel computation. There have been a plenty of heuristic task scheduling algorithms proposed in the literature, but these traditional task schedulings did not consider the characteristics and the performance requirements of the MPSoCs. Therefore it is required to design task scheduling algorithms for the MPSoCs. Although task schedulings for MPSoCs have been studied by the worldwide researchers for past decade, there are still a lot of problems requiring to be solved. For example, some algorithms did not consider the variations of the hardware resources due to the reliability in run-time, some algorithms did not take into account the interconnect power consumptions. Task scheduling is one of the key problems in MPSoCs. Focused on the interconnect architecture and the task scheduling, the main contributions of this dissertation can be summarized as follows:1. Migration-aware adaptive MPSoC static schedules with dynamic reconfigurability based on the ring interconnect architectureTraditional static scheduling techniques either keep additional spare processors, or backup each task for fault-free execution. In order to adapt to the characteristics of MPSoCs and reduce the cost of traditional static task schedulings for fault tolerance, we propose the ring interconnect architectures and design the migration-aware adaptive static schedules with dynamic reconfigurability. The tasks on the failed processor will be transferred to its neighbors for executing completely. But excessive task migrations not only increase the interconnect power consumptions, but also add the memory accesses, etc. These will in turn result in a possible increase of the schedule execution time, even a timing violation. To reduce the interconnect cost of task migrations among processors and performance degradations, the two-line method of task migrations is proposed. Based on it, migration aware and fault tolerance algorithm (MNTM) is designed for further reducing the number of task migrations and probabilities of processor failures due to overloaded. Through implementing the workload-balancing, the algorithm MNTM not only reduces the number of task migrations and realizes the high regular task transfers, but also makes full use of computation resources. The experiments reveal that the proposed algorithms not only improves the performance degradation because of the interconnect cost as a result of task migrations among processors upon one processor failure, but also retains the highly regular and predictable task migrations of block and band reconfigurable scheme at a rational cost of schedule length.2. Task scheduling for optimizing the interconnect power consumptions based on the segmented buses architectureAs semiconductor technology towards to the deep sub-micro, the power consumptions of interconnects are a major contribution to the overall system. Therefore the design of the interconnect architectures with low power and high performance has become a hot and difficult topic currently. The segmented bus architecture is proposed in this dissertation to ameliorate the significant performance degradations and high power consumptions of the shared buses. In order to reduce the power consumption of the bus, each segment is activated only when involved in transactions, otherwise isolated from the rest of the bus by the switches. Meanwhile, the segmented buses support multi-cast and concurrent non-overlapping data transfers at the same time, thus providing the potential for improving performance. Based on the segmented bus architecture, we present the algorithm for optimizing the interconnect consumptions. It constructs a communication graph by abstracting the frequencies of data exchanges among tasks, then uses genetic algorithm to generate a logical to physical processor mapping which optimizes power consumptions of the interconnect architecture. Experimental results show that the proposed techniques reduce interconnect power consumptions significantly while maintaining the same schedule length as generated by other cluster algorithms.3. Hybrid algorithms of greedy and genetic methods based on mesh-like connected multi-processor architecture with segmented busesWith the rising number of processors integrated onto a single chip, the interconnect architecture plays a major role in the area, performance, and power consumption of the overall systems. Network-on-Chip (NoC) is emerged as the interconnect solution for many processors on the chip by utilizing the research results of the computer networks. While NoC recently gained a significant momentum, there are no complete NoC implementations of real applications reported to-date. Therefore, it is still an open research, and in the stage of experiment and theory. To remedy this situation, we propose the mesh-like connected multi-processor architecture with segmented buses to meet the requirements of low power and high performance. This architecture combines the benefits of the NoC and the shared buses. With properly configured switches at run-time, the segmented buses in the mesh-like architecture support multi-cast and concurrent non-overlapping data transfers at the same time. Based on it, two extension algorithms of the list schedulings, i.e., hybrid greedy algorithm based on the genetic method and hybrid genetic algorithm based on the greedy strategy, are presented for minimizing the interconnect power consumptions while maintaining the same schedule length as generated by the base list schedulings. We evaluate the algorithms through a real world application (Gaussian Elimination), and a series of random DAGs that represent a plenty of parallel applications. The performance results confirm the effectiveness of the two proposed hybrid algorithms.
Keywords/Search Tags:Multiple processors on the chip, Interconnect architecture, Task migration, Fault tolerance, Task scheduling, Interconnect power consumption
PDF Full Text Request
Related items