Parallel Programming And Mapping Problems On Computing System

Posted on: 2012-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Yi
GTID: 2178330335463217
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
With the increasing demand for electronic products, single-core chips, with their limited performance and high power consumption, can no longer keep up, which drives us to look for a better solution. The multiprocessor system-on-chip (MPSoC) is one effective solution that offers low power consumption and high performance. Unfortunately, multi-core hardware is evolving faster than software technology: because software cannot make full use of the hardware, the achieved performance falls short of what the hardware could deliver. We therefore focus on software development for multi-core systems, in particular on how to schedule tasks to maximize performance.
Scheduling problems are a basic class of problems in combinatorial optimization. They are concerned with the optimal allocation of scarce resources to activities over time. More generally, scheduling problems involve jobs that must be scheduled on machines, subject to certain constraints, so as to optimize some objective function. Because machine environments, side constraints and job characteristics, and optimality objectives all vary, there are thousands of different scheduling models.
H3MP is a 16-core chip of our own design, targeting high performance, high speed and high throughput. To make full use of the chip's resources, we use streaming media, a workload that demands high throughput and high speed. By integrating 16 ARM cores on an FPGA board, we implement four-channel fade-in and fade-out for real-time streaming media. We present two parallel models for this multiprocessor: fine-grained parallelization, which achieves a speedup of 7.6, and coarse-grained parallelization, which achieves a speedup higher than 9.2.
To find a general model for parallelism, we implement the Aho-Corasick (AC) search algorithm with OpenMP, Pthread and CUDA. Comparing the results, we find that the Pthread model is more suitable than the OpenMP model when the control flow is complex.
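To illustrate the kind of data-parallel decomposition such AC implementations typically rely on, here is a minimal Python sketch of an Aho-Corasick matcher whose input text is split into chunks scanned by worker threads. It is an illustration under stated assumptions, not the thesis's OpenMP, Pthread or CUDA code: the function names (`build_automaton`, `scan`, `parallel_search`) are hypothetical, patterns are assumed non-empty, and each chunk is rescanned from `max_len - 1` characters earlier so that matches straddling a chunk boundary are reported exactly once. Because of CPython's global interpreter lock, the threads here show the partitioning rather than a real speedup; in C, each per-chunk `scan` call could instead map onto one Pthread, one OpenMP loop iteration, or one CUDA thread.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def build_automaton(patterns):
    """Build the Aho-Corasick goto, failure and output tables."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the root of the trie
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first pass: a failure link points to the longest proper suffix
    # that is also a trie prefix; outputs inherit that suffix state's matches.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(automaton, patterns, text, lo, hi, overlap):
    """Report (start, pattern_index) for every match ending in text[lo:hi]."""
    goto, fail, out = automaton
    state, hits = 0, []
    # Start `overlap` characters early so boundary-straddling matches are seen.
    for i in range(max(0, lo - overlap), hi):
        ch = text[i]
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if i >= lo:  # matches ending before lo belong to the previous chunk
            for idx in out[state]:
                hits.append((i - len(patterns[idx]) + 1, idx))
    return hits


def parallel_search(patterns, text, workers=4):
    """Split text into chunks and scan each chunk in its own worker thread."""
    automaton = build_automaton(patterns)
    overlap = max(len(p) for p in patterns) - 1
    step = -(-len(text) // workers) or 1  # ceiling division, at least 1
    bounds = [(lo, min(lo + step, len(text))) for lo in range(0, len(text), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda b: scan(automaton, patterns, text, b[0], b[1], overlap), bounds))
    return sorted(hit for part in parts for hit in part)
```

For example, `parallel_search(["he", "she", "his", "hers"], "ushers")` returns `[(1, 1), (2, 0), (2, 3)]`, the (start position, pattern index) pairs for "she", "he" and "hers". Because the automaton is read-only after construction, the workers share it without locking; only the text is partitioned.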
The CUDA model performs much better than the other two because a GPU has many more computation cores than a CPU, although the speedup is limited when the parallelizable portion of the workload is small.
With the development of VLSI technology, integrating an exceedingly large number of computational, logic and storage blocks on a single chip is no longer a problem, which allows more tasks to run concurrently. As the number of resources on one chip grows, the communication between them becomes much heavier, which is why the network-on-chip (NoC) was proposed. Because the placement of different IP cores on different switches makes a large difference in performance and power consumption, we also study the mapping problem. Optimizing the communication energy and the distribution of link load are the most important problems to solve, and the ant colony algorithm is one solution. However, the convergence of the traditional ant colony algorithm is very sensitive to the initial parameter settings, so in this thesis we use a genetic algorithm to set the parameters. To avoid getting trapped in local optima, we use a chaos module to optimize the genetic algorithm and increase its probability of mutation. Compared with the traditional algorithm, the improved algorithm finds solutions with 11% lower power, 1% better load balance, and 4% better results when both objectives are optimized together.
The main contributions and innovations of this thesis are as follows:
(1) We propose a coarse-grained parallelization model in which communication time is hidden behind computation time, which increases the speedup.
(2) We implement the Aho-Corasick automaton with the OpenMP, Pthread and CUDA parallel models and compare the three.
(3) Because the convergence of the traditional ant colony algorithm is very sensitive to parameter initialization, we use a genetic algorithm to set the parameters and a chaos module to avoid getting trapped in local optima.
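The mapping approach summarized in contribution (3) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the thesis's algorithm: tasks are placed one per tile on a 2D mesh, communication energy is approximated by traffic volume times XY-routing hop count (link load balance is ignored), the ant colony uses the usual pheromone exponent `alpha`, heuristic exponent `beta` and evaporation rate `rho`, and the genetic algorithm tunes those three parameters with truncation selection, arithmetic crossover, and a mutation whose new values come from the logistic map x <- 4x(1 - x), one common choice of chaotic sequence. All function names and the search ranges in `BOUNDS` are hypothetical.

```python
import random


def hops(a, b, cols):
    """Hop count between tiles a and b of a 2D mesh under XY routing."""
    return abs(a // cols - b // cols) + abs(a % cols - b % cols)


def comm_cost(mapping, traffic, cols):
    """Energy proxy (assumed model): sum of volume x hops over all flows."""
    return sum(vol * hops(mapping[s], mapping[d], cols) for s, d, vol in traffic)


def aco_map(traffic, n_tasks, n_tiles, cols, alpha, beta, rho, ants=8, iters=20, seed=0):
    """Ant colony search for a task->tile mapping that minimizes comm_cost."""
    rng = random.Random(seed)
    partners = [[] for _ in range(n_tasks)]
    for s, d, vol in traffic:
        partners[s].append((d, vol))
        partners[d].append((s, vol))
    tau = [[1.0] * n_tiles for _ in range(n_tasks)]  # pheromone on (task, tile)
    best, best_cost = None, float("inf")
    for _ in range(iters):
        colony = []
        for _ in range(ants):
            mapping, free = [None] * n_tasks, list(range(n_tiles))
            for t in range(n_tasks):
                weights = []
                for tile in free:
                    # Heuristic: prefer tiles close to already-placed partners.
                    extra = sum(vol * hops(tile, mapping[o], cols)
                                for o, vol in partners[t] if mapping[o] is not None)
                    weights.append(tau[t][tile] ** alpha * (1.0 / (1.0 + extra)) ** beta)
                tile = rng.choices(free, weights=weights)[0]
                mapping[t] = tile
                free.remove(tile)
            cost = comm_cost(mapping, traffic, cols)
            colony.append((cost, mapping))
            if cost < best_cost:
                best, best_cost = mapping, cost
        for t in range(n_tasks):  # evaporation
            tau[t] = [p * (1.0 - rho) for p in tau[t]]
        for cost, mapping in colony:  # cheaper mappings deposit more pheromone
            for t, tile in enumerate(mapping):
                tau[t][tile] += 1.0 / (1.0 + cost)
    return best, best_cost


BOUNDS = [(0.5, 4.0), (0.5, 4.0), (0.05, 0.9)]  # assumed ranges: alpha, beta, rho


def tune_parameters(traffic, n_tasks, n_tiles, cols, pop=6, gens=4, p_mut=0.3, seed=1):
    """GA over (alpha, beta, rho) with logistic-map (chaotic) mutation."""
    rng = random.Random(seed)
    chaos = rng.uniform(0.1, 0.9)  # state of the logistic map

    def fitness(genome):  # lower ACO cost is better
        return aco_map(traffic, n_tasks, n_tiles, cols, *genome, ants=5, iters=10)[1]

    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        parents = population[: pop // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < p_mut:
                    chaos = 4.0 * chaos * (1.0 - chaos)  # next chaotic value
                    child[i] = lo + (hi - lo) * chaos
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best)
```

As a small check, for a three-task pipeline `[(0, 1, 10), (1, 2, 10)]` on a 1 x 3 mesh (`n_tiles=3`, `cols=3`), the best achievable cost is 20, reached when the middle task sits on the middle tile so that both transfers take one hop. Evaluating each GA individual by running a short ACO is what makes the parameter search expensive, so real instances would call for smaller populations or cached fitness values.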
Keywords: MPSoC (multiprocessor system-on-chip), NoC (network-on-chip), ARM processor, FPGA, streaming media, scheduling, CUDA, mapping problems, ant colony algorithm, genetic algorithm, chaos module