Font Size: a A A

Research On Several Key Technologies For Parallel Optimization On Embedded Multi-Core Systems

Posted on:2016-04-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:J Y DuFull Text:PDF
GTID:1368330473467147Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Computers are currently omnipresent in people's daily life and embedded system devices are readily available,as well.With the application of high-performance embedded computing becoming increasingly important,multi-core embedded systems have emerged.However,their development has been largely limited by their high energy consumption and storage performance bottleneck.How to reduce the system's energy consumption and improve the effectiveness of the memory based on the multi-core architecture has attracted much attention in the field of parallel optimization on embedded multi-core systems.Meanwhile,the issue of parallel optimization on multi-core embedded systems has been a key research area for a long time.Parallel optimization entails many complexities,among which HW/SW,task scheduling,data allocation,inter-connection network design are some main difficult problems.This paper explores these problems in four different stages and proposes step-by-step solutions,which serves as the basis for coming up with a complete embedded system as the solution in the end.To be more specific,the study has integrated intelligent algorithms,dynamic programming algorithm and other commonly-used algorithms.For parallel optimization,it is focused on striking a balance among the areas,consumption energy and time performance.The present study has conducted the following work:Firstly,hardware/software co-design is commonly adopted in embedded systems while the hardware/software partitioning is the most important part in such a hardware/software co-design.This is because we need to guarantee the system can have its best performance in spite of limited hardware resources.In other words,we need to reduce the completion time of without causing any changes to the area of hardware.Accordingly,the energy consumption gets reduced.With regard to the problem of hardware/software partitioning,the hardware and software partitioning problem is firstly defined as 0-1 problem using DAG image,which is then optimized by a shuffled frog leaping algorithm(SFLA).Given that the original SFLA is primarily used for addressing continuous optimization problems,a mechanism of turning the continuous data into discrete data is combined with its local update strategy.Experiments on the benchmark show that the SFLA proposed in this paper has better convergence.The completion time of SFLA can be respectively reduced by 16.65%and 11.46% on average,compared with that by SA and Greedy Simulated annealing algorithm.Secondly,task scheduling of multi-core and non-volatile memory embedded systems is another important part of the study.The new memory technology —— non-volatile memory has brought innovation to the memory technology.New storage technology can effectively reduce energy consumption,which is mainly applied in the main memory and cache in embedded systems,digital signal processing and multimedia applications.As the looping process of the program is usually the most time-and energy-consuming one,for the non-volatile multicore memory embedded systems,their task scheduling is focused on the embedded applications with loops such as digital signal processing and multimedia programs.By applying loop scheduling and retiming techniques,the time for single iteration can be reduced.In addition,combined with the techniques for data migration and data recomputation,the algorithm can reduce the write operation for the non-volatile main memory,further lessening the time to complete the whole task.Concerning the allocation tasks after rotation scheduling,the maximum bipartite matching methods is adopted for solving the optimal allocation.The maximum bipartite matching rotation scheduling algorithm proposed in the thesis can significantly lower the task completion time,the number of write operations of the non-volatile memory and the system's energy consumption.Experiments on the benchmark show that the proposed algorithm can respectively improve by 20.5%,15.03% and 34.5% on average in terms of the completion time,energy consumption and the number of write activities to non-volatile memory,compared with Rotation with ASAP algorithm.Based on this,the proposed algorithm has strong practicability.Thirdly,the storage and allocation of task data is an essential problem after task scheduling.Using the Scratched-Pad Memory combining the SRAM and non-volatile memory to replace traditional cache controlled by hardware is a good solution in this aspect.However,traditional work of data storage and allocation only deals with Scratched-Pad memories of single core or single structure.Data allocation has been dealt with in a way using data migration and dynamic programming strategies.A dynamic allocation optimization algorithm based on data migration is proposed.On the scratchpad memory integrating SRAM and non-volatile memory,tasks will be divided into multiple parallel regions and then an approximately optimal solution for the whole parallel region can be obtained using the proposed algorithm for data allocation.The proposed algorithm can obtain the approximately optimal solution in polynomial time.Experiments on the benchmark show that our algorithm can respectively improve 33.82%,24.27% and 10.00% on average in terms of the access time,the energy consumption and the number of write activities to non-volatile memory,compared with iterational optimal data assignment algorithms.Finally,to handle the communication between cores is a problem that cannot be side stepped in multi-core systems.It is an important step after the process of task scheduling and task data allocation is finished.Research on customizing and designing the interconnection network among the cores in accordance with specific applications is feasible.The study aims to design an inter-connection network which meets the demand for communication between cores of the system but occupies small space and consumes little energy.The present thesis puts forward an algorithm for communication scheduling based on task nodes and task data,which is based on the use of partially connected buses as interconnection network.On the premise that the scheduling length is not increased,the study constructs an inter-connection network which can meet the communication needs between the system cores but with the fewest number of buses.Moreover,an algorithm is constructed in the thesis in accordance with the Determination of Data Transition algorithm and the inter-connection network,providing the basis for designing the inter-connection network.Experiments show that to combine the two algorithms as proposed by the thesis is better than the integration of HLEFT and ASAP.Furthermore,it can reduce 15.02% of P2 P inter-connection on average compared with the algorithm only using the task optimization scheduling.
Keywords/Search Tags:Non-volatile memory, Chip multi-processors, HW/SW, Loop scheduling algorithm, Data allocation, Communication scheduling, Inter-connection network
PDF Full Text Request
Related items