Research On The Key Techniques Of Application-Specific Instruction-Set Processors

Posted on:2012-10-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:H Chen

Full Text:PDF

GTID:1118330362960506

Subject:Electronic Science and Technology

Abstract/Summary:


The evolution of applications in fields such as media processing and software-defined radio (SDR) continuously introduces more complex algorithms, and the demand for high-quality services requires huge volumes of data to be processed under given time and resource constraints. Traditional VLIW and superscalar processors do not scale well, because the area, delay, and power consumption of a centralized register file grow as O(N^2), O(N), and O(log_4 N), respectively, as the number of access ports N increases. Meanwhile, the control paths of these processors, such as instruction issue and commit, are organized in a centralized manner, which results in high complexity and poor scalability.

Multiprocessor Systems-on-Chip (MPSoCs) have emerged in the past decade as promising solutions to meet these computation requirements. However, integrating multiple cores on a single chip does not directly increase processor performance or power efficiency for most sequentially written applications. The difficulty of parallelizing sequential programs, or of writing parallel programs that take full advantage of the resources available in MPSoCs, drives us to rethink uni-core solutions from a hardware/software co-design perspective: What are the fundamental limitations of uni-core processors? How can these limitations be eliminated and hardware complexity reduced? How can applications be sped up with minimal modification to the architecture of state-of-the-art processors?

The main contributions of this dissertation are summarized as follows:

1) We analyzed the inefficiency of a uni-core VLIW processor in processing two typical computation-intensive benchmarks from media processing and software-defined radio, and sought the fundamental limitations that hinder the scalability, performance, and power efficiency of uni-core solutions.
From this analysis we concluded that three factors underlie the inefficiency of uni-core processors: the ultra-simplified instruction-set architecture (ISA) of RISC-like processors, the traditional plain binary instruction encoding scheme, and the centralized instruction execution control strategy. Based on this conclusion, we proposed systematic strategies to overcome the limitations of uni-core solutions.

2) We proposed an automatic method that rapidly enumerates candidate extended instructions. The method first profiles the source code and identifies the ALU operations that account for most of the execution time, then generates candidate extended instructions around these typical operations using a windowed, progressive search. The pattern produced by each search step is locally optimal, which to some extent ensures the efficiency of the final pattern. This enumeration method not only explores the design space effectively but also has linear complexity: its cost grows linearly with the number of typical operations and the average number of search steps around each typical operation.

3) We proposed a novel resource compression method for extended instructions implemented on extended functional units. The method first finds the critical path of an instruction and partitions the rest of the instruction's DFG (Data Flow Graph) into multiple paths, then finds the MCES (Maximal Common Equivalent String) of all paths of all instructions and compresses the instructions that contain the MCES. The method ensures that resources are effectively shared among instructions.
Meanwhile, the method allows the DFG of an instruction to be modified by inserting simple operations into its paths, in order to reduce the number of inserted multiplexers and their impact on area and delay.

4) We proposed a hardware/software instruction encoding scheme to improve the scalability of uni-core architectures. By statically scheduling a sequence of dependent instructions into a pack, encoding the information common to the pack in a dedicated instruction word, and converting instruction issue into pack issue, we substantially reduce both the number of bits required to encode instructions and the hardware complexity of instruction issue, thereby relieving the scalability limits usually imposed by fixed-length instruction formats and centralized instruction issue.

5) To improve the scalability and performance of uni-core processors, we proposed a novel distributed instruction execution control scheme and implemented a pipeline using it. The resulting highly scalable pipeline features in-order issue, out-of-order execution, and parallel but in-order commit, because the functional units, which are partitioned among clusters, read operands, execute instructions, write back results, and maintain data dependences themselves. Scalability is improved by the distributed execution control scheme, while performance is enhanced by the increased hardware speed and the improved temporal data locality.

6) We proposed a novel ASIP architecture based on this scalable pipeline with distributed instruction execution control. The ASIP supports complex instructions with up to 6 input operands and 2 output operands, which substantially extends the design space of extended instructions and increases the potential speedup from instruction customization.
The execution resources, such as functional units and the register file, are partitioned among clusters, and inter-cluster communication is implemented through a scalable operand-passing network. Thus, changes in the functionality of the extended functional units do not affect the baseline architecture.
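Contribution 2) grows candidate extended instructions around hot ALU operations with a windowed, progressive search in which every step is locally optimal. The sketch below is one possible reading of that idea, not the dissertation's algorithm: the DFG encoding, the latency table `SW_LATENCY`, the cycles-saved `gain` metric, the convexity test, and all function names (`grow_candidate`, `window`, and so on) are hypothetical, and the 6-input/2-output limit is borrowed from contribution 6).

```python
from collections import deque

# Hypothetical DFG of one hot basic block: node -> (opcode, operand sources).
# Sources that are not keys (x, y, z, ...) are live-in registers.
DFG = {
    "a": ("mul", ["x", "y"]),
    "b": ("add", ["a", "z"]),
    "c": ("shr", ["b", "k"]),
    "d": ("and", ["c", "m"]),
    "e": ("sub", ["b", "w"]),
}
# Assumed software latencies in cycles; illustrative only.
SW_LATENCY = {"mul": 3, "add": 1, "sub": 1, "shr": 1, "and": 1}
MAX_IN, MAX_OUT = 6, 2  # operand limits borrowed from contribution 6)


def successors(dfg):
    """Map each node to the nodes that consume its result."""
    succ = {n: [] for n in dfg}
    for n, (_, srcs) in dfg.items():
        for s in srcs:
            if s in dfg:
                succ[s].append(n)
    return succ


def adjacent(dfg, succ, n):
    """Producers and consumers of node n inside the DFG."""
    return [s for s in dfg[n][1] if s in dfg] + succ[n]


def window(dfg, succ, seed, radius):
    """Nodes within `radius` undirected hops of the seed operation."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue
        for u in adjacent(dfg, succ, v):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)


def io_count(dfg, succ, pattern):
    """Distinct external inputs, and results that must leave the pattern."""
    inputs = {s for n in pattern for s in dfg[n][1] if s not in pattern}
    outputs = {n for n in pattern
               if not succ[n] or any(m not in pattern for m in succ[n])}
    return len(inputs), len(outputs)


def is_convex(dfg, succ, pattern):
    """False if some dependence path leaves the pattern and re-enters it."""
    frontier = deque(m for n in pattern for m in succ[n] if m not in pattern)
    seen = set(frontier)
    while frontier:
        v = frontier.popleft()
        for m in succ[v]:
            if m in pattern:
                return False
            if m not in seen:
                seen.add(m)
                frontier.append(m)
    return True


def gain(dfg, pattern):
    """Cycles saved, assuming the fused instruction takes one cycle."""
    return sum(SW_LATENCY[dfg[n][0]] for n in pattern) - 1


def grow_candidate(dfg, seed, radius=2, max_steps=4):
    """Greedily grow a candidate around a hot seed; each step is locally optimal."""
    succ = successors(dfg)
    allowed = window(dfg, succ, seed, radius)
    pattern = {seed}
    for _ in range(max_steps):
        best, best_gain = None, gain(dfg, pattern)
        frontier = {u for n in pattern for u in adjacent(dfg, succ, n)} - pattern
        for u in sorted(frontier & allowed):
            cand = pattern | {u}
            n_in, n_out = io_count(dfg, succ, cand)
            if n_in > MAX_IN or n_out > MAX_OUT or not is_convex(dfg, succ, cand):
                continue
            g = gain(dfg, cand)
            if g > best_gain:
                best, best_gain = u, g
        if best is None:
            break
        pattern.add(best)
    return pattern
```

Here `grow_candidate(DFG, "b")` absorbs the multiply first because it saves the most assumed cycles, and `radius=1` keeps node `d` out of the window. Running the search once per profiled seed with a fixed `max_steps` keeps the number of search steps proportional to the number of seeds, in the spirit of the linear complexity claimed above.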
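Contribution 3) shares hardware among extended instructions by finding the MCES of their non-critical paths. As a hedged sketch, assume each path is a list of opcodes and that opcodes in the same hypothetical `EQUIV` class can run on one shared unit; the MCES is then approximated as the longest contiguous run of equivalence classes common to all paths, found by brute force. The class table, the sample paths, and the `units_saved` estimate are illustrative assumptions, not the dissertation's definitions.

```python
from typing import Sequence, Tuple

# Hypothetical equivalence classes: opcodes that one shared functional unit
# could implement with a small control input. Illustrative assumption only.
EQUIV = {"add": "addsub", "sub": "addsub", "shl": "shift", "shr": "shift",
         "and": "logic", "or": "logic", "xor": "logic", "mul": "mul"}


def canonical(path: Sequence[str]) -> Tuple[str, ...]:
    """Map a path of opcodes to its string of equivalence classes."""
    return tuple(EQUIV[op] for op in path)


def contains(seq: Tuple[str, ...], sub: Tuple[str, ...]) -> bool:
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def mces(paths: Sequence[Sequence[str]]) -> Tuple[str, ...]:
    """Longest run of equivalent operations present in every path."""
    canon = [canonical(p) for p in paths]
    shortest = min(canon, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            sub = shortest[start:start + length]
            if all(contains(c, sub) for c in canon):
                return sub
    return ()


# Paths left after removing each instruction's critical path (made-up data).
paths = [["mul", "add", "shr", "and"],
         ["sub", "shl", "xor"],
         ["add", "add", "shr", "or"]]
shared = mces(paths)
# If these paths belong to instructions that never execute in the same cycle,
# one copy of the shared units can serve all of them.
units_saved = len(shared) * (len(paths) - 1)
print(shared, units_saved)  # ('addsub', 'shift', 'logic') 6
```

Sharing is not free: every shared unit needs multiplexers on its inputs, which is why the method described above also allows simple operations to be inserted into paths to reduce the number of multiplexers.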
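To make the bit savings claimed in contribution 4) concrete, the sketch below counts encoding bits for a chain of dependent instructions under a conventional fixed-length format and under a pack format with a header word. All field widths in the hypothetical `Fields` class, and the rule that each packed instruction takes one operand implicitly from its predecessor, are assumptions for illustration rather than the dissertation's actual encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fields:
    """Hypothetical field widths in bits (not from the dissertation)."""
    opcode: int = 8
    reg: int = 6      # one register specifier
    common: int = 6   # e.g. cluster id and predicate, shared by a pack
    length: int = 3   # pack-length field, present only in the pack header


def fixed_length_bits(n: int, f: Fields = Fields()) -> int:
    """n instructions, each carrying opcode, dest, two sources and common bits."""
    return n * (f.opcode + 3 * f.reg + f.common)


def pack_bits(n: int, f: Fields = Fields()) -> int:
    """n dependent instructions behind one header word.

    Assumptions: the common fields and the pack length are stored once in the
    header; each instruction after the first takes one operand implicitly from
    its predecessor's result; only the last result names a destination register.
    """
    header = f.common + f.length
    first_extra_src = f.reg          # only the first instruction names 2 sources
    per_insn = f.opcode + f.reg      # opcode plus the one explicit source
    last_dest = f.reg
    return header + first_extra_src + n * per_insn + last_dest


for n in (1, 2, 4):
    print(n, fixed_length_bits(n), pack_bits(n))
# 1 32 35
# 2 64 49
# 4 128 77
```

Under these assumed widths, packing pays off from two instructions upward: the 21 bits of header and boundary fields are amortized across the pack, while each instruction costs 14 bits instead of 32.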
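The pipeline behavior in contribution 5), with in-order issue, out-of-order execution, parallel but in-order commit, and clusters tracking their own dependences, can be approximated with a small cycle-level model. This is a sketch under stated assumptions: one instruction issued per cycle, at most one start per cluster per cycle, every destination register written only once (so no renaming is needed), a fixed `hop_delay` standing in for the operand-passing network, and an in-order commit of up to `commit_width` instructions per cycle. The `Insn` type and `simulate` function are hypothetical, and the single Python loop stands in for logic that would be distributed across clusters in hardware.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    dst: str                 # destination register (assumed written only once)
    srcs: Tuple[str, ...]    # source registers
    cluster: int             # cluster whose functional unit executes it
    latency: int             # execution latency in cycles


def simulate(program: List[Insn], n_clusters: int,
             hop_delay: int = 1, commit_width: int = 2):
    """Return (issue, start, finish, commit) cycles for each instruction."""
    n = len(program)
    producer = {insn.dst: i for i, insn in enumerate(program)}
    issue: List[Optional[int]] = [None] * n
    start: List[Optional[int]] = [None] * n
    finish: List[Optional[int]] = [None] * n
    commit: List[Optional[int]] = [None] * n
    queues: List[List[int]] = [[] for _ in range(n_clusters)]

    def ready(i: int, cycle: int) -> bool:
        for src in program[i].srcs:
            p = producer.get(src)
            if p is None:
                continue  # live-in value, always available
            if finish[p] is None:
                return False
            # Operands from another cluster pay the network hop delay.
            delay = 0 if program[p].cluster == program[i].cluster else hop_delay
            if finish[p] + delay > cycle:
                return False
        return True

    next_issue = next_commit = cycle = 0
    while next_commit < n:
        cycle += 1
        # 1) Parallel but in-order commit of the oldest finished instructions.
        committed = 0
        while (next_commit < n and committed < commit_width
               and finish[next_commit] is not None
               and finish[next_commit] <= cycle):
            commit[next_commit] = cycle
            next_commit += 1
            committed += 1
        # 2) Each cluster starts its oldest ready instruction, possibly
        #    overtaking older instructions that are still waiting.
        for q in queues:
            for i in sorted(q):
                if ready(i, cycle):
                    start[i], finish[i] = cycle, cycle + program[i].latency
                    q.remove(i)
                    break
        # 3) In-order issue of one instruction per cycle to its cluster.
        if next_issue < n:
            issue[next_issue] = cycle
            queues[program[next_issue].cluster].append(next_issue)
            next_issue += 1
    return list(zip(issue, start, finish, commit))


prog = [Insn("r1", ("a",), 0, 3),   # long-latency op on cluster 0
        Insn("r2", ("r1",), 0, 1),  # waits for r1
        Insn("r3", ("b",), 0, 1),   # independent, may overtake r2
        Insn("r4", ("r3",), 1, 1)]  # needs r3 from cluster 0
for i, row in enumerate(simulate(prog, 2)):
    print(i, row)
# 0 (1, 2, 5, 5)
# 1 (2, 5, 6, 6)
# 2 (3, 4, 5, 6)
# 3 (4, 6, 7, 7)
```

In this run instruction 2 starts in cycle 4, ahead of the older instruction 1 that waits for the three-cycle operation, yet both commit in program order in cycle 6; instruction 3 pays one extra cycle to receive `r3` from cluster 0.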

Keywords/Search Tags:

Application-Specific Instruction-Set Processor (ASIP), Extended Instruction, Automatic Instruction-Set Extension (ISE), Data-Path Compression, Scalable Instruction Encoding, Distributed Instruction Execution Control


Related items

1	Research On Instruction Set And Design Of Data Path For Application Specific Video Processor
2	Instruction Set Design For Encryption Application Specific Instruction Set Processor
3	Research On The Custom Instruction Mapping Of Application Specific Instruction Set Processors
4	Research On The Design And Implementation Techniques Of Customizing Application Specific Instruction Set Processors
5	The Design And Implement Of Instruction Decode&Control Unit In FT-C55LP
6	The Processor Design Based On The Scalable System-On-Chip
7	Study On Application Specific Instruction Set Processor For Video Coding And Its VLSI Implementations
8	The Design Of Ft-matrix Processor Instruction Set And Dispatch Unit
9	The Design Of FT-Matrix Processor Instruction Set And Dispatch Unit
10	Instruction-flow Scheduling Mechanism For High-performance SIMD DSP