
Research On Acceleration Technology For Deep Learning Inference Based On Multi-core And Many-core Platforms

Posted on: 2020-06-25  Degree: Master  Type: Thesis
Country: China  Candidate: K Q Zhu  Full Text: PDF
GTID: 2428330611493633  Subject: Engineering
Abstract/Summary:
Deep learning has displaced traditional methods in many fields, such as image object detection and recognition and speech recognition. The essence of intelligence, however, is computation, so high-throughput deep learning applications depend on strong computing power. Developing architecture platforms, optimizing their computing efficiency, and improving their intelligent processing capability are essential to advancing these applications. Multi-core and many-core architectures are effective platforms for high-throughput machine learning inference. Beyond the x86 and GPU platforms, many computing techniques for multi-core and many-core architectures remain to be studied. The domestic Phytium processor can be configured, through hierarchical expansion, as a multi-core design (fewer than 64 cores) or a many-core design (64 cores or more), serving either as a multi-core main processor or as a domain-specific many-core accelerator. The current 64-core Phytium 2000+ high-performance processor is well suited to research on multi-core and many-core intelligent optimization techniques. In addition, energy-efficient multi-core DSP chips based on the VLIW architecture are also good platforms for intelligent computing. This thesis targets high-throughput domestic Phytium processors and low-power multi-core DSP chips, studies their adaptation to intelligent processing and the corresponding algorithm optimization techniques, and explores a technical approach toward a new generation of domestic autonomous intelligent computing systems.

First, this thesis studies inference optimization for framework-free deep learning applications on the Phytium platform. The hardware structure and available parallel resources of the Phytium processor are analyzed comprehensively, and, drawing on the characteristics of inference algorithms, several optimization techniques are proposed. Their effectiveness is evaluated on several typical applications: after optimization, an RNN-LSTM-based application runs 10.2 times faster, and the overall performance of the platform reaches 2.9 times that of a mainstream high-performance x86 platform. The thesis then studies inference optimization for framework-based deep learning applications on the Phytium platform; experiments show that the framework's performance is greatly improved. Finally, targeting current mainstream multi-core DSP chips, the thesis investigates deep learning inference under the VLIW architecture. After a comprehensive analysis of the hardware structure and parallel resources, several application- and platform-level optimization techniques are proposed. Experiments show that the DSP chip's energy efficiency is 7.79 times that of a high-performance x86 chip and 3.56 times that of an embedded ARM chip.
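The thesis does not reproduce its optimization code here, but the core idea behind multi-core inference acceleration — splitting a batch of inputs across cores so each core runs the same model on its own slice — can be sketched in a few lines. The sketch below is illustrative only (the function names and layer are hypothetical, not from the thesis); it uses a single fully connected layer as a stand-in for one inference step and verifies that the data-parallel result matches the serial one:

```python
# Illustrative sketch (not the thesis's code): data-parallel inference on a
# multi-core CPU. A batch of inputs is split into chunks, one per worker
# thread; NumPy releases the GIL inside its BLAS matrix products, so the
# chunks can execute on separate cores.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def dense_relu(x, w):
    # One fully connected layer with ReLU, standing in for a model's
    # per-sample inference step.
    return np.maximum(x @ w, 0.0)

def parallel_infer(batch, w, workers=4):
    # Split the batch along axis 0 and run each chunk in its own thread.
    chunks = np.array_split(batch, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: dense_relu(c, w), chunks))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
batch = rng.standard_normal((32, 64))

serial = dense_relu(batch, w)
parallel = parallel_infer(batch, w)
assert np.allclose(serial, parallel)  # same result, computed chunk-wise
```

Real deployments on a 64-core processor would additionally pin threads to cores and tune chunk sizes to cache capacity, which is where platform-specific optimization of the kind studied in the thesis comes in.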
Keywords/Search Tags:Phytium processors, Multi-core VLIW, Deep learning, Parallel acceleration, Framework optimization