The acceleration of machine learning and deep learning algorithms with parallel architectures

Posted on:2017-07-19

Degree:Ph.D.

Type:Dissertation

University:University of Massachusetts Lowell

Candidate:He, Lu

Full Text:PDF

GTID:1468390014471910

Subject:Computer Engineering

Abstract/Summary:

A plethora of machine learning and deep learning applications have been designed in recent years, and more are under development. These applications help people significantly in areas such as computer vision, natural language processing, robotics, and artificial intelligence. A good model is the key to a machine learning application, and obtaining one typically requires training on a large amount of data.

However, most machine learning and deep learning algorithms are time-intensive and energy-inefficient: training a good model on a large amount of data with a traditional central processing unit (CPU) can take several days or even several months. To reduce training time, heterogeneous systems have recently been proposed for machine learning and deep learning. A heterogeneous architecture is a system with more than one kind of programmable processor or core, each with different features. Typically, a heterogeneous system consists of a CPU and co-processors, which are usually digital signal processors (DSPs), field-programmable gate arrays (FPGAs), or general-purpose graphics processing units (GP-GPUs). For example, a user can define a model-specific hardware architecture on an FPGA to accelerate a machine learning algorithm, and the lightweight cores of a GPU can greatly accelerate matrix operations. With the emergence of heterogeneous systems, machine learning and deep learning architectures require new, efficient designs.

In this dissertation, several machine learning and deep learning algorithms are accelerated on heterogeneous systems with FPGAs and GP-GPUs. The dissertation first introduces the background of heterogeneous systems and machine learning, and of their combination. It then describes the design and implementation of an unsupervised deep learning algorithm, stacked convolutional independent subspace analysis, for action recognition on FPGA and GPU. Once stacked convolutional independent subspace analysis has been implemented, singular value decomposition becomes the bottleneck. To accelerate it, we design and implement singular value decomposition using the bisection and twisted algorithm on a single GPU and on multiple GPUs. Singular value decomposition is widely used in different areas. The K-SVD algorithm, which is widely used in dictionary learning and sparse representation, is another unsupervised learning algorithm that requires the results of singular value decomposition. We accelerate the K-SVD algorithm with a batching and streaming strategy. In addition to accelerating the machine learning and deep learning algorithms above, the dissertation also includes a machine learning application for network flow identification that takes both classification accuracy and execution time into account.
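
The abstract names bisection as part of the singular value computation but gives no details. As a rough illustration only, and not the dissertation's GPU implementation, the sketch below shows the general idea in plain Python: the singular values of an upper bidiagonal matrix B are the square roots of the eigenvalues of the symmetric tridiagonal matrix BᵀB, and each eigenvalue can be located independently by bisection on a Sturm count. The function names (`count_below`, `bidiagonal_singular_values`) and the fixed iteration count are assumptions made for this example. Forming BᵀB explicitly is a simplification that loses accuracy for very small singular values.

```python
import math

def count_below(diag, off, x):
    """Sturm count: number of eigenvalues of the symmetric tridiagonal
    matrix (diagonal `diag`, off-diagonal `off`) that are less than x."""
    count = 0
    q = 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        q = (a - x) - b2 / q
        if q == 0.0:
            q = -1e-300  # nudge away from zero to keep the recurrence defined
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, iterations=100):
    """k-th smallest eigenvalue (0-indexed) by bisection on the Sturm count."""
    n = len(diag)
    # Gershgorin discs give an interval that contains every eigenvalue.
    lo = min(diag[i] - (abs(off[i - 1]) if i > 0 else 0.0)
             - (abs(off[i]) if i < n - 1 else 0.0) for i in range(n))
    hi = max(diag[i] + (abs(off[i - 1]) if i > 0 else 0.0)
             + (abs(off[i]) if i < n - 1 else 0.0) for i in range(n))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def bidiagonal_singular_values(d, e):
    """Singular values (ascending) of the upper bidiagonal matrix with
    diagonal `d` (length n) and superdiagonal `e` (length n - 1)."""
    n = len(d)
    # Entries of T = B^T B, which is symmetric tridiagonal.
    t_diag = [d[i] ** 2 + (e[i - 1] ** 2 if i > 0 else 0.0) for i in range(n)]
    t_off = [d[i] * e[i] for i in range(n - 1)]
    # Each k is independent of the others, which is what makes
    # bisection attractive on parallel hardware.
    return [math.sqrt(max(0.0, kth_eigenvalue(t_diag, t_off, k)))
            for k in range(n)]

if __name__ == "__main__":
    print(bidiagonal_singular_values([1.0, 1.0], [1.0]))
```

Because every singular value is found by its own bisection loop, the outer loop over `k` could in principle be mapped to separate GPU threads; how the dissertation actually partitions the work is not stated on this page.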

Keywords/Search Tags:

Machine learning, Singular value decomposition, Stacked convolutional independent subspace analysis, Heterogeneous systems, Architectures

Related items

1	The Structural Decomposition For Singular Systems And Its Applications
2	Algorithm Research Of Iterative Learning Control For A Class Of Singular Systems
3	Shared Subspace Learning For Multi-view Data Analysis
4	Research On Iterative Learning Control Algorithms For Several Classes Of Singular Systems
5	Design And Implementation Of Singular Value Decomposition Acceleration Scheme
6	Stability Analysis And Research On Related Control Problems For Singular Systems
7	Analysis On Iterative Learning Control Algorithm For Several Kinds Of Singular Systems
8	Research On Subspace Analysis Based Face Recognition Algorithm
9	Research On Deep Neural Networks Based Classification And Representation Learning Of Heterogeneous Networks
10	A Research Of Pattern Recognition Based On Electroencephalogram Rhythm