
The acceleration of machine learning and deep learning algorithms with parallel architectures

Posted on: 2017-07-19
Degree: Ph.D
Type: Dissertation
University: University of Massachusetts Lowell
Candidate: He, Lu
Full Text: PDF
GTID: 1468390014471910
Subject: Computer Engineering
Abstract/Summary:
A plethora of machine learning and deep learning applications have been designed recently, and more are under development. These applications help people significantly in different areas, such as computer vision, natural language processing, robotics and artificial intelligence. A good machine learning model is the key to a machine learning application. Typically, a large amount of training data is needed to obtain a good model.

However, most machine learning and deep learning algorithms are time-intensive and energy-inefficient: training a good model on a large amount of data with a traditional central processing unit (CPU) can take several days or even several months. To reduce training time, heterogeneous systems have recently been applied in the machine learning and deep learning field. Heterogeneous architectures are systems with more than one kind of programmable processor or core, where each processor or core has different features. Typically, a heterogeneous system consists of a CPU and co-processors. The co-processors are usually digital signal processors (DSPs), field-programmable gate arrays (FPGAs), or general-purpose graphics processing units (GP-GPUs). For example, a user can define a hardware architecture specific to a machine learning model on an FPGA to accelerate the algorithm, and the light-weight cores in a GPU can greatly accelerate matrix operations. With the emergence of heterogeneous systems, machine learning and deep learning architectures require new, efficient designs.

In this dissertation, several machine learning and deep learning algorithms are accelerated on heterogeneous systems with FPGAs and GP-GPUs. The dissertation first introduces the background of heterogeneous systems and machine learning, and the combination of the two. It then describes the design and implementation of an unsupervised deep learning algorithm, stacked convolutional independent subspace analysis, for action recognition on FPGA and GPU. Once stacked convolutional independent subspace analysis has been designed, the singular value decomposition becomes the bottleneck. To accelerate it, we design and implement singular value decomposition using the bisection and twisted algorithm on a single GPU and on multiple GPUs. Singular value decomposition is widely used in different areas. The K-SVD algorithm, which is widely used in dictionary learning and sparse representation, is another unsupervised learning algorithm that requires the results of singular value decomposition. We accelerate the K-SVD algorithm with a batching and streaming strategy. In addition to the acceleration of the machine learning and deep learning algorithms above, the dissertation also includes a machine learning application for network flow identification that takes both classification accuracy and execution time into account.
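As a rough illustration of the bisection idea mentioned in the abstract, the Python sketch below computes the singular values of an upper bidiagonal matrix by bisection on a Sturm-sequence count. It is not the dissertation's GPU implementation: the function names (`sturm_count`, `bisect_eigenvalue`, `singular_values_bidiagonal`), the tolerance, and the choice to form the tridiagonal matrix B^T B explicitly are assumptions made for this example, and the twisted step is omitted.

```python
import math

def sturm_count(diag, off, x):
    """Count eigenvalues smaller than x for a symmetric tridiagonal matrix
    with main diagonal `diag` and off-diagonal `off`."""
    count = 0
    q = 1.0
    for i, a in enumerate(diag):
        b_sq = off[i - 1] ** 2 if i > 0 else 0.0
        if q == 0.0:
            q = 1e-300  # guard against an exact zero pivot
        q = (a - x) - b_sq / q
        if q < 0.0:
            count += 1
    return count

def bisect_eigenvalue(diag, off, k, tol=1e-12):
    """Return the k-th smallest eigenvalue (0-indexed) by bisection."""
    n = len(diag)
    # Gershgorin discs give an interval that contains every eigenvalue.
    lo, hi = math.inf, -math.inf
    for i in range(n):
        r = (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        lo = min(lo, diag[i] - r)
        hi = max(hi, diag[i] + r)
    for _ in range(200):  # fixed cap so the loop always terminates
        if hi - lo <= tol * max(1.0, abs(lo), abs(hi)):
            break
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) > k:
            hi = mid  # more than k eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def singular_values_bidiagonal(d, e):
    """Singular values (ascending) of an upper bidiagonal matrix with
    diagonal `d` and superdiagonal `e`, via the eigenvalues of B^T B.
    Assumption for this sketch: forming B^T B explicitly is acceptable,
    although it squares the condition number."""
    n = len(d)
    diag = [d[i] ** 2 + (e[i - 1] ** 2 if i > 0 else 0.0) for i in range(n)]
    off = [d[i] * e[i] for i in range(n - 1)]
    return [math.sqrt(max(bisect_eigenvalue(diag, off, k), 0.0)) for k in range(n)]
```

Each call to `bisect_eigenvalue` is independent of the others, which is one reason bisection is often considered a good fit for parallel hardware: in a GPU version, each singular value could plausibly be assigned to its own thread.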
Keywords/Search Tags:Machine learning, Singular value decomposition, Stacked convolutional independent subspace analysis, Heterogeneous systems, Architectures