
Implementing And Optimizing The Alternating Least Squares Algorithm On Many-cores

Posted on: 2019-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2370330623450959
Subject: Computer Science and Technology
Abstract/Summary:
The task of a recommender system is to connect users with items and help massive numbers of users find valuable information. An excellent recommendation algorithm can greatly improve the efficiency of a recommender system, enhance the accuracy of its results, and meet the real-time needs of massive numbers of Internet users. At the same time, heterogeneous architectures have been developing rapidly, enabling the combination of high-performance computing technology and recommender systems. In this work, we aim to implement and optimize a typical recommender-system algorithm, alternating least squares (ALS), on modern multi-cores and many-cores. In brief, we make the following contributions:

1. We implement ALS in OpenCL to make the algorithm portable across platforms. To do so, we apply platform-specific optimization techniques according to each platform's features, generating multiple code variants from which researchers can select the one suited to their target platform. The experimental results show that our optimized ALS implementation runs 5.5× and 21.2× faster than the baseline implementation on an Intel E5-2670 and a K20c, respectively. Meanwhile, our implementation outperforms cuMF on the K20c over various datasets.

2. Based on an in-depth analysis of the algorithm's hotspot and of the existing ALS implementations, we propose an efficient implementation using a fine-grained parallel optimization technique. We implement a fine-grained task partition and use a more suitable thread configuration. The experimental results show that our implementation runs up to 88× faster on the K20c and up to 98× faster on gfx803 than the baseline implementation. On the K20c, our implementation also outperforms cuMF for latent feature sizes ranging from 10 to 100 over various recommendation datasets.

3. Based on our statistics over recommendation datasets and our analysis of the previous implementations, we propose an efficient ALS implementation on top of a new blocked storage format for sparse matrices, targeting parallel matrix factorization. Specifically, we develop a new compressed storage format for sparse matrices together with a data-reuse technique; we observe that the repeated loads of row/column vectors can be avoided by this data reuse. Further, we propose a data-reordering technique that enhances the performance benefits of data reuse. Our experimental results show that this implementation attains speedups of 2.08× and 3.72× over the state-of-the-art implementation by Gates on the K20c and a TITAN X, respectively. To the best of our knowledge, this is the fastest ALS implementation for matrix factorization on modern many-core GPUs.

In a nutshell, we develop an efficient ALS implementation by comprehensively considering the features of the ALS algorithm, the specifics of modern many-core architectures, and the inherent characteristics of recommendation datasets. By doing so, we achieve a more balanced usage of both on-chip and off-chip hardware resources. We believe that our implementation can be integrated into modern big-data processing frameworks such as Spark in a straightforward way.
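To make the algorithmic core concrete: ALS for matrix factorization alternates between fixing the item factors and solving a small regularized normal-equation system for each user, and vice versa. This per-row solve is the hotspot that the GPU implementations above parallelize. The following is a minimal dense NumPy sketch of standard ALS, not the thesis's optimized kernels; the function name, the regularization value, and the initialization are illustrative assumptions.

```python
import numpy as np

def als(R, k=10, lam=0.1, iters=10):
    """Alternating least squares for matrix factorization.

    R: (m, n) rating matrix; zeros mark missing entries.
    Returns factors X (m, k) and Y (n, k) with R approx X @ Y.T.
    """
    m, n = R.shape
    rng = np.random.default_rng(0)
    X = rng.standard_normal((m, k)) * 0.1
    Y = rng.standard_normal((n, k)) * 0.1
    mask = R != 0
    I = np.eye(k)
    for _ in range(iters):
        # Fix Y: solve one k-by-k normal-equation system per user.
        # These independent small solves are what a fine-grained GPU
        # parallelization maps onto threads/thread blocks.
        for u in range(m):
            Yu = Y[mask[u]]                      # factors of items rated by user u
            A = Yu.T @ Yu + lam * I              # regularized Gram matrix
            b = Yu.T @ R[u, mask[u]]
            X[u] = np.linalg.solve(A, b)
        # Fix X: solve one system per item, symmetrically.
        for i in range(n):
            Xi = X[mask[:, i]]
            A = Xi.T @ Xi + lam * I
            b = Xi.T @ R[mask[:, i], i]
            Y[i] = np.linalg.solve(A, b)
    return X, Y
```

Because each user's (and each item's) solve repeatedly gathers rows of the other factor matrix, the access pattern is irregular; this is where a blocked sparse storage format with data reuse and reordering, as in contribution 3, pays off.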
Keywords/Search Tags: Alternating least squares, Portability, Data tiling, Data reuse, Data reordering