GPU accelerated matrix factorization of large scale data using block based approach. (arXiv:2304.13724v1 [cs.LG])

Matrix Factorization (MF) on large scale data takes substantial time on a
Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could
expedite the computation of MF, the available memory on a GPU is finite.
Leveraging GPUs requires alternative techniques that not only allow parallelism
but also address memory limitations. Synchronization between computational units,
isolation of data related to a computational unit, sharing of data between
computational units and identification of independent tasks among computational
units are some of the challenges in leveraging GPUs for MF. We propose a
block based approach to matrix factorization using Stochastic Gradient Descent
(SGD) that is aimed at accelerating MF on GPUs. The primary motivation for the
approach is to make it viable to factorize extremely large data sets on limited
hardware without having to compromise on results. The approach addresses
factorization of large scale data by identifying independent blocks, each of
which is factorized in parallel using multiple computational units. The
approach can be extended to one or more GPUs and even to distributed systems.
The RMSE results of the block based approach are within an acceptable delta of
the results of the CPU based variant and the multi-threaded CPU variant of a
similar SGD kernel implementation. The speed advantage of the block based
variant over the other variants is significant.
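
The idea of independent blocks can be illustrated with a minimal sketch. The abstract does not state how blocks are chosen or scheduled, so the grid partition and the diagonal schedule below are assumptions, as are all names (`independent_block_sets`, `sgd_block`, `block_sgd`) and hyperparameters. The sketch uses NumPy on a CPU as a stand-in for a GPU kernel: blocks in one set share no rows or columns of the rating matrix, so they touch disjoint parts of the factor matrices and could be dispatched to separate computational units.

```python
import numpy as np

def independent_block_sets(num_blocks):
    """Yield sets of (row_block, col_block) pairs that share no row or column block.

    Assumed schedule: shifted diagonals of a num_blocks x num_blocks grid.
    """
    for shift in range(num_blocks):
        yield [(b, (b + shift) % num_blocks) for b in range(num_blocks)]

def sgd_block(ratings, P, Q, lr, reg):
    """One SGD pass over the (user, item, rating) triples of a single block."""
    for u, i, r in ratings:
        p_u = P[u].copy()                      # keep the old user factor for the item update
        err = r - p_u @ Q[i]
        P[u] += lr * (err * Q[i] - reg * p_u)
        Q[i] += lr * (err * p_u - reg * Q[i])

def block_sgd(triples, num_users, num_items, rank=8, num_blocks=4,
              epochs=30, lr=0.02, reg=0.02, seed=0):
    """Factorize sparse ratings into P (users x rank) and Q (items x rank)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(num_users, rank))
    Q = rng.normal(scale=0.1, size=(num_items, rank))

    # Bucket every rating into its grid block.
    blocks = {}
    for u, i, r in triples:
        key = (u * num_blocks // num_users, i * num_blocks // num_items)
        blocks.setdefault(key, []).append((u, i, r))

    for _ in range(epochs):
        for block_set in independent_block_sets(num_blocks):
            # Blocks in block_set update disjoint rows of P and Q. They run
            # sequentially here; on a GPU each could go to its own unit.
            for key in block_set:
                sgd_block(blocks.get(key, []), P, Q, lr, reg)
    return P, Q

def rmse(triples, P, Q):
    """Root mean squared error of the factorization on the given triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in triples]
    return float(np.sqrt(np.mean(np.square(errors))))
```

Because only one block set needs to be resident at a time, a schedule of this kind also bounds how much rating data and how many factor rows must sit in device memory at once, which is the memory constraint the abstract describes.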


