How Latent Space used the Amazon SageMaker model parallelism library to push the frontiers of large-scale transformers

 How Latent Space used the Amazon SageMaker model parallelism library to push the frontiers of large-scale transformers

This blog is co-authored by Sarah Jane Hong CSO, Darryl Barnhart CTO, and Ian Thompson CEO of Latent Space and Prem Ranga of AWS.

Latent space is a hidden representation of abstract ideas that machine learning (ML) models learn. For example, “dog,” “flower,” or “door” are concepts or locations in latent space. At Latent Space, we’re working on an engine that allows you to manipulate and explore this space with both language and visual prompts. The Latent Space team comes from two fields that have long had little overlap: graphics and natural language processing (NLP). Traditionally, the modalities of images and text have been handled separately, each with their own history of complex, expensive, and fragile feature engineering. NLP tasks like document understanding or question answering have usually had little in common with vision tasks like scene understanding or rendering, and usually we use very different approaches and models for each task. But this is rapidly changing.

This merging of modalities in a single shared latent space unlocks a new generation of creative and commercial applications, from gaming to document understanding. But unlocking these new applications in a single model opens up new scaling challenges, as highlighted in “The Bitter Lesson” by Richard Sutton, and the exciting work in the last few years on scaling laws. To make this possible, Latent Space is working on cutting-edge research to fuse these modalities in a single model, but also to scale and do so efficiently. This is where model parallelism comes in.

Amazon SageMaker‘s unique automated model partitioning and efficient pipelining approach made our adoption of model parallelism possible with little engineering effort, and we scaled our training of models beyond 1 billion parameters (we use the p4d.24xlarge A100 instances), which is an important requirement for us. Furthermore, we observed that when training with a 16 node, eight GPU training setup with the SageMaker model parallelism library, we recorded a 38% improvement in efficiency compared to our previous training runs.

Challenges with training large-scale transformers

At Latent Space, we’re fusing language and vision in transformer models with billions of parameters to support “out of distribution” use cases from a user’s imagination or that would occur in the real world but not in our training data. We’re handling the challenges inherent in scaling to billions of parameters and beyond in two different ways:

Information retrieval techniques have long been a key component of search engines and QA tasks. Recently, exciting progress has been made combining classic IR techniques with modern transformers, specifically for question


Source - Continue Reading:


Related post