How to train procedurally generated game-like environments at scale with Amazon SageMaker RL


Gym is a toolkit for developing and comparing reinforcement learning algorithms. Procgen Benchmark is a suite of 16 procedurally generated Gym environments designed to benchmark both sample efficiency and generalization in reinforcement learning. These environments are associated with the paper Leveraging Procedural Generation to Benchmark Reinforcement Learning (citation). Compared to Gym Retro, these environments have the following benefits:

  • Faster – Gym Retro environments are already fast, but Procgen environments can run over four times faster.
  • Non-deterministic – Gym Retro environments are always the same, so you can memorize a sequence of actions that gets the highest reward. Procgen environments are randomized, so this isn’t possible.
  • Customizable – If you install from source, you can perform experiments where you change the environments, or build your own environments. The environment-specific code for each environment is often fewer than 300 lines. This is almost impossible with Gym Retro.
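To make the environment suite concrete, the following Python sketch lists the 16 Procgen games and builds the Gym IDs and keyword arguments used to create them. It assumes the `procgen` package's `procgen:procgen-<game>-v0` ID format and its `num_levels`, `start_level`, and `distribution_mode` options; the specific values are illustrative, not the competition's settings. The `gym.make` call only runs if both `gym` and `procgen` are installed.

```python
import importlib.util

# The 16 Procgen games, named as they appear in the procgen package's Gym IDs.
PROCGEN_GAMES = [
    "bigfish", "bossfight", "caveflyer", "chaser",
    "climber", "coinrun", "dodgeball", "fruitbot",
    "heist", "jumper", "leaper", "maze",
    "miner", "ninja", "plunder", "starpilot",
]

# Level distributions offered by Procgen (not every game supports every mode).
DISTRIBUTION_MODES = {"easy", "hard", "extreme", "memory", "exploration"}


def procgen_env_id(game: str) -> str:
    """Return the Gym ID for a Procgen game, e.g. 'procgen:procgen-coinrun-v0'."""
    if game not in PROCGEN_GAMES:
        raise ValueError(f"unknown Procgen game: {game!r}")
    return f"procgen:procgen-{game}-v0"


def procgen_kwargs(num_levels: int = 200, start_level: int = 0,
                   distribution_mode: str = "easy") -> dict:
    """Options that select which procedurally generated levels are used.

    num_levels=0 allows an unlimited number of levels; a positive value
    restricts levels to seeds in [start_level, start_level + num_levels).
    """
    if distribution_mode not in DISTRIBUTION_MODES:
        raise ValueError(f"unknown distribution_mode: {distribution_mode!r}")
    return {
        "num_levels": num_levels,
        "start_level": start_level,
        "distribution_mode": distribution_mode,
    }


if __name__ == "__main__":
    env_id = procgen_env_id("coinrun")
    train_kwargs = procgen_kwargs(num_levels=200, start_level=0)
    print(env_id, train_kwargs)
    # Only build a real environment when gym and procgen are both installed.
    if importlib.util.find_spec("gym") and importlib.util.find_spec("procgen"):
        import gym
        env = gym.make(env_id, **train_kwargs)
        env.reset()
        env.close()
```

Training on a finite range of seeds and evaluating on a disjoint range is what lets Procgen measure generalization: the agent cannot have memorized levels it never saw during training.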

This post demonstrates how to use the Amazon SageMaker reinforcement learning starter kit for the NeurIPS 2020 – Procgen competition hosted on AIcrowd. Although the competition, which ran from June to November 2020, has ended and its results are published, you can still try out the solution on your own. Our solution lets participants using AIcrowd’s existing neurips2020-procgen-starter-kit get started with SageMaker without making any algorithmic changes. It also helps you reduce the time and effort required to build sample-efficient reinforcement learning solutions by using homogeneous and heterogeneous scaling.

Finally, our solution uses Spot Instances to reduce cost. With Spot Instances, the cost savings are approximately 70% for GPU instances such as ml.p3.2xlarge and ml.p3.8xlarge when training with a popular state-of-the-art reinforcement learning algorithm, Proximal Policy Optimization (PPO), and a multi-layer convolutional neural network as the agent’s policy.
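Managed Spot training is enabled through a few estimator parameters in the SageMaker Python SDK. The sketch below assembles them as a plain dictionary so it runs without AWS credentials; `use_spot_instances`, `max_run`, `max_wait`, and `checkpoint_s3_uri` are real estimator arguments, while the S3 URI and time limits are hypothetical placeholders. It also shows how a savings percentage can be computed from a finished job's `TrainingTimeInSeconds` and `BillableTimeInSeconds`.

```python
def spot_training_kwargs(max_run_s: int, max_wait_s: int,
                         checkpoint_s3_uri: str) -> dict:
    """Estimator keyword arguments that enable SageMaker managed Spot training.

    max_wait_s covers training time plus time spent waiting for Spot
    capacity, so it must be >= max_run_s. Checkpoints written to S3 let an
    interrupted job resume instead of starting over.
    """
    if max_wait_s < max_run_s:
        raise ValueError("max_wait must be greater than or equal to max_run")
    return {
        "use_spot_instances": True,
        "max_run": max_run_s,
        "max_wait": max_wait_s,
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


def spot_savings_percent(training_time_s: float, billable_time_s: float) -> float:
    """Savings for a Spot job: (1 - billable time / training time) * 100."""
    if training_time_s <= 0:
        raise ValueError("training time must be positive")
    return (1.0 - billable_time_s / training_time_s) * 100.0


if __name__ == "__main__":
    # Hypothetical bucket and limits: 2 hours of training, up to 4 hours total.
    spot_kwargs = spot_training_kwargs(
        max_run_s=2 * 3600,
        max_wait_s=4 * 3600,
        checkpoint_s3_uri="s3://example-bucket/procgen/checkpoints",
    )
    print(spot_kwargs)
    # A job billed for 3,000 of 10,000 training seconds saved about 70%.
    print(round(spot_savings_percent(10_000, 3_000), 1))
```

The resulting dictionary could be unpacked into an estimator such as `RLEstimator(..., **spot_kwargs)`. Checkpointing matters here because Spot capacity can be reclaimed partway through a long PPO run.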


As part of the solution, we use the following services:

SageMaker reinforcement learning uses Ray and RLlib, the same as the starter kit. Using the Ray RLlib library, SageMaker supports distributed reinforcement learning on a single SageMaker ML instance with just a few lines of configuration.
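For reference, a PPO experiment in Ray Tune's experiment format might look like the following Python sketch, with worker and GPU counts derived from the instance type. The structure (`run`, `env`, `stop`, `checkpoint_freq`, `config`) follows what `ray.tune.run_experiments` accepts, and `num_workers`, `num_gpus`, `num_envs_per_worker`, `lambda`, `gamma`, and `clip_param` are RLlib PPO settings. The registered environment name `procgen_env_wrapper`, the `env_config` keys, the one-CPU-for-the-driver heuristic, and every numeric value are assumptions for illustration, not the starter kit's tuned configuration.

```python
# vCPUs and GPUs for the instance types mentioned in this post.
INSTANCE_RESOURCES = {
    "ml.p3.2xlarge": (8, 1),
    "ml.p3.8xlarge": (32, 4),
}


def ppo_experiment(instance_type: str,
                   env_name: str = "procgen_env_wrapper") -> dict:
    """Build a Ray Tune experiment spec for PPO sized to one instance.

    env_name is a hypothetical name that must be registered with Ray
    before the experiment runs.
    """
    vcpus, gpus = INSTANCE_RESOURCES[instance_type]
    return {
        "procgen-ppo": {
            "run": "PPO",
            "env": env_name,
            # Illustrative budget; adjust to your own experiment.
            "stop": {"timesteps_total": 8_000_000},
            "checkpoint_freq": 25,
            "config": {
                # Assumed heuristic: keep one CPU for the driver, and use the
                # remaining CPUs as rollout workers.
                "num_workers": vcpus - 1,
                "num_gpus": gpus,
                "num_envs_per_worker": 12,
                # Hypothetical keys passed through to the environment wrapper.
                "env_config": {
                    "env_name": "coinrun",
                    "num_levels": 0,
                    "start_level": 0,
                    "distribution_mode": "easy",
                },
                # PPO hyperparameters (illustrative values).
                "lambda": 0.95,
                "gamma": 0.999,
                "clip_param": 0.2,
            },
        }
    }


if __name__ == "__main__":
    spec = ppo_experiment("ml.p3.2xlarge")
    print(spec["procgen-ppo"]["config"]["num_workers"])
```

With Ray installed, `ray.tune.run_experiments(ppo_experiment("ml.p3.2xlarge"))` would launch this experiment, provided the environment name has first been registered with `ray.tune.registry.register_env`.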

A typical SageMaker reinforcement learning job for an actor-critic algorithm uses GPU instances to learn a policy network and CPU instances to collect experiences for faster training at optimized costs. SageMaker allows you to achieve this by spinnin
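The actor-learner split described above can be illustrated with a toy, single-process Python sketch: threads stand in for CPU rollout workers that push experience batches onto a bounded queue, and the main thread stands in for the GPU learner that consumes them. This is not the SageMaker or Ray API; in a real heterogeneous job the workers run on separate CPU instances and Ray moves the data between machines. The worker counts and batch sizes are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Batch:
    worker_id: int
    num_steps: int


def rollout_worker(worker_id: int, num_batches: int, steps_per_batch: int,
                   out_q: queue.Queue) -> None:
    """Stand-in for a CPU rollout worker that emits batches of experience."""
    for _ in range(num_batches):
        # A real worker would step environment copies with the current policy.
        out_q.put(Batch(worker_id, steps_per_batch))
    out_q.put(None)  # sentinel: this worker has finished


def learner(in_q: queue.Queue, num_workers: int) -> int:
    """Stand-in for the GPU learner: consume batches until all workers finish."""
    steps_trained = 0
    finished = 0
    while finished < num_workers:
        batch = in_q.get()
        if batch is None:
            finished += 1
            continue
        # A real learner would run a policy update on the GPU here.
        steps_trained += batch.num_steps
    return steps_trained


def run(num_workers: int = 4, num_batches: int = 5,
        steps_per_batch: int = 256) -> int:
    """Run the toy actor-learner loop and return the total steps trained on."""
    q: queue.Queue = queue.Queue(maxsize=16)
    threads = [
        threading.Thread(target=rollout_worker,
                         args=(i, num_batches, steps_per_batch, q))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    total = learner(q, num_workers)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run())
```

The bounded queue roughly models a design trade-off in this architecture: when the learner falls behind, rollout workers block instead of piling up experience generated by an increasingly outdated policy.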

