Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System. (arXiv:2201.04180v1 [cs.RO])

A tether-net launched from a chaser spacecraft provides a promising method to
capture and dispose of large space debris in orbit. This tether-net system is
subject to several sources of uncertainty in sensing and actuation that affect
the performance of its net launch and closing control. Earlier
reliability-based optimization approaches to designing control actions,
however, remain challenging and computationally prohibitive to generalize over
varying launch scenarios and target (debris) states relative to the chaser. To search
for a general and reliable control policy, this paper presents a reinforcement
learning framework that integrates a proximal policy optimization (PPO2)
approach with net dynamics simulations. The latter allow evaluating the
episodes of net-based target capture and estimating the capture quality index
that serves as the reward feedback to PPO2. Here, the learned policy is
designed to model the timing of the net closing action based on the state of
the moving net and the target, under any given launch scenario. A stochastic
state transition model is considered in order to incorporate synthetic
uncertainties in state estimation and launch actuation. Along with notable
reward improvement during training, the trained policy demonstrates capture
performance (over a wide range of launch/target scenarios) that is close to
that obtained with reliability-based optimization run over an individual
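
The problem setup described in the abstract (a policy that times the net closing action from noisy state estimates, under uncertain launch actuation, and is scored by a capture quality reward) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the 1-D geometry, the noise levels, the reward formula, and the names `run_episode` and `mean_reward` are hypothetical and do not come from the paper. A simple threshold sweep stands in for PPO2 training, and no net dynamics are simulated.

```python
import random


def run_episode(close_distance, rng, obs_noise=0.05, launch_noise=0.1):
    """Simulate one toy capture episode and return a reward in [0, 1].

    Hypothetical 1-D model: the net flies toward the target and closes as
    soon as the *observed* gap to the target drops to `close_distance`.
    """
    target_distance = rng.uniform(8.0, 12.0)  # assumed target range
    # Launch actuation uncertainty: the actual net speed deviates from nominal.
    speed = max(0.1, 1.0 + rng.gauss(0.0, launch_noise))
    dt = 0.1
    position = 0.0
    for _ in range(1000):
        position += speed * dt
        true_gap = target_distance - position
        # State estimation uncertainty: the policy only sees a noisy gap.
        observed_gap = true_gap + rng.gauss(0.0, obs_noise)
        if observed_gap <= close_distance:
            # Toy capture quality: highest when the net closes at the target.
            return max(0.0, 1.0 - abs(true_gap))
    return 0.0  # the net never closed within the time limit


def mean_reward(close_distance, episodes=200, seed=0):
    """Average reward of a fixed closing threshold over random scenarios."""
    rng = random.Random(seed)
    total = sum(run_episode(close_distance, rng) for _ in range(episodes))
    return total / episodes


if __name__ == "__main__":
    # Sweep a few closing thresholds; a learned policy would replace this.
    for threshold in (0.0, 0.5, 1.0, 5.0):
        print(threshold, round(mean_reward(threshold), 3))
```

In this sketch the closing threshold plays the role of the policy, and averaging over randomized target distances and launch speeds mirrors the idea of evaluating one policy across many launch/target scenarios rather than optimizing each scenario individually.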

Source: https://arxiv.org/abs/2201.04180

