Working Memory Connections for LSTM. (arXiv:2109.00020v1 [cs.LG])

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of
gating mechanisms to mitigate exploding and vanishing gradients when learning
long-term dependencies. For this reason, LSTMs and other gated RNNs are widely
adopted, being the standard de facto for many sequence modeling tasks. Although
the memory cell inside the LSTM contains essential information, it is not
allowed to influence the gating mechanism directly. In this work, we improve
the gate potential by including information coming from the internal cell
state. The proposed modification, named Working Memory Connection, consists in
adding a learnable nonlinear projection of the cell content into the network
gates. This modification can fit into the classical LSTM gates without any
assumption on the underlying task, being particularly effective when dealing
with longer sequences. Previous research effort in this direction, which goes
back to the early 2000s, could not bring a consistent improvement over vanilla
LSTM. As part of this paper, we identify a key issue tied to previous
connections that heavily limits their effectiveness, hence preventing a
successful integration of the knowledge coming from the internal cell state. We
show through extensive experimental evaluation that Working Memory Connections
constantly improve the performance of LSTMs on a variety of tasks. Numerical
results suggest that the cell state contains useful information that is worth
including in the gate structure.



Related post