SSTM: Spatiotemporal Recurrent Transformers for Multi-frame Optical Flow Estimation. (arXiv:2304.14418v1 [cs.CV])

Inaccurate flow estimates in and near occluded regions and in
out-of-boundary regions are two significant limitations of current
optical flow estimation algorithms. Recent state-of-the-art optical flow
estimation algorithms are two-frame based methods where optical flow is
estimated sequentially for each consecutive image pair in a sequence. While
this approach gives good flow estimates, it fails to generalize optical flows
in occluded regions mainly due to limited local evidence regarding moving
elements in a scene. In this work, we propose a learning-based multi-frame
optical flow estimation method that estimates two or more consecutive optical
flows in parallel from multi-frame image sequences. Our underlying hypothesis
is that by understanding temporal scene dynamics from longer sequences with
more than two frames, we can characterize pixel-wise dependencies in a larger
spatiotemporal domain, generalize complex motion patterns and thereby improve
the accuracy of optical flow estimates in occluded regions. We present
learning-based spatiotemporal recurrent transformers for multi-frame based
optical flow estimation (SSTMs). Our method utilizes 3D Convolutional Gated
Recurrent Units (3D-ConvGRUs) and spatiotemporal transformers to learn
recurrent space-time motion dynamics and global dependencies in the scene and
provide a generalized optical flow estimation. Compared with recent
state-of-the-art two-frame and multi-frame methods on real-world and synthetic
datasets, SSTMs performed significantly better in occluded and
out-of-boundary regions. Among all published multi-frame
methods, SSTM achieved state-of-the-art results on the Sintel Final and
KITTI2015 benchmark datasets.
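
The abstract compares accuracy inside and outside occluded regions without naming a metric. As a minimal sketch, assuming the usual average end-point error (EPE) and a per-pixel boolean occlusion mask, region-wise accuracy could be computed as below. The function name `masked_epe`, the array layout, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def masked_epe(flow_pred, flow_gt, mask):
    """Mean end-point error over the pixels where `mask` is True.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding (u, v) per pixel.
    mask: boolean array of shape (H, W), e.g. an occlusion mask (assumed input).
    """
    # Per-pixel Euclidean distance between predicted and true flow vectors.
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    if not mask.any():
        return float("nan")  # no pixels in this region
    return float(err[mask].mean())

# Toy data: the top two rows are "occluded" and carry a (3, 4) flow error.
flow_gt = np.zeros((4, 4, 2))
flow_pred = np.zeros((4, 4, 2))
flow_pred[:2, :, 0] = 3.0
flow_pred[:2, :, 1] = 4.0

occluded = np.zeros((4, 4), dtype=bool)
occluded[:2] = True

epe_occ = masked_epe(flow_pred, flow_gt, occluded)    # occluded pixels only
epe_noc = masked_epe(flow_pred, flow_gt, ~occluded)   # non-occluded pixels only
print(epe_occ, epe_noc)
```

Reporting the two regions separately shows whether a method's gains come from occluded pixels, which an average over the whole image can hide.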


