Panoptic Lifting for 3D Scene Understanding with Neural Fields. (arXiv:2212.09802v1 [cs.CV])
We propose Panoptic Lifting, a novel approach for learning panoptic 3D
volumetric representations from images of in-the-wild scenes. Once trained, our
model can render color images together with 3D-consistent panoptic segmentation
from novel viewpoints.
Unlike existing approaches that operate directly or indirectly on 3D input,
our method requires only machine-generated 2D panoptic segmentation masks
inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme
based on a neural field representation that generates a unified and multi-view
consistent 3D panoptic representation of the scene. To account for
inconsistencies of 2D instance identifiers across views, we solve a linear
assignment problem, with a cost based on the model’s current predictions and
the machine-generated segmentation masks, enabling us to lift 2D instances to
3D in a consistent way (a minimal sketch of this matching step follows below).
We further propose and ablate contributions that make our method more robust
to noisy, machine-generated labels: test-time augmentation for confidence
estimates, a segment consistency loss, bounded segmentation fields, and
gradient stopping (the latter two are also sketched below).
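
To make the matching step concrete, here is a minimal sketch, assuming a cost
built from the soft IoU between the model's rendered per-slot instance
probabilities and the masks produced by the 2D panoptic network; the names
(match_instances, pred_probs, gt_masks) are illustrative and not taken from
the paper's implementation.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_instances(pred_probs, gt_masks):
        """Map 2D mask instances to persistent 3D instance slots.

        pred_probs: (K, H, W) rendered probabilities for K instance slots.
        gt_masks:   (M, H, W) boolean masks from the 2D panoptic network.
        Returns {mask index: slot index} via Hungarian matching on soft IoU.
        """
        M, K = gt_masks.shape[0], pred_probs.shape[0]
        cost = np.empty((M, K))
        for m in range(M):
            for k in range(K):
                inter = (pred_probs[k] * gt_masks[m]).sum()
                union = pred_probs[k].sum() + gt_masks[m].sum() - inter
                cost[m, k] = -inter / max(union, 1e-8)  # higher IoU -> lower cost
        rows, cols = linear_sum_assignment(cost)
        return dict(zip(rows.tolist(), cols.tolist()))

With such a per-view mapping in hand, the instance loss for each view can be
computed against the re-labeled masks, so the same physical object is
supervised with the same identifier across views.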
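
The robustness measures are easier to parse with an example. Below is a small
PyTorch sketch, under our own assumptions about the field heads, of two of
them: a bounded segmentation field (per-sample logits pass through a softmax,
so rendered class scores remain valid probabilities) and gradient stopping
(rendering weights are detached, so segmentation losses do not update the
density branch). All names here are hypothetical.

    import torch
    import torch.nn.functional as F

    def render_semantics(sem_logits, sigma, deltas):
        """Volume-render per-ray class probabilities.

        sem_logits: (R, S, C) class logits at S samples along R rays.
        sigma:      (R, S)    volume density per sample.
        deltas:     (R, S)    distances between consecutive samples.
        """
        alpha = 1.0 - torch.exp(-sigma * deltas)   # per-sample opacity
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]),
                       1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
        weights = alpha * trans                    # (R, S) rendering weights
        probs = F.softmax(sem_logits, dim=-1)      # bounded field: sums to 1
        # Gradient stopping: detach the weights so the semantic loss cannot
        # push gradients into the density (geometry) branch.
        return (weights.detach().unsqueeze(-1) * probs).sum(dim=1)  # (R, C)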
Experimental results validate our approach on the challenging Hypersim,
Replica, and ScanNet datasets, improving scene-level PQ over the state of the
art by 8.4%, 13.8%, and 10.6%, respectively.
Source: https://arxiv.org/abs/2212.09802