HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

CVPR 2024

¹Zhejiang University, ²Huawei Noah's Ark Lab, ³University of Tübingen, ⁴Tübingen AI Center

Overview


We present HUGS, a novel pipeline that uses 3D Gaussian Splatting for holistic urban scene understanding. Our main idea is to jointly optimize geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where the poses of moving objects are regularized via physical constraints. Our approach renders new viewpoints in real time, yields accurate 2D and 3D semantic information, and reconstructs dynamic scenes even when 3D bounding box detections are highly noisy.

Video

Comparison with NSG and MARS Using Noisy 3D Bounding Boxes

How it works

Our algorithm takes as input posed images of a dynamic urban scene. We decompose the scene into static and dynamic 3D Gaussians, and model the motion of dynamic vehicles with a unicycle model. The 3D Gaussians represent not only appearance but also semantic and flow information, so RGB images, semantic labels, and optical flow can all be rendered via volume rendering.
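
To make the unicycle model concrete, the following is a minimal sketch of a textbook unicycle rollout in Python. It is not the HUGS implementation: the exact parameterization used in the paper is not given on this page, and the names UnicycleState, unicycle_step, rollout, and pose_residual are hypothetical, introduced only for illustration. The sketch assumes a planar vehicle state (x, y, heading) driven by a forward speed v and a yaw rate omega, and shows how a smooth, physically plausible trajectory could be compared against noisy detected box centers.

```python
import math
from dataclasses import dataclass


@dataclass
class UnicycleState:
    """Planar vehicle state (hypothetical container, not from the paper)."""
    x: float      # position along the ground-plane x-axis
    y: float      # position along the ground-plane y-axis
    theta: float  # heading (yaw) in radians


def unicycle_step(state, v, omega, dt):
    """Advance the state by dt with constant forward speed v and yaw rate omega."""
    if abs(omega) < 1e-9:
        # Zero yaw rate: the vehicle moves along a straight line.
        return UnicycleState(
            state.x + v * dt * math.cos(state.theta),
            state.y + v * dt * math.sin(state.theta),
            state.theta,
        )
    # Non-zero yaw rate: the vehicle moves along a circular arc of radius v / omega.
    theta_next = state.theta + omega * dt
    radius = v / omega
    return UnicycleState(
        state.x + radius * (math.sin(theta_next) - math.sin(state.theta)),
        state.y - radius * (math.cos(theta_next) - math.cos(state.theta)),
        theta_next,
    )


def rollout(initial, controls, dt):
    """Return the trajectory obtained by applying (v, omega) controls in order."""
    states = [initial]
    for v, omega in controls:
        states.append(unicycle_step(states[-1], v, omega, dt))
    return states


def pose_residual(states, detected_centers):
    """Sum of squared distances between rollout positions and detected (x, y) centers.

    Assumption: a residual of this kind could serve as a data term when fitting
    the trajectory to noisy detections; the actual loss in HUGS may differ.
    """
    return sum(
        (s.x - cx) ** 2 + (s.y - cy) ** 2
        for s, (cx, cy) in zip(states, detected_centers)
    )
```

Because every pose in the rollout is tied to its predecessor through v and omega, a trajectory fitted this way cannot jump sideways between frames, which is the intuition behind regularizing noisy per-frame box poses with a motion model.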


More Rendering Results

BibTeX

@InProceedings{Zhou_2024_CVPR,
      author    = {Zhou, Hongyu and Shao, Jiahao and Xu, Lu and Bai, Dongfeng and Qiu, Weichao and Liu, Bingbing and Wang, Yue and Geiger, Andreas and Liao, Yiyi},
      title     = {HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      month     = {June},
      year      = {2024},
      pages     = {21336-21345}
  }