EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis

1Zhejiang University, 2Huawei Noah's Ark Lab, 3University of Tübingen, 4Tübingen AI Center

Overview


We propose EVolSplat4D, a unified feed-forward reconstruction model for both static and dynamic urban scenes that achieves real-time rendering speeds. Taking both camera and LiDAR cues as inputs, EVolSplat4D produces novel, photorealistic renderings in approximately 1.3s, with quality comparable to time-consuming per-scene optimization methods. EVolSplat4D also supports a range of downstream applications, including high-fidelity scene editing and scene decomposition.

Video

EVolSplat4D Architecture


We reconstruct urban scenes by disentangling them into a close-range volume, dynamic actors, and a distant view, and predict the 4D Gaussians of the scene in a feed-forward manner. a) Given a set of images and sparse LiDAR points, we initialize our model with a pretrained depth model and a DINO feature extractor. b) In the close-range volume, we leverage the 3D context of $\psi^\text{3D}$ to predict the geometry attributes of the 3D Gaussians, then project the 3D Gaussians onto the reference views to retrieve 2D context, including color windows and visibility maps, from which we decode their color. c) For dynamic actors, we model each instance in an instance-wise canonical space and perform generalizable reconstruction through our proposed motion-adjusted IBR module. d) To model far regions, we employ a 2D U-Net backbone with cross-view self-attention to aggregate information from nearby reference images and predict per-pixel Gaussians. e) Composing the three parts yields our full model for unbounded scenes.
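
To make step b) more concrete, the following is a minimal sketch of how 3D Gaussian centers could be projected onto one reference view to gather a small color window around each projection. It is not the EVolSplat4D implementation: the function names (project_points, sample_color_windows), the pinhole camera model, the 3x3 window size, and the in-frustum check used as a stand-in for the visibility map (no occlusion handling) are all assumptions made for illustration.

```python
import numpy as np

def project_points(points_world, K, world_to_cam):
    """Project N x 3 world points with a pinhole camera (assumed model).

    Returns (uv, depth): N x 2 pixel coordinates and N camera-space depths.
    """
    n = points_world.shape[0]
    homo = np.concatenate([points_world, np.ones((n, 1))], axis=1)  # N x 4
    cam = (world_to_cam @ homo.T).T[:, :3]                          # N x 3
    depth = cam[:, 2]
    # Avoid division by zero; such points are rejected later by the mask.
    safe_depth = np.where(np.abs(depth) < 1e-8, 1e-8, depth)
    pix = (K @ cam.T).T                                             # N x 3
    uv = pix[:, :2] / safe_depth[:, None]
    return uv, depth

def sample_color_windows(image, uv, depth, window=3):
    """Gather a window x window color patch around each projected point.

    Returns (patches, visible). `visible` is a simplified in-frustum mask
    (hypothetical stand-in for a visibility map; occlusion is ignored).
    """
    h, w, _ = image.shape
    r = window // 2
    centers = np.rint(uv).astype(int)
    u, v = centers[:, 0], centers[:, 1]
    visible = (depth > 0) & (u >= r) & (u < w - r) & (v >= r) & (v < h - r)
    patches = np.zeros((uv.shape[0], window, window, 3), dtype=image.dtype)
    for i in np.flatnonzero(visible):
        patches[i] = image[v[i] - r:v[i] + r + 1, u[i] - r:u[i] + r + 1]
    return patches, visible

# Toy usage: one reference view, three Gaussian centers.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
world_to_cam = np.eye(4)
image = np.arange(48 * 64 * 3, dtype=np.float32).reshape(48, 64, 3)
centers_3d = np.array([[0.0, 0.0, 5.0],    # in front of the camera
                       [0.0, 0.0, -5.0],   # behind the camera
                       [10.0, 0.0, 5.0]])  # projects outside the image
uv, depth = project_points(centers_3d, K, world_to_cam)
patches, visible = sample_color_windows(image, uv, depth)
```

In this sketch, the patches from each reference view, together with the mask, would be the 2D context handed to a color decoder; points that fail the mask keep an all-zero patch.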

Comparison with Feed-Forward Methods

Dynamic Scenes

Static Scenes

Video Comparison


Feed-Forward Results Gallery

Feed-forward Results on KITTI & KITTI-360

Feed-forward Results on Waymo & PandaSet (Out-of-domain datasets)