EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis

¹Zhejiang University, ²Huawei Noah's Ark Lab, ³University of Tübingen, ⁴Tübingen AI Center

Overview


We propose EVolSplat4D, a unified feed-forward 3D Gaussian Splatting framework for static and dynamic urban scenes that renders in real time. Taking camera images and tracked 3D bounding boxes as input, EVolSplat4D reconstructs a scene in approximately 1.3 seconds while achieving photo-realistic quality comparable to time-consuming per-scene optimization methods. EVolSplat4D also supports downstream applications such as high-fidelity scene editing and scene decomposition.
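The page does not say how the tracked boxes are consumed. One common way such boxes enable scene decomposition is to split 3D points (for example, points unprojected from predicted depth) into per-actor sets and a static remainder by testing whether each point lies inside an oriented box. The following is a minimal sketch under that assumption; the box parameterization (center, full size, yaw about the vertical axis) and the function names are hypothetical and are not taken from EVolSplat4D.

```python
import numpy as np


def yaw_rotation(yaw):
    """Rotation about the vertical z-axis by `yaw` radians."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def assign_points_to_boxes(points, boxes):
    """Return, for each (N, 3) world point, the index of the box containing it, or -1.

    Each box is a dict with hypothetical keys 'center' (3,), 'size' (full
    extents along the box's local x, y, z) and 'yaw' (radians).
    """
    labels = np.full(len(points), -1, dtype=int)
    for i, box in enumerate(boxes):
        R = yaw_rotation(box["yaw"])
        center = np.asarray(box["center"], dtype=float)
        half = np.asarray(box["size"], dtype=float) / 2.0
        # World -> box-local frame: p_local = R^T (p - center), written row-wise.
        local = (points - center) @ R
        inside = np.all(np.abs(local) <= half, axis=1)
        # First matching box wins if boxes overlap.
        labels[inside & (labels == -1)] = i
    return labels
```

Points labeled -1 would form the static background, and points labeled i would be handed to actor i; working in each box's local frame makes the test independent of the actor's heading.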

Video

EVolSplat4D Architecture


We reconstruct urban scenes by decomposing them into a close-range volume, dynamic actors, and far-field scenery, and predict the 3D Gaussians of each component in a feed-forward manner. a) Given a set of images, we initialize our model with a pretrained depth model and a DINO feature extractor. b) For the close-range volume, we leverage the 3D context of $\mathcal{F}^\text{3D}$ to predict the geometric attributes of the 3D Gaussians, then project the Gaussians into the reference views to retrieve 2D context, including color windows and visibility maps, from which their colors are decoded. c) For dynamic actors, we model each instance in its own canonical space and reconstruct it in a feed-forward manner through our proposed motion-adjusted image-based rendering (IBR) module. d) To model far-field regions, we employ a 2D U-Net backbone $\mathcal{F}^\text{2D}$ with cross-view self-attention that aggregates information from nearby reference images and predicts per-pixel Gaussians. e) Composing the three parts yields our full model for unbounded scenes.
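The caption combines two operations: projecting 3D Gaussians into reference views to fetch color and visibility (step b), and representing each actor in a canonical space (step c). The sketch below shows how they could fit together for a moving actor: a Gaussian center stored in the actor's canonical frame is mapped to world coordinates with the tracked box pose at the reference frame's timestamp, then projected with a pinhole camera, and a depth-map comparison acts as a simple visibility test. This is a minimal sketch assuming known intrinsics, world-to-camera extrinsics, per-view depth maps, and box poses given as rotation plus translation; the function names and the depth tolerance `eps` are hypothetical, and the actual motion-adjusted IBR module may work differently.

```python
import numpy as np


def canonical_to_world(x_can, R_obj, t_obj):
    """Map (N, 3) canonical points into world coordinates with one box pose.

    R_obj (3, 3) and t_obj (3,) are the tracked box rotation and translation
    at the reference frame's timestamp: x_world = R_obj @ x_can + t_obj.
    """
    return x_can @ R_obj.T + t_obj


def project_to_view(x_world, K, R_wc, t_wc, depth_map, eps=0.1):
    """Project world points into one reference view and test visibility.

    K: (3, 3) pinhole intrinsics; R_wc, t_wc: world-to-camera extrinsics.
    depth_map: (H, W) per-pixel depth of the reference view.
    eps: hypothetical depth tolerance for the occlusion test.
    Returns pixel coordinates (N, 2) and a boolean visibility mask (N,).
    """
    x_cam = x_world @ R_wc.T + t_wc  # world -> camera frame
    z = x_cam[:, 2]
    uvw = x_cam @ K.T  # homogeneous pixel coordinates
    # Avoid dividing by ~0 for points at or behind the camera plane.
    safe_z = np.where(z > 1e-6, uvw[:, 2], 1.0)
    uv = uvw[:, :2] / safe_z[:, None]

    H, W = depth_map.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    in_image = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    visible = np.zeros(len(x_world), dtype=bool)
    idx = np.nonzero(in_image)[0]
    # Visible if the point is not farther than the surface observed at its pixel.
    visible[idx] = z[idx] <= depth_map[v[idx], u[idx]] + eps
    return uv, visible
```

For a reference view captured at a different timestamp, one would pass that frame's box pose to `canonical_to_world`, so the same canonical Gaussian is looked up where the actor actually was in that image rather than where it is at the target time.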

Comparison with Feed-Forward Methods

Dynamic Scenes

Static Scenes

Video Comparison


Interpolated Results Gallery

Feed-Forward Results on KITTI & KITTI-360

Feed-Forward Results on Waymo & PandaSet (Out-of-Domain Datasets)