TeFF: Learning 3D-Aware GANs from Unposed Images
with Template Feature Field

ECCV 2024 (Oral)

Abstract


Collecting accurate camera poses for training images has been shown to benefit the learning of 3D-aware generative adversarial networks (GANs), yet doing so can be quite expensive in practice. This work targets learning 3D-aware GANs from unposed images, for which we propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF). Concretely, in addition to a generative radiance field as in previous approaches, we ask the generator to also learn a field of 2D semantic features while sharing the density from the radiance field. Such a framework allows us to acquire a canonical 3D feature template leveraging the dataset mean discovered by the generative model, and to further efficiently estimate the pose parameters of real data. Experimental results on various challenging datasets demonstrate the superiority of our approach over state-of-the-art alternatives from both the qualitative and the quantitative perspectives.

How it works


We augment the generative radiance field with a semantic feature field, enabling on-the-fly camera pose estimation of real images to facilitate 3D-aware GAN training. Specifically, we map a randomly sampled noise vector to both a radiance field and a semantic feature field that share the same density. By taking the mean shape of the feature field, we obtain a canonical 3D template feature field. This template allows us to perform efficient 2D-3D pose estimation on real images, and the estimated poses are in turn fed into the generator to perform volume rendering.
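As a rough illustration, the template can be approximated by averaging generated feature fields over many latent samples (the dataset mean). The sketch below assumes a hypothetical generator `G` exposing a `feature_field(z)` method that returns features on a fixed 3D grid; these names and shapes are illustrative assumptions, not the actual TeFF API.

```python
import torch

@torch.no_grad()
def build_template_feature_field(G, num_samples=1000, z_dim=512, device="cuda"):
    """Approximate the canonical template by averaging the feature fields
    generated from many random latent codes (the dataset mean)."""
    template = None
    for _ in range(num_samples):
        z = torch.randn(1, z_dim, device=device)
        feat = G.feature_field(z)  # assumed shape: (1, C, D, H, W) on a fixed grid
        template = feat if template is None else template + feat
    return template / num_samples
```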

We leverage the template feature field to estimate the camera poses of real 2D images. We discretize the azimuth θ and elevation φ angles and render the template feature field from each discretized camera pose. We then use phase correlation to estimate the scale and in-plane rotation in the 2D image space and warp each rendered template accordingly. We compute the mean squared error between each warped rendering and the real image's features, convert the errors into a probability distribution over camera poses, and finally sample the camera pose via inverse transform sampling, as sketched below.
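A minimal sketch of this sampling step, assuming hypothetical helpers `render_template(theta, phi)` (renders the template feature field from a camera at the given azimuth/elevation) and `phase_correlate(rendered, real)` (returns a 2D scale and in-plane rotation). All names, grid resolutions, and the softmax temperature are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn.functional as F

def warp(img, scale, rot):
    # Apply a similarity transform (scale + in-plane rotation) to a (C, H, W)
    # feature map via an affine grid; stands in for the warp driven by the
    # phase-correlation solution.
    c, s = math.cos(rot) / scale, math.sin(rot) / scale
    mat = torch.tensor([[c, -s, 0.0], [s, c, 0.0]]).unsqueeze(0)
    grid = F.affine_grid(mat, img.unsqueeze(0).shape, align_corners=False)
    return F.grid_sample(img.unsqueeze(0), grid, align_corners=False)[0]

@torch.no_grad()
def sample_camera_pose(real_feat, render_template, phase_correlate,
                       n_theta=16, n_phi=8, temperature=0.1):
    # Discretize azimuth (full circle) and elevation (limited range).
    thetas = torch.linspace(0, 2 * math.pi, n_theta + 1)[:-1]
    phis = torch.linspace(-math.pi / 4, math.pi / 4, n_phi)

    errors = torch.empty(n_theta, n_phi)
    for i, theta in enumerate(thetas):
        for j, phi in enumerate(phis):
            rendered = render_template(theta.item(), phi.item())  # (C, H, W)
            scale, rot = phase_correlate(rendered, real_feat)
            errors[i, j] = F.mse_loss(warp(rendered, scale, rot), real_feat)

    # Lower error -> higher probability: softmax over the discrete pose grid.
    probs = torch.softmax(-errors.flatten() / temperature, dim=0)

    # Inverse transform sampling from the discrete distribution.
    cdf = torch.cumsum(probs, dim=0)
    idx = int(torch.searchsorted(cdf, torch.rand(1)).clamp(max=probs.numel() - 1))
    i, j = divmod(idx, n_phi)
    return thetas[i].item(), phis[j].item()
```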

360-Degree Image Synthesis Comparison

SDIP Elephant

CompCars

LSUN Plane

ShapeNet Cars

Geometry Comparison

SDIP Elephant

CompCars

LSUN Plane

ShapeNet Cars

Interpolation

Citation

@article{Chen2024TeFF,
    author  = {Chen, Xinya and Guo, Hanlei and Bin, Yanrui and Zhang, Shangzhan and Yang, Yuanbo and Wang, Yue and Shen, Yujun and Liao, Yiyi},
    title   = {Learning 3D-Aware GANs from Unposed Images with Template Feature Field},
    journal = {arXiv preprint arXiv:2404.05705},
    year    = {2024}
}

Acknowledgements


The website template was borrowed from Jon Barron.