StudioRecon: 4D Human-Scene Reconstruction from Low-Overlap Captures

Abstract

Volumetric capture of dynamic human performance achieves high fidelity when dense camera arrays are available. In real-world scenarios, however, often only a handful of low-overlap cameras are available, which degrades output quality and leaves large areas unobserved. Recent 4D reconstruction methods target low-overlap settings, yet they still produce noticeable artifacts in under-observed regions. Video diffusion models offer an alternative, but their results for humans are geometrically inconsistent. To address these limitations, we propose StudioRecon, a pipeline that reconstructs 4D human scenes from sparse, low-overlap cameras by decoupling the background from the humans. We densify background supervision by synthesizing hundreds of camera-controlled novel views with a video diffusion model. We robustly initialize deformable Gaussian humans using cross-view identity association and triangulated multi-view keypoint fitting. Finally, our recursive enhancement module with motion-adaptive consistency injection harmonizes the composed output and suppresses remaining artifacts. We achieve state-of-the-art novel-view synthesis across four real-world datasets and demonstrate applications such as novel trajectory rendering and human replacement.

Method Overview

Our pipeline consists of four stages: (1) Sparse-to-Dense View Synthesis using a camera-controlled video diffusion model to synthesize hundreds of novel views from sparse inputs; (2) Multi-view Human Pose Estimation with cross-view identity association and 3D triangulation; (3) Decoupled Gaussian Reconstruction optimizing backgrounds on synthesized views and humans on original videos; (4) Recursive Enhancement Module with motion-adaptive consistency injection for temporally coherent output.
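As a rough illustration of the 3D triangulation step in stage (2), the sketch below finds the 3D point closest, in the least-squares sense, to a set of camera rays that observe the same keypoint. This is an assumption made for illustration: the page does not state which triangulation method the pipeline uses. The function name `triangulate_rays`, the optional per-view `weights` (for example, 2D detection confidences), and the premise that pixels have already been back-projected to world-space rays are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: List[List[float]], b: List[float]) -> Vec3:
    """Solve the 3x3 linear system a @ x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point is not constrained")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(_det3(m) / d)
    return (x[0], x[1], x[2])


def triangulate_rays(origins: Sequence[Vec3],
                     directions: Sequence[Vec3],
                     weights: Optional[Sequence[float]] = None) -> Vec3:
    """Least-squares point closest to all rays (one ray per camera view).

    Minimizes sum_k w_k * ||(I - u_k u_k^T)(p - o_k)||^2, where o_k is the
    camera center and u_k the unit ray direction through the 2D keypoint.
    """
    if weights is None:
        weights = [1.0] * len(origins)
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d, w in zip(origins, directions, weights):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                # Entry (i, j) of the projector onto the plane orthogonal to u.
                p_ij = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += w * p_ij
                b[i] += w * p_ij * o[j]
    return _solve3(a, b)
```

With only a few low-overlap cameras, a given joint may be visible in just two views, so a real system would likely also need to handle the degenerate case raised here as `ValueError`, for instance by falling back to a temporal prior.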

Interactive Viewer

Explore the reconstructed scene at t=0. Drag to orbit, scroll to zoom, arrow keys to move, WASD to rotate. The scene is shown without Difix enhancement and with spherical harmonics downsampled to degree 2 for web delivery.
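One plausible reading of "downsampled to degree 2" is truncation of each Gaussian's spherical-harmonic coefficients. A degree-L expansion has (L+1)^2 coefficients per color channel, so dropping from degree 3 to degree 2 keeps 9 of 16. The sketch below assumes coefficients are stored in order of increasing band, with the constant term first; the original degree, the storage layout, and the helper names `sh_coeff_count` and `truncate_sh` are assumptions, not details taken from the page.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def sh_coeff_count(degree: int) -> int:
    """Number of SH basis functions per channel up to and including `degree`."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    return (degree + 1) ** 2


def truncate_sh(coeffs: Sequence[RGB], degree: int) -> List[RGB]:
    """Keep only the coefficients of bands 0..degree for one Gaussian.

    Assumes `coeffs` is ordered by increasing band (hypothetical layout).
    """
    keep = sh_coeff_count(degree)
    if len(coeffs) < keep:
        raise ValueError("input has fewer coefficients than the target degree")
    return list(coeffs[:keep])
```

Under these assumptions, the view-dependent color payload per Gaussian shrinks from 48 to 27 floats, at the cost of the highest-frequency view-dependent effects.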

Qualitative Comparison

Novel view synthesis from 4 sparse cameras on held-out evaluation views. All methods are trained on the same input.

Panels (repeated for each of three examples): Ground Truth, Ours, Dyn3DGS, MonoFusion, STG

Applications

Novel Camera Trajectories

Our Gaussian representation supports rendering from arbitrary camera paths, including dolly zoom and oscillating motion.
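For intuition about the dolly zoom path, the sketch below computes the field of view that keeps a subject at a fixed framed width while the camera moves toward or away from it. It uses the standard pinhole relation width = 2 · distance · tan(fov / 2). The function names and the linear distance schedule are hypothetical and are not taken from the authors' renderer.

```python
import math
from typing import List, Tuple


def dolly_zoom_fov(fov0_deg: float, dist0: float, dist: float) -> float:
    """FOV (degrees) at `dist` that frames the same width as fov0 at dist0."""
    if dist0 <= 0 or dist <= 0:
        raise ValueError("distances must be positive")
    half_width = dist0 * math.tan(math.radians(fov0_deg) / 2.0)
    return math.degrees(2.0 * math.atan(half_width / dist))


def dolly_zoom_path(fov0_deg: float, dist0: float, dist1: float,
                    steps: int) -> List[Tuple[float, float]]:
    """List of (camera-to-subject distance, FOV) pairs along the move."""
    if steps < 2:
        raise ValueError("need at least two steps")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        dist = dist0 + t * (dist1 - dist0)  # linear dolly, an assumption
        path.append((dist, dolly_zoom_fov(fov0_deg, dist0, dist)))
    return path
```

Each (distance, FOV) pair can then be converted to a camera pose and intrinsics for rendering. Pulling the camera back narrows the FOV, which produces the characteristic background compression.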

Panels: Dolly Zoom, Oscillating Trajectory

Human Replacement

Since humans and backgrounds are reconstructed independently, we can replace actors with new identities from a single reference image.

Panels: Original, Replaced

Ablation Study

We ablate our two key contributions: dense view synthesis via video diffusion and recursive diffusion enhancement.

Panels: 4-View Baseline; + Dense View Synthesis; + Enhancement (Ours)

BibTeX

@inproceedings{hwang2026studiorecon,
  title     = {4D Human-Scene Reconstruction from Low-Overlap Captures},
  author    = {Hwang, Minhyuk and Kim, Sangmin and Do, Seunguk and Kim, Daneul and Park, Jaesik},
  booktitle = {ACM SIGGRAPH 2026 Conference Proceedings},
  year      = {2026}
}
