facebookresearch/soundvista

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding [CVPR 2025]

Project Page | Paper PDF

We introduce SoundVista: a neural network pipeline that generates the ambient sound of an arbitrary scene at novel viewpoints, without requiring any constraints on, or prior knowledge of, the sound sources.

Real Demo

Please watch with headphones or speakers that support binaural audio!

👉 Click here to watch the demo video

Dataset

SoundSpace-Ambient Matterport3D

Data folder structure (mp3d):

  • sim_scenes: panoramic RGB-D pkl files, rendered with sound-spaces (see scripts/demo/mp3d_continuous_pano_render.py)
  • benchmark_pkl
  • metadata: from sound-spaces
  • sim_audios
  • sounds: from sound-spaces
    • 1s_all
    • semantic_splits
  • acoustic_params: echo and T60 parameters (npy)
  • binaural_rirs
  • ambisonic_rirs
  • benchmark index files: mp3d_mulv3_sparse_new.pkl (train)
  • budget number: ref_sampler_budget.pkl
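
Before training, it can help to confirm that the sub-folders named in the list above are in place. The following is a minimal sketch, not part of the repository: the `data/mp3d` root and the `missing_dirs` helper are assumptions for illustration, and the nesting of `1s_all` and `semantic_splits` under `sounds` is read from the list's indentation.

```python
from pathlib import Path

# Sub-folders taken from the list above; adjust if your layout differs.
EXPECTED_DIRS = [
    "sim_scenes",
    "benchmark_pkl",
    "metadata",
    "sim_audios",
    "sounds/1s_all",
    "sounds/semantic_splits",
    "acoustic_params",
    "binaural_rirs",
    "ambisonic_rirs",
]


def missing_dirs(root):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # "data/mp3d" is an assumed location; point this at your own data root.
    for d in missing_dirs("data/mp3d"):
        print(f"missing: {d}")
```

An empty result means every listed folder was found; it does not check the files inside them.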

Run the Code

Environment Setup

Compile sound-spaces first to render the SoundSpace-Ambient Matterport data.

Visual Acoustic Binding

Training on SoundSpace-Matterport3D (mp3d)

CUDA_VISIBLE_DEVICES=0 python3 tools/train_vab.py --cfg configs/vab_mp3d.yaml

Reference Sampler (example on mp3d scenes)

CUDA_VISIBLE_DEVICES=0 python3 tools/ref_sample_mp3d.py --cfg configs/vab_mp3d.yaml --visualize-path output/ref_sampling/ --eval-metrics model.resume_path data/pretrained_weights/vab_pretrain.pth

SoundSpace-Ambient Experiments (mp3d)

Training

CUDA_VISIBLE_DEVICES=0 python3 tools/train_mp3d.py --cfg configs/soundvista_mp3d.yaml --visualize-path output/ref_sampling/ train.pretrained data/pretrained_weights/vab_pretrain.pth dataset.img_num_per_gpu 16 output_dir soundvista_mp3d

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 tools/eval_mp3d.py --cfg configs/soundvista_mp3d.yaml --visualize-path output/ref_sampling/ --eval-scenes unseen model.resume_path data/pretrained_weights/soundvista_mp3d.pth output_dir soundvista_mp3d_eval

Demo (an example on one scene of mp3d)

# step 1: render route and reference pano RGBD and audio
python3 scripts/demo/mp3d_demovis.py

# step 2: render continuous target pano RGBD and video
python3 scripts/demo/mp3d_continuous_pano_render.py   # pano RGB-D pkl file
python3 scripts/demo/mp3d_continuous_video_render.py  # video

# step 3: render demo audio with SoundVista
CUDA_VISIBLE_DEVICES=0 python3 tools/demo_mp3d.py --cfg configs/soundvista_mp3d.yaml --visualize-path output/ref_sampling/ model.resume_path data/pretrained_weights/soundvista_mp3d.pth

# step 4: combine audio and video into the final demo video (fps=18), e.g.:
ffmpeg -i 'demo_files/sT4fr6TAbpF_continuous_vis.mp4' -i 'demo_files/sT4fr6TAbpF.wav' -c:v copy -c:a aac 'demo_files/sT4fr6TAbpF_output.mp4'
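
If you render several scenes, the muxing step can be scripted. The sketch below wraps the same ffmpeg flags used in step 4 (`-c:v copy` keeps the video stream as-is, `-c:a aac` encodes the audio as AAC). The function names are hypothetical, and it assumes `ffmpeg` is on your `PATH`.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build the ffmpeg argument list that copies video and encodes audio as AAC."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",
        "-c:a", "aac",
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)


if __name__ == "__main__":
    scene = "sT4fr6TAbpF"  # scene id from the example in step 4
    print(" ".join(build_mux_command(
        f"demo_files/{scene}_continuous_vis.mp4",
        f"demo_files/{scene}.wav",
        f"demo_files/{scene}_output.mp4",
    )))
```

Passing the arguments as a list avoids shell quoting issues with unusual file names.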

Citation

If you find this repository and dataset useful in your research, please consider giving it a star ⭐ and citing our paper using the following BibTeX entry.

@inproceedings{chen2025soundvista,
  title={SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding},
  author={Chen, Mingfei and Gebru, Israel D and Ananthabhotla, Ishwarya and Richardt, Christian and Markovic, Dejan and Sandakly, Jake and Krenn, Steven and Keebler, Todd and Shlizerman, Eli and Richard, Alexander},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={8331--8341},
  year={2025}
}

License

The code and dataset are released under the CC-NC 4.0 International license.
