ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models

Sibo Dong, Ismail Shaheen, Maggie Shen, Rupayan Mallick, Sarah Adel Bargal. ViSTA: Visual Storytelling using Multi-modal Adapters for Text-to-Image Diffusion Models. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2026, Tucson, AZ, USA, March 6-10, 2026, pages 12-21. IEEE, 2026.
