Contact person
Olof Mogren
Senior Researcher
At the RISE Learning Machines Seminar on 20 March 2025, Oriol Nieto of Adobe gives his presentation: GenAI for sound design. The seminar is held in English.
This presentation explores the forefront of generative AI research for sound design at Adobe Research. I will provide an overview of Latent Diffusion Models, which form the foundation of our work, and introduce several recent advancements focused on controllability and multimodality.
I will begin with SILA [1], a technique designed to enhance the control of sound effects generated through text prompts. Following this, I will present Sketch2Sound [2], a model that generates sound effects conditioned on both audio recordings and text. Lastly, I will examine MultiFoley [3], a model capable of generating sound effects from both silent videos and text.
Throughout the talk, I will showcase a series of examples and demos to illustrate the practical applications and potential of these models, making the case that we are only beginning to unveil a completely new paradigm in how to approach sound design.
[1] Sonal Kumar, Prem Seetharaman, Justin Salamon, Dinesh Manocha, Oriol Nieto, "SILA: Signal-to-Language Augmentation for Enhanced Control in Text-to-Audio Generation", In review for IEEE SPL
[2] Hugo Flores García, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman, "Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations", ICASSP 2025
[3] Ziyang Chen, Prem Seetharaman, Bryan Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon, "Video-Guided Foley Sound Generation with Multimodal Controls", In review for CVPR 2025
Oriol is a Senior Research Engineer at Adobe Research, where he focuses on human-centered AI for audio creativity, encompassing everything from music to audiobooks, video editing, and sound design. He holds a PhD in Music Technology from MARL, NYU, a Master's in Music, Science, and Technology from Stanford University, and a Master's in Information Technologies from Pompeu Fabra University.
Highly involved with the Music Information Retrieval community, he was one of the three General Chairs for ISMIR 2024 in San Francisco this past November. Oriol has helped develop relevant open-source MIR packages such as librosa, mir_eval, and MSAF, and has contributed to PyTorch. In his spare time he plays guitar, violin, and cajón, and sings (and screams).