Riffusion

Generate music from text prompts using spectrogram-based diffusion.

Introduction

Riffusion is an open-source AI music generator that turns natural-language prompts into short audio clips. A fine-tuned Stable Diffusion model generates a spectrogram image, which is then converted to audio. First released in December 2022, it popularized a text-to-spectrogram-to-audio workflow that makes experimental sound design accessible to musicians, educators, and hobbyists.

Key Features

Text-to-Spectrogram: Generate spectrograms from text prompts

Spectrogram-to-Audio: Converts the spectrogram to sound using an inverse short-time Fourier transform with estimated phase

Seed & Remix Control: Deterministic outputs and section interpolation

Stem Download: Export individual instrument stems for DAW editing

Genre Versatility: From lo-fi chill to industrial techno

Free & Web-Based: Unlimited, no-login generation in the browser; source code released under the MIT License

Use Case & Target Audience

Use Cases

Rapid prototyping of melodies and ambient textures
Educational demos on AI and sound wave representation
Game and VR prototypes requiring royalty-free loops
Social media backing tracks tailored by prompt
Experimental sound art and generative composition

Target Audience

Music producers and electronic artists
Audio engineers and sound designers
Educators in music technology and AI
Indie game and VR developers
Content creators needing custom, free music

What It Does

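Riffusion interprets your text description of mood, genre, and instrumentation and generates a spectrogram: an image of frequency content over time. Because the image encodes only amplitude, the phase has to be approximated (the original release used the Griffin-Lim algorithm) before an inverse short-time Fourier transform can convert the spectrogram into a playable clip. The result is a musical loop a few seconds long that can be remixed or extended.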
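To make the spectrogram-to-audio step concrete, here is a minimal sketch of Griffin-Lim phase estimation in plain Python. It is not Riffusion's own code: the frame size, hop, window, and every function name are assumptions chosen for a small, fast illustration, and it skips the mel scaling and pixel-to-amplitude mapping a real pipeline would need.

```python
import cmath
import math
import random

N_FFT = 16  # samples per frame (toy size chosen for speed; an assumption)
HOP = 4     # samples between frame starts
# Periodic Hann window, applied during both analysis and synthesis.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * i / N_FFT) for i in range(N_FFT)]


def dft(frame, sign=-1):
    """Naive O(n^2) discrete Fourier transform; sign=+1 gives the unscaled inverse."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def stft(signal):
    """Windowed DFT of each overlapping frame."""
    return [
        dft([signal[s + i] * WINDOW[i] for i in range(N_FFT)])
        for s in range(0, len(signal) - N_FFT + 1, HOP)
    ]


def istft(frames, length):
    """Least-squares overlap-add inverse of stft().

    Expects length == (len(frames) - 1) * HOP + N_FFT.
    """
    out = [0.0] * length
    weight = [0.0] * length
    for f, spectrum in enumerate(frames):
        frame = [v.real / N_FFT for v in dft(spectrum, sign=+1)]
        for i in range(N_FFT):
            out[f * HOP + i] += frame[i] * WINDOW[i]
            weight[f * HOP + i] += WINDOW[i] ** 2
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, weight)]


def griffin_lim(magnitudes, length, iterations=30, seed=0):
    """Estimate a waveform whose STFT magnitudes approximate `magnitudes`."""
    rng = random.Random(seed)
    # Start from random phase, since the spectrogram stores none.
    frames = [
        [m * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for m in column]
        for column in magnitudes
    ]
    for _ in range(iterations):
        # Go to the time domain and back, then keep the new phase
        # while restoring the target magnitudes.
        estimate = stft(istft(frames, length))
        frames = [
            [m * e / abs(e) if abs(e) > 1e-12 else m for m, e in zip(mags, est)]
            for mags, est in zip(magnitudes, estimate)
        ]
    return istft(frames, length)


def spectral_error(signal, magnitudes):
    """Distance between a signal's STFT magnitudes and the target magnitudes."""
    return math.sqrt(sum(
        (abs(e) - m) ** 2
        for est, mags in zip(stft(signal), magnitudes)
        for e, m in zip(est, mags)
    ))


# Toy input: two sine tones, reduced to a magnitude-only spectrogram.
tone = [
    math.sin(2 * math.pi * 3 * t / N_FFT) + 0.5 * math.sin(2 * math.pi * 5 * t / N_FFT)
    for t in range(64)
]
target_mags = [[abs(v) for v in column] for column in stft(tone)]  # phase discarded
rebuilt = griffin_lim(target_mags, len(tone))
```

Calling spectral_error(rebuilt, target_mags) shows how closely the reconstruction matches the target spectrogram. The match is rarely exact, which is one plausible source of the reconstruction artifacts listed among the cons.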

How It Works

1. Prompt Entry: Users type a description such as "funky disco groove".
2. Spectrogram Generation: The fine-tuned Stable Diffusion model creates an image representing sound frequencies over time.
3. Audio Reconstruction: Phase is estimated for the amplitude-only image, and an inverse short-time Fourier transform converts it into a raw audio waveform.
4. Remixing & Interpolation: Use seed values or image-to-image (img2img) conditioning to blend or extend loops.
5. Export: Download full loops or individual stems for DAWs.
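Step 4 can be illustrated with a small sketch of seeded noise and spherical linear interpolation (slerp), a common way to blend between two diffusion starting points. The function names and the 16-value vectors are illustrative assumptions; a real Stable Diffusion latent is a much larger tensor, and this code does not call any Riffusion API.

```python
import math
import random


def seeded_noise(seed, size):
    """Gaussian noise vector that is identical every time for a given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(a, b, t):
    """Spherical linear interpolation from a (t=0) to b (t=1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Nearly parallel vectors: plain linear interpolation is stable here.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    scale = math.sin(omega)
    weight_a = math.sin((1 - t) * omega) / scale
    weight_b = math.sin(t * omega) / scale
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Five starting points moving gradually from seed 42 to seed 1337.
start = seeded_noise(42, 16)
end = seeded_noise(1337, 16)
blend = [slerp(start, end, i / 4) for i in range(5)]
```

Reusing a seed reproduces the same starting noise, which is what makes outputs deterministic. Feeding each intermediate vector to the model in turn would, under these assumptions, yield clips that drift smoothly from one loop to the other.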

Pros and Cons

Pros

Entirely free with unlimited generations
Open-source under the MIT License
Fine-grained remix and seed reproducibility
Direct stem export for professional editing
Supports a wide range of musical styles

Cons

Clip lengths currently limited to a few seconds
Occasional artifacts in audio reconstruction
No built-in long-form composition support
Requires manual stitching for extended tracks
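As a rough idea of what manual stitching involves, the sketch below joins two clips with a linear crossfade. It assumes each clip is a plain list of mono samples at the same sample rate; the function name and the overlap length are hypothetical, and a DAW would normally handle this step.

```python
def crossfade(clip_a, clip_b, overlap):
    """Join two sample lists, blending the last `overlap` samples of clip_a
    into the first `overlap` samples of clip_b."""
    if overlap <= 0 or overlap > min(len(clip_a), len(clip_b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    head = clip_a[:-overlap]
    tail = clip_b[overlap:]
    blended = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # fade position, strictly between 0 and 1
        blended.append((1 - t) * clip_a[len(clip_a) - overlap + i] + t * clip_b[i])
    return head + blended + tail


# Two toy "clips": a constant signal and a slow ramp.
clip_a = [0.5] * 12
clip_b = [i / 12 for i in range(12)]
joined = crossfade(clip_a, clip_b, overlap=4)
```

The joined clip is shorter than the two inputs combined by the overlap length, so loops meant to stay on the beat need an overlap that fits the tempo.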

Final Thoughts

Riffusion repurposes image-based diffusion for audio generation, giving creators a free tool for sketching musical ideas quickly. Its loops are short, but the open-source code and stem export make it a useful resource for anyone exploring generative soundscapes.
