Audio examples

The examples below demonstrate accompaniment generation for different contexts (rows) and references (columns). We use the DiT model, which achieved the best performance in the paper's ablation experiments. For easier listening, the generated accompaniment is panned slightly to the right and the context slightly to the left.
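The listening mix described above can be sketched as follows. This is a minimal illustration, not the authors' code. The constant-power pan law and the pan amount of 0.3 are assumptions, because the page only says the two signals are panned "slightly". The helper names `pan_gains` and `mix_for_listening` are hypothetical.

```python
import math
from typing import List, Tuple


def pan_gains(pan: float) -> Tuple[float, float]:
    """Constant-power pan law (assumed): -1 is hard left, 0 is center, +1 is hard right."""
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    theta = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)  # (left gain, right gain), L^2 + R^2 = 1


def mix_for_listening(
    context: List[float], accompaniment: List[float], amount: float = 0.3
) -> List[Tuple[float, float]]:
    """Mix two mono signals into stereo frames: context slightly left, accompaniment slightly right.

    The default amount of 0.3 is a hypothetical choice for "slightly".
    """
    if len(context) != len(accompaniment):
        raise ValueError("signals must have the same length")
    ctx_l, ctx_r = pan_gains(-amount)  # context leans left
    acc_l, acc_r = pan_gains(amount)   # accompaniment leans right
    return [
        (c * ctx_l + a * acc_l, c * ctx_r + a * acc_r)
        for c, a in zip(context, accompaniment)
    ]
```

For example, `mix_for_listening([1.0, 0.0], [0.0, 1.0])` yields a first frame that is louder in the left channel (context only) and a second frame that is louder in the right channel (accompaniment only). A constant-power law keeps the perceived loudness of each source roughly stable as it moves off center, while both sources stay audible in both ears.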

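Rows give the context (CTX) instrument and columns give the reference (REF) instrument:
REF (columns): Bass, Drums, Flute, Guitar, Harp, Piano, Violin
CTX (rows): Bass, Drums, Flute, Guitar, Harp, Piano, Violin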

Audio examples with text prompts

The examples below demonstrate accompaniment generation for different contexts and text prompts. We use the DiT + CLAPβ model, which achieved the best performance in the paper's ablation experiments. For easier listening, the generated accompaniment is panned slightly to the right and the context slightly to the left.

Rows give the text prompt and columns give the context (CTX1, CTX2, CTX3):

"bass, synth, electronic"

"electric guitar, distortion pedal, solo"

"percussion, bongos, latin"

"synth, pad"

"grand piano, acoustic, pop"

"violins, orchestral, ethereal"

"organ, Hammond, warm, melodic"