LiveBand: Live Accompaniment Generation in the Audio Domain
Marco Pasini², Javier Nistal¹, Mathias Rose Bjare³, Stefan Lattner¹, George Fazekas²
1. Sony Computer Science Laboratories Paris
2. Queen Mary University of London
3. Johannes Kepler University Linz
Paper: arXiv
Abstract
We present LiveBand, a real-time system that generates high-fidelity music accompaniments to live audio input, respecting strict causal constraints. Our method trains a causal transformer generator in the continuous latent space of a pre-trained causal audio autoencoder, using adversarial sequence-level supervision from a discriminator. At each timestep, the generator receives only the causally available mix context and Gaussian noise, and predicts accompaniment latents without access to future mix frames or ground-truth target latents. Training is performed in a single parallel forward pass under causal masking, while streaming inference proceeds autoregressively with a rolling attention state. The model’s training and inference computations are matched by design, eliminating teacher forcing and the associated exposure bias. On a multi-instrument music accompaniment benchmark, LiveBand improves over prior work on objective measures of audio quality, beat alignment, and mix adherence, while enabling real-time streaming generation without lookahead into the future on consumer hardware.
Framework

Train-Test Equivalence
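The abstract states that training runs as a single parallel forward pass under causal masking, while streaming inference runs step by step with a rolling attention state, and that the two computations match. The following is a minimal NumPy sketch of that idea, not the authors' implementation: `TinyCausalGenerator`, its single attention layer, the `window` size, the random weights, and all dimensions are hypothetical choices made only for illustration. Each input frame is the concatenation of a mix latent and Gaussian noise, so no previously generated or ground-truth target latent is ever fed back.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class TinyCausalGenerator:
    """Hypothetical one-layer, one-head causal attention generator (illustration only)."""

    def __init__(self, d_in, d_model, d_out, window, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_in)
        self.wq = rng.normal(0.0, s, (d_in, d_model))
        self.wk = rng.normal(0.0, s, (d_in, d_model))
        self.wv = rng.normal(0.0, s, (d_in, d_model))
        self.wo = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_out))
        self.d_model = d_model
        self.window = window  # number of past frames (including the current one) attended to

    def forward_parallel(self, x):
        """Training-style pass: all T frames at once, under a causal sliding-window mask."""
        T = x.shape[0]
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = q @ k.T / np.sqrt(self.d_model)
        i, j = np.arange(T)[:, None], np.arange(T)[None, :]
        allowed = (j <= i) & (i - j < self.window)  # no future frames, bounded past
        scores = np.where(allowed, scores, -np.inf)
        return softmax(scores) @ v @ self.wo

    def init_state(self):
        empty = np.zeros((0, self.d_model))
        return {"k": empty, "v": empty}

    def step(self, x_t, state):
        """Streaming pass: one frame in, one frame out, with a rolling key/value cache."""
        q = x_t @ self.wq
        k = np.vstack([state["k"], x_t @ self.wk])[-self.window:]
        v = np.vstack([state["v"], x_t @ self.wv])[-self.window:]
        w = softmax(k @ q / np.sqrt(self.d_model))
        return (w @ v) @ self.wo, {"k": k, "v": v}

# Toy inputs: stand-ins for mix latents and per-frame Gaussian noise.
rng = np.random.default_rng(1)
T, d_mix, d_noise = 12, 8, 4
mix = rng.normal(size=(T, d_mix))
noise = rng.normal(size=(T, d_noise))
x = np.concatenate([mix, noise], axis=1)

gen = TinyCausalGenerator(d_in=d_mix + d_noise, d_model=16, d_out=8, window=5)

y_parallel = gen.forward_parallel(x)

state = gen.init_state()
frames = []
for t in range(T):
    y_t, state = gen.step(x[t], state)
    frames.append(y_t)
y_stream = np.stack(frames)

max_abs_diff = float(np.abs(y_parallel - y_stream).max())
print("max |parallel - streaming| =", max_abs_diff)
```

In this sketch the two passes agree up to floating-point error because the mask used in the parallel pass admits exactly the frames that the rolling cache holds at each streaming step. Since the inputs never include the model's own outputs, there is no teacher-forced signal during training that would be missing at inference time.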

Audio Examples
We highly recommend listening with headphones. All examples use Slakh2100 test samples.
Showcase: Internal LiveBand (0.1s)
We showcase examples generated by the LiveBand model trained on internal data. Note that the accompaniment is initially incoherent, because the model starts generating with no context about the incoming mix, and that it adapts quickly once it has heard enough of the mix.
Samples 01 to 25, each with three audio clips: Input Mix, Generated Stem, and Mix (L=Input, R=Gen).
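The "Mix (L=Input, R=Gen)" clips place the input on the left channel and the generated stem on the right, which is why headphones are recommended. A minimal sketch of how such a comparison clip could be assembled from two mono signals is shown here. The function name `stereo_compare`, the peak limit, and the sine-tone stand-ins are assumptions for illustration, not the authors' tooling.

```python
import numpy as np

def stereo_compare(input_mix, generated, peak=0.98):
    """Return an (n, 2) array with the input on the left and the generated stem on the right.

    Both arguments are mono float arrays at the same sample rate. The longer
    signal is truncated, and the result is scaled down only if it would clip.
    """
    n = min(len(input_mix), len(generated))
    stereo = np.stack([input_mix[:n], generated[:n]], axis=1)
    max_amp = np.abs(stereo).max()
    if max_amp > peak:
        stereo = stereo * (peak / max_amp)
    return stereo

# Stand-in signals: a 1 s tone for the input and a 0.5 s tone for the generated stem.
sr = 16000
t = np.arange(sr) / sr
input_mix = 0.5 * np.sin(2 * np.pi * 220.0 * t)
generated = 0.5 * np.sin(2 * np.pi * 330.0 * t)[: sr // 2]

stereo = stereo_compare(input_mix, generated)
print(stereo.shape)
```

Keeping the two signals on separate channels, rather than summing them, lets a listener attend to either side and judge how well the generated stem follows the input.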
LiveBand Family Comparison
We compare the internal LiveBand model (0.1s anticipation) with Slakh2100-trained models at 0s, 0.1s, and 1s anticipation, and with the bidirectional variant. The bidirectional model outputs 10s clips, while all other samples are 20s long.
Samples 01 to 26, each with the following audio clips: Input Mix; GT Stem; GT Mix (L=Input, R=GT); Internal (0.1s) Gen and Internal Mix; LiveBand 0s Gen and LiveBand 0s Mix; LiveBand 0.1s Gen and LiveBand 0.1s Mix; LiveBand 1s Gen and LiveBand 1s Mix; LiveBand Bidir Gen and LiveBand Bidir Mix.
StreamMusicGen Baselines
We show StreamMusicGen baselines with 0s and 1s anticipation. Note how mix adherence degrades over time as errors compound.
Samples 01 to 19, each with the following audio clips: Input Mix; GT Stem; GT Mix (L=Input, R=GT); StreamMusicGen 0s Gen and StreamMusicGen 0s Mix; StreamMusicGen 1s Gen and StreamMusicGen 1s Mix.