liveband-companion

Companion website for the LiveBand paper.

LiveBand: Live Accompaniment Generation in the Audio Domain

Marco Pasini (2), Javier Nistal (1), Mathias Rose Bjare (3), Stefan Lattner (1), George Fazekas (2)

  1. Sony Computer Science Laboratories Paris
  2. Queen Mary University of London
  3. Johannes Kepler University Linz

Paper: arXiv

Abstract
We present LiveBand, a real-time system that generates high-fidelity music accompaniments to live audio input, respecting strict causal constraints. Our method trains a causal transformer generator in the continuous latent space of a pre-trained causal audio autoencoder, using adversarial sequence-level supervision from a discriminator. At each timestep, the generator receives only the causally available mix context and Gaussian noise, and predicts accompaniment latents without access to future mix frames or ground-truth target latents. Training is performed in a single parallel forward pass under causal masking, while streaming inference proceeds autoregressively with a rolling attention state. The model’s training and inference computations are matched by design, eliminating teacher forcing and the associated exposure bias. On a multi-instrument music accompaniment benchmark, LiveBand improves over prior work on objective measures of audio quality, beat alignment, and mix adherence, while enabling real-time streaming generation without lookahead into the future on consumer hardware.

Framework

Train-Test Equivalence
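
The equivalence described in the abstract (one parallel pass under causal masking during training, step-by-step streaming at inference) can be illustrated with a toy example. The sketch below is not the LiveBand implementation: it assumes a single attention layer with no learned weights, hypothetical per-step inputs formed as a mix latent plus Gaussian noise, and an unbounded cache in place of a true rolling window. All function and class names are made up for illustration. It only shows that the masked parallel pass and the cached streaming pass compute the same outputs.

```python
import math
import random


def score(query, key):
    """Scaled dot-product score between two vectors (lists of floats)."""
    return sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))


def softmax_weighted_sum(scores, values):
    """Softmax over scores, then the weighted sum of the value vectors."""
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # exp(-inf) == 0.0 for masked slots
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def parallel_causal(frames):
    """Training-style pass: the whole sequence at once, future positions masked out."""
    n = len(frames)
    outputs = []
    for t in range(n):
        # Causal mask: position t may only attend to positions j <= t.
        scores = [score(frames[t], frames[j]) if j <= t else -math.inf for j in range(n)]
        outputs.append(softmax_weighted_sum(scores, frames))
    return outputs


class StreamingAttention:
    """Inference-style pass: one frame per step, attending over cached past frames.

    Assumption: the cache grows without bound here. A real rolling state would
    cap its length, and training would then need the matching windowed mask.
    """

    def __init__(self):
        self.cache = []

    def step(self, frame):
        self.cache.append(frame)
        scores = [score(frame, past) for past in self.cache]
        return softmax_weighted_sum(scores, self.cache)


random.seed(0)
DIM, STEPS = 4, 6
# Hypothetical generator inputs: a mix latent plus Gaussian noise at each step.
mix_latents = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(STEPS)]
noise = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(STEPS)]
frames = [[m + n for m, n in zip(ms, ns)] for ms, ns in zip(mix_latents, noise)]

parallel_out = parallel_causal(frames)
streamer = StreamingAttention()
streaming_out = [streamer.step(frame) for frame in frames]

max_gap = max(
    abs(a - b)
    for p_vec, s_vec in zip(parallel_out, streaming_out)
    for a, b in zip(p_vec, s_vec)
)
print(f"max |parallel - streaming| = {max_gap:.2e}")
```

Because neither pass ever reads a ground-truth target or a future frame, nothing is available at training time that is missing at inference time. In this toy setting that is what removes the need for teacher forcing.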

Audio Examples

We highly recommend listening with headphones. All examples use Slakh2100 test samples.

Showcase: Internal LiveBand (0.1s)

The examples below were generated by the LiveBand model trained on internal data. The accompaniment is incoherent at first, because the model starts generating with no context at all about the incoming mix, and then quickly adapts once it has heard enough of the mix.

Samples 01 to 25, each with the following audio clips:
Input Mix, Generated Stem, Mix (L=Input, R=Gen)

LiveBand Family Comparison

We compare the internal LiveBand model (0.1s anticipation) with Slakh2100-trained models at 0s, 0.1s, and 1s anticipation, and with the bidirectional variant. Outputs of the bidirectional model are 10s long, while all other samples are 20s long.

Samples 01 to 26, each with the following audio clips:
Input Mix, GT Stem, GT Mix (L=Input, R=GT), Internal (0.1s) Gen, Internal Mix, LiveBand 0s Gen, LiveBand 0s Mix, LiveBand 0.1s Gen, LiveBand 0.1s Mix, LiveBand 1s Gen, LiveBand 1s Mix, LiveBand Bidir Gen, LiveBand Bidir Mix

StreamMusicGen Baselines

We show StreamMusicGen baselines with 0s and 1s anticipation. Note how mix adherence degrades over time as errors compound.

Samples 01 to 19, each with the following audio clips:
Input Mix, GT Stem, GT Mix (L=Input, R=GT), StreamMusicGen 0s Gen, StreamMusicGen 0s Mix, StreamMusicGen 1s Gen, StreamMusicGen 1s Mix