Real-time
Band energy
per-register
An envelope detector traces the loudness contour of a waveform: the slow outline riding over the fast carrier inside it. Every graph on this page is drawn by the method's real algorithm, and the sliders at the top drive all of them at once.
The whole method, live
Score card
- Causality: low-latency
- Signal model: polyphonic
- Reads: per-band
- Latency: ≈1 frame
- Cost: STFT
- Domain: frequency
Scored qualitatively.
This method outputs a normalized contour (onset strength, per-band or perceptual loudness), not an amplitude in the units of the true envelope — so an amplitude error number would be meaningless. Its strength is the spectral axis: read the gallery below.
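A minimal sketch of why an amplitude error would be meaningless here: because each band is divided by its own peak, scaling the input changes nothing in the output. The helper name `band_contour` and the random stand-in for the magnitude STFT are assumptions made for illustration only.

```python
import numpy as np

def band_contour(mag, lo, hi):
    """RMS over bins [lo, hi) per frame, normalized to the band's own peak."""
    e = np.sqrt(np.mean(mag[lo:hi] ** 2, axis=0))
    peak = e.max()
    return e / (peak if peak > 0 else 1e-9)

rng = np.random.default_rng(0)
mag = rng.random((65, 40))              # stand-in magnitude STFT: (bins, frames)

quiet = band_contour(0.01 * mag, 1, 5)  # same material, very low level
loud = band_contour(100.0 * mag, 1, 5)  # same material, very high level
same_shape = np.allclose(quiet, loud)   # the absolute level has been divided out
```

Both contours peak at 1 and trace the same shape, so only their motion carries information.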
How it works
One contour isn't enough — track loudness per register. Split the spectrum into a few bands and follow each band's energy over time. Because a bassline and a cymbal occupy different bands, polyphony stops averaging into mush: here the low band rides the sustained notes while the high band spikes on the percussive hits.
Each band is normalized to its own peak so its shape is readable. In practice the bands are usually mel or critical bands. A spectrogram is the limiting case: with every band one bin wide, the STFT magnitude gives one envelope per bin.
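The same computation can be written compactly. The symbols are chosen here for illustration: X[k, m] is the STFT at bin k and frame m, and band b spans bins e_b up to but not including e_{b+1}, matching the half-open ranges used by the listings on this page.

```latex
\[
E_b[m] = \sqrt{\frac{1}{e_{b+1} - e_b} \sum_{k = e_b}^{e_{b+1} - 1} \bigl| X[k, m] \bigr|^2},
\qquad
\hat{E}_b[m] = \frac{E_b[m]}{\max_{m'} E_b[m']}
\]
```

The first expression is the RMS magnitude over the band's bins in frame m. The second divides by the band's own peak over all frames, so every contour tops out at 1. The listings guard the division with a small constant when a band is silent throughout.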
Key terms
- Frequency band: A slice of the spectrum (say a low / mid / high split) followed independently over time. Each band carries its own envelope, so a bassline and a cymbal never average into the same contour.
- Mel / critical bands: Perceptually spaced bands that match how the ear groups frequency, narrow down low and wide up high. They are the usual choice in practice, since a band split that tracks hearing reads more like what you actually notice in the mix.
- Per-band normalization: Each band scaled to its own peak so its shape stays readable regardless of absolute energy. A quiet high band and a loud low band both fill the same vertical range, so you compare their motion, not their level.
Building the envelope, step by step
One envelope can't describe a mix where a bassline and a cymbal sound at once. The fix is to stop asking for a single contour and follow energy per register instead — each graph below is drawn by the real algorithm on the page's polyphonic input.
- Step 1: The raw mix
Start with the polyphonic input — several voices at once, with no single carrier to demodulate. A lone amplitude follower would just average them into mush.
- Step 2: One contour per band
Split the spectrum into a few bands and take each band's energy over time, normalized to its own peak. Now the low band rides the sustained notes while the high band spikes on the percussive hits — the polyphony is legible instead of blurred.
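The two steps can be tried on a synthetic mix. The sketch below is not the page's input or STFT: the sample rate, hop, Hann window, test tones and the helper names `stft_mag` and `band_contours` are all assumptions. Only the band edges [1, 5, 13, 65] for a 128-sample frame follow the default split described on this page.

```python
import numpy as np

FS, FRAME, HOP = 8000, 128, 32           # hypothetical sample rate, frame, hop

def stft_mag(sig, frame=FRAME, hop=HOP):
    """Magnitude STFT with a Hann window, shape (bins, frames)."""
    win = np.hanning(frame)
    n = 1 + (len(sig) - frame) // hop
    frames = np.stack([sig[i * hop:i * hop + frame] * win for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def band_contours(mag, edges):
    """Per-band RMS over bins [lo, hi), each band normalized to its own peak."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        e = np.sqrt(np.mean(mag[lo:hi] ** 2, axis=0))
        peak = e.max()
        out.append(e / (peak if peak > 0 else 1e-9))
    return out

# Step 1: the raw mix, a sustained low tone plus short high-frequency bursts.
t = np.arange(FS) / FS                                # one second
low = 0.8 * np.sin(2 * np.pi * 125 * t)               # sustained "bass" at 125 Hz
gate = ((t % 0.25) < 0.02).astype(float)              # 20 ms burst every 250 ms
high = 0.5 * np.sin(2 * np.pi * 2000 * t) * gate      # percussive "hits" at 2 kHz
mix = low + high

# Step 2: one contour per band.
low_c, mid_c, high_c = band_contours(stft_mag(mix), [1, 5, 13, 65])
```

In this setup the low contour stays close to full scale for the whole second, while the high contour sits near zero and jumps to 1 only on the bursts.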
The code
Six readable forms of the exact algorithm that draws the curves above — C, JS and Python ports, an optimized C, a fixed-coefficient version, and a user-controlled one whose parameters match the sliders.
#include <math.h>

/* A magnitude STFT is assumed available:
     mag[k][m] = |X(bin k, frame m)|, 0 <= k < bins, 0 <= m < frames.
   (e.g. produced by some stft(sig, mag, &frames, &bins); helper.) */

/* Per-band RMS energy, each band normalized to its own peak.
   edges has nbands+1 entries: band b spans bins [edges[b], edges[b+1]).
   out[b] is a frames-long contour; caller allocates out[b]. */
void bands(const double *const *mag, int frames, int bins,
           const int *edges, int nbands, double **out) {
    for (int b = 0; b < nbands; b++) {
        double peak = 0.0;
        for (int m = 0; m < frames; m++) {
            double acc = 0.0;
            int c = 0;
            for (int k = edges[b]; k < edges[b + 1]; k++) {
                acc += mag[k][m] * mag[k][m];   /* sum squared magnitudes */
                c++;
            }
            double e = c > 0 ? sqrt(acc / c) : 0.0;   /* RMS over the band */
            out[b][m] = e;
            if (e > peak) peak = e;
        }
        /* normalize this band to its own peak so its shape is readable */
        double inv = 1.0 / (peak > 0.0 ? peak : 1e-9);
        for (int m = 0; m < frames; m++) out[b][m] *= inv;
    }
}
// A magnitude STFT is assumed available:
// stft.mag[k][m] = |X(bin k, frame m)|, with stft.M frames and stft.B bins.

// Per-band RMS energy, each band normalized to its own peak.
// edges has nbands+1 entries: band b spans bins [edges[b], edges[b+1]).
function bands(stft, edges) {
  const { mag, M } = stft;
  const out = [];
  for (let b = 0; b < edges.length - 1; b++) {
    const e = new Float64Array(M);
    let peak = 0;
    for (let m = 0; m < M; m++) {
      let acc = 0;
      let c = 0;
      for (let k = edges[b]; k < edges[b + 1]; k++) {
        acc += mag[k][m] * mag[k][m];        // sum squared magnitudes
        c++;
      }
      e[m] = c > 0 ? Math.sqrt(acc / c) : 0; // RMS over the band
      if (e[m] > peak) peak = e[m];
    }
    const inv = 1 / (peak > 0 ? peak : 1e-9); // self-normalize the band
    out.push(e.map((v) => v * inv));
  }
  return out;
}
import numpy as np

def bands(mag, edges):
    """Per-band RMS energy, each band normalized to its own peak.

    mag: magnitude STFT, shape (bins, frames) -- from some stft() helper.
    edges: nbands+1 bin indices; band b spans bins [edges[b], edges[b+1]).
    returns a list of length-frames contours, one per band.
    """
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = mag[lo:hi]                            # the bins in this band
        if band.shape[0] > 0:
            e = np.sqrt(np.mean(band**2, axis=0))    # RMS per frame
        else:
            e = np.zeros(mag.shape[1])
        peak = e.max()
        out.append(e / (peak if peak > 0 else 1e-9))  # self-normalize
    return out
Same output, fewer passes. The cost here is dominated by the STFT, not the grouping, so the grouping is made cheap: instead of re-scanning the spectrum per band, walk every bin once per frame and route its squared magnitude to the band it falls in (the band index b advances as k crosses the next edge). Per-band sums and peaks are tracked inline, and the final normalize is a single multiply by the precomputed reciprocal.
#include <math.h>
#include <stdlib.h>   /* calloc, free */

void bands_opt(const double *const *mag, int frames, int bins,
               const int *edges, int nbands, double **out) {
    double *peak = (double *)calloc(nbands, sizeof(double));
    int *width = (int *)calloc(nbands, sizeof(int));
    if (!peak || !width) { free(peak); free(width); return; }
    for (int b = 0; b < nbands; b++) width[b] = edges[b + 1] - edges[b];
    for (int m = 0; m < frames; m++) {
        int b = 0;
        double acc = 0.0;
        for (int k = edges[0]; k < edges[nbands]; k++) {
            while (b < nbands && k >= edges[b + 1]) {   /* close band b */
                out[b][m] = width[b] > 0 ? sqrt(acc / width[b]) : 0.0;
                if (out[b][m] > peak[b]) peak[b] = out[b][m];
                acc = 0.0;
                b++;
            }
            double v = mag[k][m];
            acc += v * v;                               /* one pass over bins */
        }
        /* close the last occupied band, then any empty trailing bands,
           so every out[b][m] is written (empty bands get 0) */
        for (; b < nbands; b++) {
            out[b][m] = width[b] > 0 ? sqrt(acc / width[b]) : 0.0;
            if (out[b][m] > peak[b]) peak[b] = out[b][m];
            acc = 0.0;
        }
    }
    for (int b = 0; b < nbands; b++) {
        double inv = 1.0 / (peak[b] > 0.0 ? peak[b] : 1e-9);
        for (int m = 0; m < frames; m++) out[b][m] *= inv;
    }
    free(peak); free(width);
}
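The same "group all bins in one sweep" idea can be sketched in NumPy with `np.add.reduceat`, which sums consecutive slices of an array in a single call. This is an illustrative variant, not one of the page's six forms. It assumes strictly increasing edges (no empty bands), because `reduceat` does not return 0 for an empty slice. The names `bands_reduceat` and `bands_loop` are hypothetical.

```python
import numpy as np

def bands_loop(mag, edges):
    """Reference: one slice per band, as in the readable port."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        e = np.sqrt(np.mean(mag[lo:hi] ** 2, axis=0))
        peak = e.max()
        out.append(e / (peak if peak > 0 else 1e-9))
    return np.array(out)

def bands_reduceat(mag, edges):
    """Same contours, with all band sums taken in one reduceat call."""
    edges = np.asarray(edges)
    widths = np.diff(edges)
    if np.any(widths <= 0):
        raise ValueError("edges must be strictly increasing")
    power = mag[:edges[-1]] ** 2                        # drop bins past the last edge
    sums = np.add.reduceat(power, edges[:-1], axis=0)   # (nbands, frames)
    e = np.sqrt(sums / widths[:, None])                 # RMS per band and frame
    peak = e.max(axis=1, keepdims=True)
    return e / np.where(peak > 0, peak, 1e-9)           # self-normalize each band

rng = np.random.default_rng(1)
mag = rng.random((65, 12))                              # stand-in magnitude STFT
fast = bands_reduceat(mag, [1, 5, 13, 65])
slow = bands_loop(mag, [1, 5, 13, 65])
```

On this random input the two functions agree to floating-point tolerance.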
Three bands with the page's default edges hard-coded: bins [1,5) = Low, [5,13) = Mid, [13,65) = High (65 = FRAME/2+1 for a 128-sample frame). No edge array, no band count argument — the smallest kernel for this fixed register split.
#include <math.h>

/* Default split: Low = bins 1..4, Mid = 5..12, High = 13..64. */
static const int EDGES[4] = {1, 5, 13, 65};

/* out[3][frames], each row a band; caller allocates. */
void bands_fixed(const double *const *mag, int frames, double **out) {
    for (int b = 0; b < 3; b++) {
        double peak = 0.0;
        int width = EDGES[b + 1] - EDGES[b];
        for (int m = 0; m < frames; m++) {
            double acc = 0.0;
            for (int k = EDGES[b]; k < EDGES[b + 1]; k++)
                acc += mag[k][m] * mag[k][m];
            double e = sqrt(acc / width);
            out[b][m] = e;
            if (e > peak) peak = e;
        }
        double inv = 1.0 / (peak > 0.0 ? peak : 1e-9);
        for (int m = 0; m < frames; m++) out[b][m] *= inv;
    }
}
The page exposes one live control, Smoothing (0 to 80 samples): it runs a centered (zero-lag) moving mean over each normalized band, so larger values trade time resolution for steadier contours without shifting the curve. The band edges themselves are a code-level choice, passed in here as edges[]: more, narrower bands give finer per-register contours, while fewer, wider bands blur registers together. Edges are bin indices, so edge * (FS / FRAME) is the band's lower frequency in Hz.
#include <math.h>
#include <stdlib.h>   /* malloc, free */

/* Mean of x over a window centered on i, clipped to [0, n). */
static double mean_window(const double *x, int n, int i, int W) {
    int lo = i - W / 2 < 0 ? 0 : i - W / 2;
    int hi = i + W / 2 + 1 > n ? n : i + W / 2 + 1;
    double s = 0.0;
    for (int j = lo; j < hi; j++) s += x[j];
    return s / (hi - lo);
}

void bands_ctl(const double *const *mag, int frames, int bins,
               const int *edges, int nbands,
               int smoothing,            /* slider: 0 .. 80 samples */
               double **out) {
    for (int b = 0; b < nbands; b++) {
        double peak = 0.0;
        int width = edges[b + 1] - edges[b];
        for (int m = 0; m < frames; m++) {
            double acc = 0.0;
            for (int k = edges[b]; k < edges[b + 1]; k++)
                acc += mag[k][m] * mag[k][m];
            double e = width > 0 ? sqrt(acc / width) : 0.0;
            out[b][m] = e;
            if (e > peak) peak = e;
        }
        double inv = 1.0 / (peak > 0.0 ? peak : 1e-9);
        for (int m = 0; m < frames; m++) out[b][m] *= inv;
        if (smoothing > 0) {              /* centered moving mean */
            double *tmp = (double *)malloc(frames * sizeof(double));
            if (!tmp) continue;           /* allocation failed: leave band unsmoothed */
            for (int m = 0; m < frames; m++)
                tmp[m] = mean_window(out[b], frames, m, smoothing);
            for (int m = 0; m < frames; m++) out[b][m] = tmp[m];
            free(tmp);
        }
    }
}
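Two claims from the Smoothing paragraph can be checked with a short sketch: the edge-to-Hz conversion, and the fact that a centered moving mean does not shift a peak. The sample rate of 8000 Hz is an assumption for the example, and `edge_hz` and `centered_mean` are hypothetical names. The window clipping mirrors mean_window in the C listing.

```python
import numpy as np

def edge_hz(edges, fs, frame):
    """Lower frequency in Hz of each bin index: edge * (FS / FRAME)."""
    return np.asarray(edges) * (fs / frame)

def centered_mean(x, w):
    """Moving mean over a window centered on each point, clipped at the ends."""
    x = np.asarray(x, dtype=float)
    half = w // 2
    out = np.empty_like(x)
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out[i] = x[lo:hi].mean()
    return out

# Default split for a 128-sample frame, at an assumed 8000 Hz sample rate.
# The final edge is exclusive, so its value lies just above Nyquist.
hz = edge_hz([1, 5, 13, 65], fs=8000, frame=128)

# A symmetric pulse peaking at index 10, before and after smoothing.
pulse = np.maximum(0, 5 - np.abs(np.arange(21) - 10)).astype(float)
smooth = centered_mean(pulse, 5)
peak_before, peak_after = int(np.argmax(pulse)), int(np.argmax(smooth))
```

The smoothed pulse is lower and wider, but its peak stays at the same index, which is what "zero-lag" means here. A one-sided (causal) average would push the peak later instead.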