Real-time
Spectral flux
onset strength
An envelope detector traces the loudness contour of a waveform: the slow outline riding over the fast carrier inside it. Every graph on this page is drawn by the method's real algorithm, and the sliders at the top drive all of them at once.
The whole method, live
Score card
- Causality: low-latency
- Signal model: polyphonic
- Reads: onset strength
- Latency: ≈1 frame
- Cost: STFT
- Domain: frequency
Scored qualitatively.
This method outputs a normalized contour (onset strength, per-band or perceptual loudness), not an amplitude in the units of the true envelope — so an amplitude error number would be meaningless. Its strength is the spectral axis: read the gallery below.
How it works
Where music software finds the beat. Take the STFT and sum the frame-to-frame increases in magnitude across all bins — a half-wave-rectified spectral difference. Energy rising anywhere — a new note, a drum hit — produces a peak, so it flags onsets no matter how many voices overlap.
This onset-strength envelope is the front end of nearly every beat-tracker and tempo estimator. The sensitivity control sets the log-compression, i.e. how much quiet onsets count relative to loud ones.
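To make the half-wave-rectified difference concrete, here is a minimal sketch on a hand-made spectrogram. The 4-bin magnitudes are invented for illustration, the function name is hypothetical, and the array is laid out frame-major (mag[frame][bin]) for readability, unlike the mag[bin][frame] layout used in the page's code. Log compression is left out so the arithmetic stays visible.

```python
def toy_flux(mag):
    """mag[frame][bin] -> one value per frame: summed rises since the last frame."""
    flux = [0.0]  # frame 0 has no predecessor
    for prev, cur in zip(mag, mag[1:]):
        # half-wave rectify each bin's change, then sum across bins
        flux.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return flux


# Invented data: a held tone in bin 1, a hit across bins 2-3 at frame 2,
# and the held tone stopping at frame 4.
mag = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.5, 0.5],  # new energy appears
    [0.0, 1.0, 0.0, 0.0],  # the hit decays
    [0.0, 0.0, 0.0, 0.0],  # the tone ends
]
flux = toy_flux(mag)
print(flux)  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The held tone contributes nothing while it sustains, the hit at frame 2 produces the only peak, and neither the decaying hit nor the ending tone registers.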
Key terms
- STFT: The short-time Fourier transform, i.e. the spectrogram. It slices the signal into short overlapping frames and reports a magnitude per frequency bin per frame, so you can see how the spectrum changes over time.
- Spectral flux: The frame-to-frame increase in magnitude, summed across all bins: Σ max(0, |X[k]| − |X_prev[k]|). It measures how much new energy appeared since the last frame.
- Half-wave rectification: The max(0, ·) step that keeps only the rises and discards the falls, so only energy increases count toward an onset. A note ending should not look like a note starting.
- Onset strength: The resulting curve. Its peaks mark onsets (the front of each note or drum hit), which is exactly what the front end of a beat-tracker feeds on. The sensitivity control sets the log compression: how much quiet onsets count relative to loud ones.
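The effect of the log compression on quiet versus loud onsets can be checked with a little arithmetic. The sketch below assumes two onsets that rise from silence, one to magnitude 0.01 and one to 1.0 (invented values), and compares their contributions after log1p(gamma * magnitude) at three gamma settings. The helper names are hypothetical.

```python
import math


def compressed_rise(magnitude, gamma):
    """Contribution of a bin that jumps from 0 to `magnitude` after compression."""
    return math.log1p(gamma * magnitude) - math.log1p(0.0)


def quiet_to_loud(gamma, quiet=0.01, loud=1.0):
    """How much the quiet onset counts relative to the loud one."""
    return compressed_rise(quiet, gamma) / compressed_rise(loud, gamma)


ratios = {gamma: quiet_to_loud(gamma) for gamma in (4, 60, 240)}
for gamma, ratio in ratios.items():
    print(gamma, round(ratio, 3))
```

Without compression the quiet onset would count for 1% of the loud one. Under these assumptions the ratio grows to roughly 2%, 11% and 22% as gamma goes from 4 to 60 to 240, which is the sense in which higher sensitivity lifts quiet onsets.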
Building the envelope, step by step
Flux doesn't follow loudness — it follows change. Each graph below is drawn by the real algorithm on the page's polyphonic input, working up to the finished onset-strength curve.
- Step 1: The raw mix
Start with the polyphonic input — several voices overlapping, with no single carrier. Amplitude alone won't tell you where the hits are: a sustained chord can be louder than the snare that lands on top of it.
- Step 2: Onset strength
For each frame, compare its spectrum to the previous one, keep only the bins that got louder, and sum that rise across all bins. Steady tones contribute nothing; new energy anywhere spikes the curve. The result is a spiky onset-strength contour — one peak per attack — laid over the dimmed mix.
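The contrast between the two steps can be sketched numerically: a level contour that tracks loudness next to a flux contour that tracks change. The 8-bin mix below is invented (a chord in three bins at magnitude 1.0, a snare in four other bins at 0.25), the names are hypothetical, and log compression is omitted for clarity.

```python
N_BINS = 8
CHORD_BINS = (0, 1, 2)     # sustained, loud
SNARE_BINS = (4, 5, 6, 7)  # brief, quieter, broadband


def frame(chord=False, snare=False):
    """Build one invented magnitude frame."""
    row = [0.0] * N_BINS
    if chord:
        for k in CHORD_BINS:
            row[k] = 1.0
    if snare:
        for k in SNARE_BINS:
            row[k] = 0.25
    return row


# silence, chord starts, chord holds twice, snare lands on top, chord holds
mag = [
    frame(),
    frame(chord=True),
    frame(chord=True),
    frame(chord=True),
    frame(chord=True, snare=True),
    frame(chord=True),
]

# Loudness-style contour: total magnitude per frame.
level = [sum(row) for row in mag]

# Change-style contour: summed positive differences per frame.
flux = [0.0] + [
    sum(max(0.0, c - p) for p, c in zip(prev, cur))
    for prev, cur in zip(mag, mag[1:])
]

print(level)  # [0.0, 3.0, 3.0, 3.0, 4.0, 3.0]
print(flux)   # [0.0, 3.0, 0.0, 0.0, 1.0, 0.0]
```

The level stays high for as long as the chord sustains, so the snare is only a modest bump on top of it. The flux is zero through the sustain and shows one isolated peak per attack.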
The code
Six readable forms of the exact algorithm that draws the curves above — C, JS and Python ports, an optimized C, a fixed-coefficient version, and a user-controlled one whose parameters match the sliders.
#include <math.h>
/* Provided by the shared DSP layer: an STFT magnitude spectrogram.
   mag is [B][M]: B = FRAME/2+1 bins, M frames; a Hann window and the
   FFT live inside it. We only write the flux core here. */
void stft(const double *x, int n, double **mag, int *B, int *M);
void norm_max(double *a, int m);                              /* divide by peak */
void up_frames(const double *fr, int m, double *env, int n);  /* -> sample rate */
/* Spectral flux: for each frame, sum over bins the positive (half-wave-
   rectified) increase in log-compressed magnitude from the previous frame.
   gamma sets the log compression: how much quiet onsets count. */
void spectral_flux(double **mag, int B, int M, double gamma, double *flux) {
    flux[0] = 0.0;                         /* frame 0 has no predecessor */
    for (int m = 1; m < M; m++) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double d = log1p(gamma * mag[k][m]) - log1p(gamma * mag[k][m - 1]);
            if (d > 0.0) s += d;           /* half-wave rectify: rises only */
        }
        flux[m] = s;
    }
    norm_max(flux, M);                     /* normalize onset strength to peak */
}
// stft(sig) -> { mag, centers, M, B } and the normMax / upFrames helpers
// come from the shared DSP layer (Hann window + FFT live inside stft).
// mag is mag[bin][frame]; B = FRAME/2+1 bins, M frames.
// Spectral flux: per frame, sum the positive (half-wave-rectified) increase
// in log-compressed magnitude across all bins. gamma sets the compression.
function spectralFlux(S, gamma) {
  const { mag, M, B, centers } = S;
  const flux = new Float64Array(M);
  for (let m = 1; m < M; m++) {
    let s = 0;
    for (let k = 0; k < B; k++) {
      const d =
        Math.log1p(gamma * mag[k][m]) - Math.log1p(gamma * mag[k][m - 1]);
      if (d > 0) s += d; // half-wave rectify: count rises only
    }
    flux[m] = s;
  }
  return upFrames(normMax(flux), centers); // normalize, then back to sample rate
}
import numpy as np
def spectral_flux(sig, gamma, n_fft=128, hop=16):
    """Onset-strength via half-wave-rectified, log-compressed spectral flux.
    A librosa-style reference: magnitude STFT, log-compress, diff across
    frames, clip negatives at 0, and sum each frame across bins.
    """
    # |STFT| -> shape (bins, frames). (librosa: np.abs(librosa.stft(...)))
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    frames = np.array(
        [sig[i:i + n_fft] * win
         for i in range(0, len(sig) - n_fft + 1, hop)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1)).T   # (bins, frames)
    comp = np.log1p(gamma * mag)                  # log compression
    diff = np.diff(comp, axis=1)                  # frame-to-frame change
    flux = np.clip(diff, 0, None).sum(axis=0)     # half-wave rectify, sum bins
    flux = np.insert(flux, 0, 0.0)                # frame 0 has no predecessor
    peak = flux.max() or 1e-9
    return flux / peak                            # normalized onset strength
The cost here is dominated by the STFT, not the flux sum, so the win is in memory rather than arithmetic. Instead of holding the whole [B][M] spectrogram, stream one frame at a time and keep a single previous-frame log-magnitude buffer (prev[B]). The log1p result is also stored in that buffer once per frame, instead of recomputing the previous frame's compression on every step. The result uses O(B) memory and compresses each magnitude exactly once.
#include <math.h>
/* next_frame() fills mag[B] with the magnitude spectrum of the next STFT
   frame and returns 0 when the signal is exhausted: a streaming STFT.
   Provided by the shared DSP layer. */
int next_frame(double *mag, int B);
/* Writes up to max_frames flux values; returns how many were written so the
   caller knows the length to normalize and upsample. */
int spectral_flux_stream(double gamma, double *flux, int max_frames, int B) {
    double prev[B];                        /* previous frame, log-compressed */
    double mag[B];
    int m = 0;
    if (max_frames < 1 || !next_frame(mag, B)) return 0;
    for (int k = 0; k < B; k++) prev[k] = log1p(gamma * mag[k]);  /* prime prev[] with frame 0 */
    flux[m++] = 0.0;
    while (m < max_frames && next_frame(mag, B)) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double c = log1p(gamma * mag[k]);  /* compress once */
            double d = c - prev[k];
            if (d > 0.0) s += d;
            prev[k] = c;                       /* roll forward */
        }
        flux[m++] = s;
    }
    /* caller normalizes (divide by peak) and upsamples to the sample rate */
    return m;
}
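The same one-buffer idea can be sketched in Python as a generator, with a check that it matches the whole-spectrogram form. This is an illustration rather than one of the page's ports: the function names and the four 2-bin frames are invented, frames are laid out frame-major, and normalization is left to the caller as in the C version.

```python
import math


def flux_batch(frames, gamma):
    """Reference: compress the whole spectrogram, then difference and rectify."""
    comp = [[math.log1p(gamma * v) for v in row] for row in frames]
    out = [0.0]
    for prev, cur in zip(comp, comp[1:]):
        out.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return out


def flux_stream(frame_iter, gamma):
    """Yield one unnormalized flux value per frame, holding only one prev buffer."""
    prev = None
    for mag in frame_iter:
        cur = [math.log1p(gamma * v) for v in mag]  # compress each value once
        if prev is None:
            yield 0.0  # frame 0 has no predecessor
        else:
            yield sum(max(0.0, c - p) for p, c in zip(prev, cur))
        prev = cur  # roll forward


frames = [[0.0, 0.2], [0.5, 0.2], [0.5, 0.0], [0.9, 0.4]]  # invented magnitudes
streamed = list(flux_stream(iter(frames), gamma=60.0))
batch = flux_batch(frames, gamma=60.0)
print(streamed == batch)  # True
```

Because the generator only ever sees one frame plus the stored previous one, it works on input of any length without knowing the frame count in advance.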
Sensitivity is hard-coded to gamma = 60 and Smoothing is fixed at 8 samples (the page defaults). The log-compression coefficient is a baked-in literal and the post-smoothing window is a constant, so there are no tuning knobs.
#include <math.h>
/* mag is [B][M] from the shared STFT. Sensitivity is fixed at gamma = 60,
   smoothing fixed at 8 samples. */
void spectral_flux_fixed(double **mag, int B, int M, double *flux) {
    flux[0] = 0.0;
    for (int m = 1; m < M; m++) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double d = log1p(60.0 * mag[k][m]) - log1p(60.0 * mag[k][m - 1]);
            if (d > 0.0) s += d;
        }
        flux[m] = s;
    }
    norm_max(flux, M);             /* normalize */
    centered_mean(flux, M, 8);     /* round off the peaks (8 samples) */
    norm_max(flux, M);             /* re-normalize to peak */
}
The two page sliders map straight in. Sensitivity is the log-compression strength gamma (4 .. 240): higher gamma lifts quiet onsets relative to loud ones, so soft hits register and the strength curve fills in. Smoothing is a centered moving average over the flux, in SAMPLES (0 .. 60, 0 = off): widening it rounds off the spiky onset peaks into a slower strength contour. Both run after the same half-wave-rectified spectral difference.
#include <math.h>
/* mag[B][M] from the shared STFT. norm_max() and centered_mean() (a zero-lag
   moving average, window in samples) come from the shared DSP layer. */
void spectral_flux_ctl(double **mag, int B, int M, double *flux,
                       double sensitivity,   /* slider: 4 .. 240 (gamma) */
                       int smoothing_samp) { /* slider: 0 .. 60 samples */
    flux[0] = 0.0;
    for (int m = 1; m < M; m++) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double d = log1p(sensitivity * mag[k][m])
                     - log1p(sensitivity * mag[k][m - 1]);
            if (d > 0.0) s += d;             /* half-wave rectify */
        }
        flux[m] = s;
    }
    norm_max(flux, M);
    if (smoothing_samp > 0) {
        centered_mean(flux, M, smoothing_samp); /* round off the peaks */
        norm_max(flux, M);                      /* re-normalize to peak */
    }
}
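The shared-layer helpers are not shown on the page, so the following is only a guess at what a zero-lag centered moving average and a peak normalizer might look like, written in Python for brevity. The edge handling (the window shrinks near the ends) and the treatment of even widths are assumptions, and the flux values are invented.

```python
def centered_mean(x, width):
    """Zero-lag moving average (assumed behavior, not the page's helper).

    The window spans width // 2 points on each side of the current one, so an
    even width covers width + 1 points; near the ends the window shrinks.
    """
    if width <= 1:
        return list(x)
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def norm_max(x):
    """Divide by the peak; leave an all-zero (or empty) input unchanged."""
    peak = max(x, default=0.0)
    return [v / peak for v in x] if peak > 0 else list(x)


flux = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]  # invented onset spikes
smoothed = centered_mean(flux, 3)
renormalized = norm_max(smoothed)
print(max(smoothed), max(renormalized))
```

Averaging spreads each spike over its neighbors and lowers the peak (here from 1 to about 1/3), which is why the smoothed curve is normalized a second time. Because the window is centered, the widened peak stays centered on the original onset frame.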