Spectral flux

onset strength

frequency-domain · low-latency · polyphonic · onset strength

An envelope detector traces the loudness contour of a waveform — the slow outline riding over the fast carrier inside it. Every graph on this page is drawn by the method's real algorithm, and the sliders at the top drive all of them at once.

The whole method, live

Sensitivity: 60 (γ)

Smoothing: 8 samples (0.2 ms)

Score card

Causality: low-latency
Signal model: polyphonic
Reads: onset strength
Latency: ≈1 frame
Cost: STFT
Domain: frequency

Scored qualitatively.

This method outputs a normalized onset-strength contour, not an amplitude in the units of the true envelope, so an amplitude-error score would be meaningless. Its strength lies on the spectral axis; see the generator gallery below.

How it works

This is where music software finds the beat. Take the STFT and sum the frame-to-frame increases in magnitude across all bins: a half-wave-rectified spectral difference. Energy rising anywhere, whether from a new note or a drum hit, produces a peak, so the method flags onsets no matter how many voices overlap.

This onset-strength envelope is the front end of many beat trackers and tempo estimators. The sensitivity control sets the log compression, i.e. how much quiet onsets count relative to loud ones.
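To make the half-wave-rectified difference concrete, here is a minimal NumPy sketch on a hand-made 3-bin, 4-frame magnitude matrix. It uses raw magnitudes (no log compression, so no γ) to keep the arithmetic readable; the matrix values and the name flux_linear are invented for illustration.

```python
import numpy as np

def flux_linear(mag):
    """Half-wave-rectified spectral difference on raw magnitudes.

    mag has shape (bins, frames). Frame 0 has no predecessor, so its flux is 0.
    """
    rises = np.clip(np.diff(mag, axis=1), 0, None)      # keep only increases
    return np.concatenate(([0.0], rises.sum(axis=0)))   # sum each frame over bins

mag = np.array([
    [1.0, 1.0, 1.0, 1.0],  # bin 0: steady tone, contributes nothing
    [0.8, 0.6, 0.4, 0.2],  # bin 1: decaying note, its falls are discarded
    [0.0, 0.0, 0.9, 0.9],  # bin 2: new note starting at frame 2
])
print(flux_linear(mag).tolist())  # [0.0, 0.0, 0.9, 0.0]
```

Only the onset frame lights up. The decaying note in bin 1 produces negative differences every frame, and the rectifier throws all of them away.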

Key terms

STFT: The short-time Fourier transform — the spectrogram. It slices the signal into short overlapping frames and reports a magnitude per frequency bin per frame, so you can see how the spectrum changes over time.
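Spectral flux: The frame-to-frame increase in magnitude, summed across all bins: Σ_k max(0, |X_m[k]| − |X_{m−1}[k]|) for frame m. It measures how much new energy appeared since the last frame. The code on this page applies the same sum to log-compressed magnitudes, log(1 + γ|X_m[k]|).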
Half-wave rectification: The max(0, ·) step that keeps only the rises and discards the falls — so only energy increases count toward an onset. A note ending should not look like a note starting.
Onset strength: The resulting curve. Its peaks mark onsets — the front of each note or drum hit — which is exactly what the front end of a beat-tracker feeds on. The sensitivity control sets the log compression: how much quiet onsets count relative to loud ones.
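How much the log compression changes the balance can be checked with a few lines of arithmetic. This sketch assumes magnitudes scaled so a loud onset rises from silence to 1.0 and a quiet one to 0.05 (26 dB quieter); those levels and the helper name are illustration choices, while 4, 60 and 240 are the Sensitivity slider's minimum, default and maximum.

```python
import numpy as np

def quiet_to_loud_ratio(gamma, quiet=0.05, loud=1.0):
    """Flux from a quiet onset relative to a loud one, for a single bin rising
    from silence, after log compression log1p(gamma * |X|)."""
    # log1p(0) == 0, so a rise from silence is just log1p(gamma * level).
    return float(np.log1p(gamma * quiet) / np.log1p(gamma * loud))

for gamma in (4, 60, 240):
    print(f"gamma = {gamma:3d}: quiet/loud = {quiet_to_loud_ratio(gamma):.2f}")
# gamma =   4: quiet/loud = 0.11
# gamma =  60: quiet/loud = 0.34
# gamma = 240: quiet/loud = 0.47
```

Without compression the ratio would simply be 0.05; raising γ pulls quiet onsets closer to loud ones.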

Building the envelope, step by step

Flux doesn't follow loudness — it follows change. Each graph below is drawn by the real algorithm on the page's polyphonic input, working up to the finished onset-strength curve.

Step 1: The raw mix
Start with the polyphonic input — several voices overlapping, with no single carrier. Amplitude alone won't tell you where the hits are: a sustained chord can be louder than the snare that lands on top of it.
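A quick numerical check of that claim, as a sketch: a sustained three-note chord with a short burst of seeded white noise standing in for the snare. The sample rate, pitches, levels and the 30 ms hit length are made-up illustration values, not the page's input.

```python
import numpy as np

SR = 8000                                    # illustration sample rate (assumed)
n = np.arange(SR)                            # one second of samples
chord = sum(np.sin(2 * np.pi * f * n / SR) for f in (220.0, 277.18, 329.63))

rng = np.random.default_rng(0)               # seeded so the sketch is repeatable
start, length = SR // 2, SR * 30 // 1000     # a 30 ms hit at 0.5 s
snare = np.zeros(SR)
snare[start:start + length] = 0.3 * rng.standard_normal(length)
mix = chord + snare

def rms(x):
    """Root-mean-square level of a block of samples."""
    return float(np.sqrt(np.mean(x ** 2)))

hit = slice(start, start + length)           # the samples where the snare lands
print(f"chord alone over the hit: {rms(chord[hit]):.2f}")
print(f"snare alone over the hit: {rms(snare[hit]):.2f}")
print(f"mix over the hit:         {rms(mix[hit]):.2f}")
```

The snare is several times quieter than the chord, so the mix level during the hit stays close to the chord alone. An amplitude envelope has little to latch onto there, even though the hit puts new energy into many bins at once.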
Step 2: Onset strength
For each frame, compare its spectrum to the previous one, keep only the bins that got louder, and sum that rise across all bins. Steady tones contribute nothing; new energy anywhere spikes the curve. The result is a spiky onset-strength contour — one peak per attack — laid over the dimmed mix.
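The whole chain fits in a few lines of NumPy. This is a sketch, not the page's code: np.hanning (a symmetric Hann window), n_fft = 256, hop = 64, an 8 kHz sample rate and the three test tones are assumptions, chosen so every tone sits exactly on an FFT bin and repeats every hop. Steady frames then match up to rounding, which isolates the onsets.

```python
import numpy as np

def onset_strength(x, gamma=60.0, n_fft=256, hop=64):
    """Log-compressed, half-wave-rectified spectral flux, normalized to peak."""
    win = np.hanning(n_fft)
    frames = np.array([x[i:i + n_fft] * win
                       for i in range(0, len(x) - n_fft + 1, hop)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T              # (bins, frames)
    rises = np.clip(np.diff(np.log1p(gamma * mag), axis=1), 0, None)
    flux = np.concatenate(([0.0], rises.sum(axis=0)))        # frame 0: no predecessor
    return flux / (flux.max() or 1.0)

SR, HOP = 8000, 64
n = np.arange(SR)                                            # one second
# Three voices on exact bins (8, 20, 40 at n_fft = 256).
x = np.sin(2 * np.pi * 250 * n / SR)                         # sounding from the start
x[2000:] += np.sin(2 * np.pi * 625 * n[2000:] / SR)          # enters at 0.25 s
x[4800:] += np.sin(2 * np.pi * 1250 * n[4800:] / SR)         # enters at 0.60 s

env = onset_strength(x, hop=HOP)
print("frames above 0.1:", np.flatnonzero(env > 0.1).tolist())
```

Frame m covers samples 64m to 64m + 255, so the voice entering at sample 2000 can only affect frames 28 to 32 and the one entering at sample 4800 frames 72 to 75. Every other frame is essentially zero, because the steady voices produce no rises.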

The code

Six readable forms of the exact algorithm that draws the curves above — C, JS and Python ports, an optimized C, a fixed-coefficient version, and a user-controlled one whose parameters match the sliders.

#include <math.h>

/* Provided by the shared DSP layer: an STFT magnitude spectrogram.
   mag is [B][M]: B = FRAME/2+1 bins, M frames; the Hann window and the
   FFT live inside stft(). Only the flux core is written here. */
void stft(const double *x, int n, double **mag, int *B, int *M);
void norm_max(double *a, int m);                             /* divide by peak */
void up_frames(const double *fr, int m, double *env, int n); /* frames -> sample rate */

/* Spectral flux: for each frame, sum over bins the positive (half-wave-
   rectified) increase in log-compressed magnitude from the previous frame.
   gamma sets the log compression — how much quiet onsets count. */
void spectral_flux(double **mag, int B, int M, double gamma, double *flux) {
    if (M <= 0) return;                   /* nothing to analyze */
    flux[0] = 0.0;                        /* frame 0 has no predecessor */
    for (int m = 1; m < M; m++) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double d = log1p(gamma * mag[k][m]) - log1p(gamma * mag[k][m - 1]);
            if (d > 0.0) s += d;          /* half-wave rectify: rises only */
        }
        flux[m] = s;
    }
    norm_max(flux, M);                    /* normalize onset strength to peak */
}

// stft(sig) -> { mag, centers, M, B } and the normMax / upFrames helpers
// come from the shared DSP layer (Hann window + FFT live inside stft).
// mag is mag[bin][frame]; B = FRAME/2+1 bins, M frames.

// Spectral flux: per frame, sum the positive (half-wave-rectified) increase
// in log-compressed magnitude across all bins. gamma sets the compression.
function spectralFlux(S, gamma) {
  const { mag, M, B, centers } = S;
  const flux = new Float64Array(M);
  for (let m = 1; m < M; m++) {
    let s = 0;
    for (let k = 0; k < B; k++) {
      const d =
        Math.log1p(gamma * mag[k][m]) - Math.log1p(gamma * mag[k][m - 1]);
      if (d > 0) s += d; // half-wave rectify: count rises only
    }
    flux[m] = s;
  }
  return upFrames(normMax(flux), centers); // normalize, then back to sample rate
}

import numpy as np

def spectral_flux(sig, gamma, n_fft=128, hop=16):
    """Onset-strength via half-wave-rectified, log-compressed spectral flux.

    A librosa-style reference: magnitude STFT, log-compress, diff across
    frames, clip negatives at 0, and sum each frame across bins.
    """
    # |STFT| -> shape (bins, frames). (librosa: np.abs(librosa.stft(...)))
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    frames = np.array(
        [sig[i:i + n_fft] * win
         for i in range(0, len(sig) - n_fft + 1, hop)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1)).T      # (bins, frames)

    comp = np.log1p(gamma * mag)                     # log compression
    diff = np.diff(comp, axis=1)                     # frame-to-frame change
    flux = np.clip(diff, 0, None).sum(axis=0)        # half-wave rectify, sum bins
    flux = np.insert(flux, 0, 0.0)                   # frame 0 has no predecessor

    peak = flux.max() or 1e-9
    return flux / peak                               # normalized onset strength

The cost here is dominated by the STFT, not the flux sum, so the main saving is memory rather than arithmetic. Instead of holding the whole [B][M] spectrogram, stream one frame at a time and keep a single previous-frame log-magnitude buffer (prev[B]). Storing the compressed value in that buffer also means each magnitude goes through log1p exactly once, instead of twice (once as the current frame and again as the previous one). Memory drops to O(B).

#include <math.h>

/* next_frame() fills mag[B] with the magnitude spectrum of the next STFT
   frame and returns 0 when the signal is exhausted: a streaming STFT. */
int next_frame(double *mag, int B);

/* Writes at most max_frames flux values and returns how many it wrote. */
int spectral_flux_stream(double gamma, double *flux, int max_frames, int B) {
    double prev[B];                       /* previous frame, log-compressed */
    double mag[B];                        /* current frame, raw magnitudes */
    int m = 0;

    if (max_frames < 1 || !next_frame(mag, B)) return 0;
    for (int k = 0; k < B; k++) prev[k] = log1p(gamma * mag[k]);  /* prime with frame 0 */
    flux[m++] = 0.0;                      /* frame 0 has no predecessor */

    while (m < max_frames && next_frame(mag, B)) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double c = log1p(gamma * mag[k]); /* compress once */
            double d = c - prev[k];
            if (d > 0.0) s += d;
            prev[k] = c;                      /* roll forward */
        }
        flux[m++] = s;
    }
    /* caller normalizes flux[0..m-1] (divide by peak) and upsamples
       to the sample rate */
    return m;
}
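The same streaming idea in Python, as a sketch: flux_stream consumes one magnitude frame at a time and keeps only the previous frame's compressed spectrum, while flux_batch is a whole-spectrogram reference it should agree with. Both names and the random test spectrogram are illustrative; normalization is left to the caller, as in the C.

```python
import numpy as np

def flux_batch(mag, gamma):
    """Reference: the whole (bins, frames) spectrogram at once."""
    rises = np.clip(np.diff(np.log1p(gamma * mag), axis=1), 0, None)
    return np.concatenate(([0.0], rises.sum(axis=0)))

def flux_stream(frames, gamma):
    """Yield one flux value per incoming magnitude frame using O(bins) memory."""
    prev = None                                    # previous frame, log-compressed
    for mag in frames:
        comp = np.log1p(gamma * np.asarray(mag))   # compress each frame exactly once
        if prev is None:
            yield 0.0                              # frame 0 has no predecessor
        else:
            yield float(np.clip(comp - prev, 0, None).sum())  # rises only
        prev = comp                                # roll forward

rng = np.random.default_rng(1)
mag = rng.random((65, 40))                         # arbitrary (bins, frames) data
streamed = np.array(list(flux_stream(mag.T, 60.0)))  # iterate frame by frame
print(np.allclose(streamed, flux_batch(mag, 60.0)))  # True
```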

This version hard-codes Sensitivity to gamma = 60 and Smoothing to 8 samples (the page defaults). The log-compression coefficient is a literal and the post-smoothing window is a constant, so there are no tuning knobs.

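#include <math.h>

/* mag is [B][M] from the shared STFT. Sensitivity is fixed at gamma = 60,
   smoothing fixed at 8 samples. norm_max() and centered_mean() come from
   the shared DSP layer (signatures as used below). */
void norm_max(double *a, int m);                  /* divide by peak */
void centered_mean(double *a, int m, int width);  /* zero-lag moving average */

void spectral_flux_fixed(double **mag, int B, int M, double *flux) {
    if (M <= 0) return;             /* nothing to analyze */
    flux[0] = 0.0;                  /* frame 0 has no predecessor */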
    for (int m = 1; m < M; m++) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double d = log1p(60.0 * mag[k][m]) - log1p(60.0 * mag[k][m - 1]);
            if (d > 0.0) s += d;
        }
        flux[m] = s;
    }
    norm_max(flux, M);              /* normalize */
    centered_mean(flux, M, 8);      /* round off the peaks (8 samples) */
    norm_max(flux, M);              /* re-normalize to peak */
}
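The shared DSP layer's centered_mean() is not shown on this page. One plausible reading of "zero-lag moving average" is sketched below in Python; the edge handling (the window shrinks at the ends) and the split for even widths are assumptions, not the page's definition.

```python
import numpy as np

def centered_mean(a, width):
    """Zero-lag moving average over `width` samples centered on each output.

    Assumptions: the window shrinks at the array ends rather than padding,
    and an even width takes one more sample after the center than before.
    """
    a = np.asarray(a, dtype=float)
    if width <= 1:
        return a.copy()
    before = (width - 1) // 2
    after = width - 1 - before
    csum = np.concatenate(([0.0], np.cumsum(a)))   # csum[i] = sum(a[:i])
    idx = np.arange(len(a))
    lo = np.maximum(idx - before, 0)
    hi = np.minimum(idx + after + 1, len(a))
    return (csum[hi] - csum[lo]) / (hi - lo)

spike = np.zeros(21)
spike[10] = 1.0                                    # one sharp onset peak
smoothed = centered_mean(spike, 5)
print(smoothed[7:14].round(2).tolist())  # [0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0]
```

Spreading a one-frame spike over five frames lowers its peak to 0.2, which is why the C versions call norm_max() again after smoothing.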

The two page sliders map straight in. Sensitivity is the log-compression strength gamma (4 .. 240): higher gamma lifts quiet onsets relative to loud ones, so soft hits register and the strength curve fills in. Smoothing is a centered moving average over the flux, in SAMPLES (0 .. 60, 0 = off): widening it rounds off the spiky onset peaks into a slower strength contour. Both run after the same half-wave-rectified spectral difference.

#include <math.h>

/* mag[B][M] from the shared STFT. norm_max() and centered_mean() (a zero-lag
   moving average, window in samples) come from the shared DSP layer. */
void norm_max(double *a, int m);
void centered_mean(double *a, int m, int width);

void spectral_flux_ctl(double **mag, int B, int M, double *flux,
                       double sensitivity,   /* slider: 4 .. 240 (gamma) */
                       int smoothing_samp) { /* slider: 0 .. 60 samples  */
    if (M <= 0) return;
    flux[0] = 0.0;
    for (int m = 1; m < M; m++) {
        double s = 0.0;
        for (int k = 0; k < B; k++) {
            double d = log1p(sensitivity * mag[k][m])
                     - log1p(sensitivity * mag[k][m - 1]);
            if (d > 0.0) s += d;   /* half-wave rectify */
        }
        flux[m] = s;
    }
    norm_max(flux, M);
    if (smoothing_samp > 0) {
        centered_mean(flux, M, smoothing_samp);  /* round off the peaks */
        norm_max(flux, M);                        /* re-normalize to peak */
    }
}
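A Python sketch of the user-controlled path, operating on a (bins, frames) magnitude array like the C. Clamping out-of-range slider values to 4..240 and 0..60 is an assumption of this sketch, and the inline moving average is a stand-in for the shared centered_mean() with a window that shrinks at the edges.

```python
import numpy as np

def spectral_flux_ctl(mag, sensitivity=60.0, smoothing_samp=8):
    """Onset strength from a (bins, frames) magnitude array, slider-controlled."""
    gamma = float(np.clip(sensitivity, 4.0, 240.0))      # Sensitivity slider range
    width = int(np.clip(smoothing_samp, 0, 60))          # Smoothing slider range
    rises = np.clip(np.diff(np.log1p(gamma * mag), axis=1), 0, None)
    flux = np.concatenate(([0.0], rises.sum(axis=0)))    # half-wave-rectified sum
    flux /= flux.max() or 1.0                            # normalize to peak
    if width > 1:
        # Stand-in for centered_mean(): zero-lag moving average whose window
        # shrinks at the ends (an assumption about the shared helper).
        before = (width - 1) // 2
        csum = np.concatenate(([0.0], np.cumsum(flux)))
        idx = np.arange(len(flux))
        lo = np.maximum(idx - before, 0)
        hi = np.minimum(idx - before + width, len(flux))
        flux = (csum[hi] - csum[lo]) / (hi - lo)
        flux /= flux.max() or 1.0                        # re-normalize to peak
    return flux

rng = np.random.default_rng(2)
mag = rng.random((33, 200))                              # arbitrary test spectrogram
sharp = spectral_flux_ctl(mag, sensitivity=60, smoothing_samp=0)
smooth = spectral_flux_ctl(mag, sensitivity=60, smoothing_samp=8)
print(sharp.max(), smooth.max())                         # 1.0 1.0
```

Both outputs peak at 1.0 because of the final normalization; widening the smoothing only changes how rounded the contour is between peaks.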

Generators

21 stress-test signals · detector vs. true envelope

Level: const-sine

Temporal: gate, burst, triangle, tremolo-sweep

Robust: noisy, dc-offset, clipped, square, chirp, low-carrier

Spectral: beat, vibrato, poly-perc, am-fade

Boundary: zeros, dc-const, impulse, nyquist, no-margins, am-staircase