sadda.dsp

Pure-function DSP toolkit. Every function takes NumPy float32 audio + a sample rate and returns NumPy or dataclass results. No corpus dependency. STABLE tier.

sadda.dsp — foundational DSP toolkit.

Pure-function API over NumPy float32 arrays. Window functions, STFT, spectrogram, intensity, and the relocated f0 from Phase 0 all live here. Stability tier: STABLE (per the 2026-05-18 Python API surface DEVLOG entry).

The top-level sadda.f0 stays as a Phase-0 back-compat alias for the same function.

FormantFrame

One frame of formant output. Variable-length frequencies / bandwidths per frame — frames where the LPC root-finder didn't return enough valid roots in the formant range are honestly empty rather than NaN-padded.

__module__ class-attribute

__module__ = 'sadda._native'

__sadda_stability__ class-attribute

__sadda_stability__ = 'stable'

bandwidths property

bandwidths

Bandwidths in Hz, co-indexed with frequencies.

frequencies property

frequencies

Formant frequencies in Hz, ascending.

time_seconds property

time_seconds

Time at the centre of the analysis frame, in seconds.

__repr__ method descriptor

__repr__()

Return repr(self).
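
Because frames may be empty, callers that want a fixed-width formant track have to decide how to fill the gaps themselves. A minimal sketch, assuming a hypothetical stand-in dataclass (`Frame`, not the real FormantFrame) that mirrors the three properties above; `formant_track` is likewise an illustrative helper, not part of sadda:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Frame:
    """Hypothetical stand-in mirroring FormantFrame's three properties."""
    time_seconds: float
    frequencies: list = field(default_factory=list)
    bandwidths: list = field(default_factory=list)


def formant_track(frames, index):
    """Return (times, track); track is NaN where a frame lacks that formant."""
    times = np.array([f.time_seconds for f in frames], dtype=np.float64)
    track = np.array(
        [f.frequencies[index] if len(f.frequencies) > index else np.nan
         for f in frames],
        dtype=np.float32,
    )
    return times, track


frames = [
    Frame(0.0125, [510.0, 1490.0], [60.0, 90.0]),
    Frame(0.0225),                      # empty frame: no valid roots
    Frame(0.0325, [530.0], [70.0]),     # only one formant survived
]
times, f1 = formant_track(frames, 0)
_, f2 = formant_track(frames, 1)
```

The NaN padding happens on the caller's side here; the frames themselves stay variable-length.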

hann builtin

hann(n)

Hann window of length n: w[k] = 0.5 * (1 - cos(2π k / (n-1))) for k = 0, …, n-1.

hamming builtin

hamming(n)

Hamming window of length n: w[k] = 0.54 - 0.46 * cos(2π k / (n-1)) for k = 0, …, n-1.

blackman builtin

blackman(n)

Blackman window of length n: w[k] = 0.42 - 0.5*cos(2π k / (n-1)) + 0.08*cos(4π k / (n-1)) for k = 0, …, n-1.
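
The three cosine-sum formulas above can be evaluated directly in NumPy. This is a sketch of the formulas as written, not sadda's implementation; `cosine_window` is a hypothetical helper. The same coefficients define NumPy's own symmetric windows (np.hanning, np.hamming, np.blackman), which makes a convenient cross-check.

```python
import numpy as np


def cosine_window(length, coeffs):
    """Evaluate sum_j (-1)^j * a_j * cos(2*pi*j*i / (length - 1)), i = 0..length-1."""
    i = np.arange(length)
    w = np.zeros(length)
    for j, a in enumerate(coeffs):
        w += (-1) ** j * a * np.cos(2.0 * np.pi * j * i / (length - 1))
    return w


N = 64
hann = cosine_window(N, [0.5, 0.5])             # 0.5 - 0.5*cos(...)
hamming = cosine_window(N, [0.54, 0.46])        # 0.54 - 0.46*cos(...)
blackman = cosine_window(N, [0.42, 0.5, 0.08])  # 0.42 - 0.5*cos(...) + 0.08*cos(2*...)
```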

gaussian builtin

gaussian(n, sigma)

Gaussian window of length n with standard deviation sigma (in samples).
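
A minimal sketch of a Gaussian window, assuming the window is centred on (n-1)/2 and normalised to a peak of 1 (the convention scipy.signal.windows.gaussian uses). Whether sadda centres and normalises the same way is an assumption; `gaussian_window` is an illustrative name.

```python
import numpy as np


def gaussian_window(n, sigma):
    """w[k] = exp(-0.5 * ((k - (n - 1) / 2) / sigma)^2), sigma in samples."""
    k = np.arange(n) - (n - 1) / 2.0   # sample offsets from the window centre
    return np.exp(-0.5 * (k / sigma) ** 2)


w = gaussian_window(101, 10.0)
```

With this definition, the sample one sigma away from the centre has the value exp(-0.5).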

kaiser builtin

kaiser(n, beta)

Kaiser window of length n with shape parameter beta.
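
For reference, the textbook Kaiser definition in terms of the zeroth-order modified Bessel function I0 can be sketched as follows. This is the standard formula, assumed rather than confirmed to be what sadda computes; it agrees with np.kaiser.

```python
import numpy as np


def kaiser_window(n, beta):
    """w[k] = I0(beta * sqrt(1 - (2k/(n-1) - 1)^2)) / I0(beta)."""
    k = np.arange(n)
    ratio = 2.0 * k / (n - 1) - 1.0                 # runs from -1 to +1
    inside = np.maximum(0.0, 1.0 - ratio ** 2)      # guard against tiny negatives
    return np.i0(beta * np.sqrt(inside)) / np.i0(beta)


w = kaiser_window(65, 8.6)
```

Larger beta narrows the window in time and lowers the sidelobes; beta = 0 degenerates to a rectangular window.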

stft builtin

stft(samples, frame_size, hop_size, *, window=None)

Short-time Fourier transform of a real-valued 1-D float32 signal.

Returns the complex spectrogram with shape (n_frames, n_freq_bins), where n_freq_bins = frame_size // 2 + 1 (the non-redundant half of the spectrum for real input). If window is omitted, a Hann window of length frame_size is used (matching the default window of scipy.signal.stft).
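
The shape rule can be illustrated with a plain NumPy STFT. This is a sketch under stated assumptions, not sadda's implementation: it assumes no padding or centring (so n_frames = 1 + (len(samples) - frame_size) // hop_size) and a symmetric Hann window. sadda's edge handling may differ, and `stft_sketch` is a hypothetical name.

```python
import numpy as np


def stft_sketch(samples, frame_size, hop_size, window=None):
    """Frame, window, and rFFT a 1-D signal; returns (n_frames, frame_size // 2 + 1)."""
    if window is None:
        window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop_size
    frames = np.stack([
        samples[i * hop_size: i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)


sr = 16000
t = np.arange(sr, dtype=np.float32) / sr
x = np.sin(2 * np.pi * 1000.0 * t).astype(np.float32)   # 1 kHz tone, 1 s
X = stft_sketch(x, frame_size=512, hop_size=128)
peak_bin = int(np.argmax(np.abs(X[0])))                 # bin spacing is sr / 512
```

With a 512-sample frame at 16 kHz the bins are 31.25 Hz apart, so the 1 kHz tone lands in bin 32.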

spectrogram builtin

spectrogram(samples, frame_size, hop_size, *, window=None)

Power spectrogram of a real-valued signal: |X|² of the STFT, with shape (n_freq_bins, n_frames), i.e. frequency-first, the transpose of the layout stft returns. If window is omitted, a Hann window of length frame_size is used.
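
A minimal NumPy sketch of the |X|² computation and the frequency-first layout. The framing convention (no padding, symmetric Hann) is an assumption, and `power_spectrogram_sketch` is an illustrative name rather than sadda's code.

```python
import numpy as np


def power_spectrogram_sketch(samples, frame_size, hop_size):
    """Return |STFT|^2 with shape (n_freq_bins, n_frames)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop_size
    frames = np.stack([
        samples[i * hop_size: i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)   # (n_frames, n_freq_bins)
    return (np.abs(spectrum) ** 2).T         # (n_freq_bins, n_frames)


sr = 8000
x = np.sin(2 * np.pi * 500.0 * np.arange(sr) / sr).astype(np.float32)
power = power_spectrogram_sketch(x, frame_size=256, hop_size=64)
freqs = np.fft.rfftfreq(256, d=1.0 / sr)     # centre frequency of each bin in Hz
peak_hz = float(freqs[np.argmax(power[:, 0])])
```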

intensity builtin

intensity(audio, *, frame_size_seconds=0.03, hop_seconds=0.01)

Per-frame intensity over an Audio: returns (times, rms, db_fs) as three NumPy arrays. times is float64 seconds at frame centres; rms is float32 linear amplitude; db_fs is float32 dB relative to digital full-scale (clamped to -200 dB on silence). dB-SPL (Praat convention) arrives in a later slice once microphone calibration is wired through.
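
The three outputs can be sketched in NumPy as below. Assumptions, none confirmed by the source: frames are unpadded, a frame's time is its midpoint, and the -200 dB clamp is implemented by flooring RMS at 1e-10 before taking 20·log10. `intensity_sketch` works on a raw array plus sample rate and is a hypothetical stand-in, not the sadda function.

```python
import numpy as np


def intensity_sketch(samples, sample_rate, frame_size_seconds=0.03, hop_seconds=0.01):
    """Return (times, rms, db_fs) for a mono signal scaled to [-1, 1]."""
    frame = int(round(frame_size_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    n_frames = 1 + (len(samples) - frame) // hop
    starts = np.arange(n_frames) * hop
    times = (starts + frame / 2.0) / sample_rate            # frame centres, float64
    x = samples.astype(np.float64)
    rms = np.array([np.sqrt(np.mean(x[s:s + frame] ** 2)) for s in starts])
    db_fs = 20.0 * np.log10(np.maximum(rms, 1e-10))         # floor gives -200 dB
    return times, rms.astype(np.float32), db_fs.astype(np.float32)


sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 200.0 * np.arange(sr // 2) / sr)   # half-scale sine
x = np.concatenate([tone, np.zeros(sr // 2)]).astype(np.float32)   # then silence
times, rms, db_fs = intensity_sketch(x, sr)
```

A half-scale sine has RMS 0.5/√2, about -9 dBFS; the silent tail hits the clamp.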

f0 builtin

f0(audio, *, frame_size_seconds=0.03, hop_size_seconds=0.01, min_freq_hz=75.0, max_freq_hz=500.0)

Estimates f0 over an Audio via time-domain autocorrelation.

Returns (times, frequencies) as a 2-tuple of NumPy arrays: times is float64 in seconds, frequencies is float32 in Hz.
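
A single-frame sketch of naive time-domain autocorrelation pitch estimation, to show the idea rather than sadda's exact code. It assumes the estimate is the lag with the largest autocorrelation inside [sample_rate / max_freq_hz, sample_rate / min_freq_hz], with no interpolation or voicing decision; `autocorr_f0_frame` is a hypothetical name.

```python
import numpy as np


def autocorr_f0_frame(frame, sample_rate, min_freq_hz=75.0, max_freq_hz=500.0):
    """Estimate f0 of one frame from the peak of its autocorrelation."""
    x = frame.astype(np.float64)
    x = x - np.mean(x)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]      # lags 0..len-1
    min_lag = int(np.ceil(sample_rate / max_freq_hz))
    max_lag = min(int(np.floor(sample_rate / min_freq_hz)), len(x) - 1)
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return sample_rate / lag


sr = 16000
n = int(round(0.03 * sr))                                   # one 30 ms frame
frame = np.sin(2 * np.pi * 200.0 * np.arange(n) / sr).astype(np.float32)
estimate = autocorr_f0_frame(frame, sr)
```

On a clean 200 Hz tone the strongest peak in range is at lag 80, i.e. 200 Hz. The peak at lag 160 (100 Hz) is smaller only because fewer samples overlap at longer lags, which hints at why this simple tracker is fragile on real speech.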

voiced_pitch builtin

voiced_pitch(audio, *, frame_size_seconds=0.03, hop_size_seconds=0.01, min_freq_hz=75.0, max_freq_hz=500.0, method='boersma', voicing_threshold=0.45)

Estimates f0 with a voicing decision and returns (times, frequencies, voicing) as three NumPy arrays. times is float64 seconds at frame centres; frequencies is float32 Hz; voicing is float32 in [0, 1].

method selects the pitch tracker. Three algorithmic families are available (autocorrelation, cumulative-mean-normalized difference, and spectral), covering both Praat-faithful and librosa-faithful expectations:

Autocorrelation family:
- "boersma" (default): faithful Boersma 1993 / Praat Sound: To Pitch (ac)… with very_accurate = false. Multi-candidate per-frame detection plus a Viterbi path-finder with octave-cost, octave-jump-cost, and voiced-unvoiced-cost terms. Robust to halving, doubling, and transient errors; Praat-validated. It is the default because it does not latch onto subharmonics of clean tones the way the simpler trackers below do (e.g. 150→75, 250→83.3).
- "windowed_autocorrelation": adopts Boersma 1993's window-correction idea (divides the windowed-signal autocorrelation by the window autocorrelation). A fast single-peak tracker, but prone to subharmonic / octave-down errors because it has no octave cost or path-finding.
- "autocorrelation": naive time-domain autocorrelation (the Phase-0 tracker; what sadda.dsp.f0(...) calls).

Cumulative-mean-normalized-difference family:
- "yin": de Cheveigné & Kawahara 2002. Difference function + CMNDF + absolute threshold. Simple and fast; an independent algorithmic family from autocorrelation, useful for cross-validation against "boersma".
- "pyin": Mauch & Dixon 2014, librosa's default. Probabilistic YIN with a beta-prior distribution over thresholds plus an HMM smoothing pass. librosa-validated.

Spectral family:
- "swipe": Camacho & Harris 2008 SWIPE' (prime variant). Matches the sqrt-loudness ERB-scale spectrum against prime-harmonic cosine kernels. Validated against the author's own MATLAB code run under Octave.
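
The "difference function + CMNDF + absolute threshold" recipe behind "yin" can be sketched for a single frame as follows. This follows the published YIN steps without parabolic interpolation; the 0.1 threshold is a commonly used value assumed here, and both function names are hypothetical. It is not sadda's implementation.

```python
import numpy as np


def cmndf(frame, max_lag):
    """Cumulative-mean-normalized difference d'(tau) for tau = 0..max_lag."""
    x = frame.astype(np.float64)
    w = len(x) - max_lag                              # integration window length
    d = np.zeros(max_lag + 1)
    for tau in range(1, max_lag + 1):
        diff = x[:w] - x[tau:tau + w]
        d[tau] = np.dot(diff, diff)                   # squared-difference function
    out = np.ones(max_lag + 1)                        # d'(0) is defined as 1
    running_sum = np.cumsum(d[1:])
    out[1:] = d[1:] * np.arange(1, max_lag + 1) / np.maximum(running_sum, 1e-12)
    return out


def yin_frame(frame, sample_rate, min_freq_hz=75.0, max_freq_hz=500.0, threshold=0.1):
    """Return (f0_hz, d_prime_at_chosen_lag) for one frame."""
    min_lag = int(np.ceil(sample_rate / max_freq_hz))
    max_lag = int(np.floor(sample_rate / min_freq_hz))
    dprime = cmndf(frame, max_lag)
    below = np.flatnonzero(dprime[min_lag:] < threshold)
    if below.size == 0:
        tau = min_lag + int(np.argmin(dprime[min_lag:]))   # fallback: global minimum
    else:
        tau = min_lag + int(below[0])                      # first dip under threshold
        while tau + 1 <= max_lag and dprime[tau + 1] < dprime[tau]:
            tau += 1                                       # walk down to the local minimum
    return sample_rate / tau, float(dprime[tau])


sr = 16000
frame = np.sin(2 * np.pi * 200.0 * np.arange(480) / sr).astype(np.float32)
f0_hz, dip = yin_frame(frame, sr)
```

Taking the first dip below the threshold, rather than the global minimum, is what biases YIN towards the shortest plausible period and away from octave-down errors.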

voicing_threshold is informational here: the function returns voicing values for every frame so callers can apply their own threshold.
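
Applying a threshold on the caller's side might look like the sketch below. The arrays are made-up stand-ins shaped like the (times, frequencies, voicing) return value, not real output, and treating voicing >= threshold as "voiced" is an assumption about the intended convention.

```python
import numpy as np

# Stand-in arrays with the documented dtypes; values are illustrative only.
times = np.array([0.015, 0.025, 0.035, 0.045], dtype=np.float64)
frequencies = np.array([118.0, 121.5, 240.0, 119.0], dtype=np.float32)
voicing = np.array([0.92, 0.81, 0.20, 0.60], dtype=np.float32)

threshold = 0.45
voiced = voicing >= threshold                            # boolean mask per frame
f0_voiced_only = np.where(voiced, frequencies, np.nan)   # NaN marks unvoiced frames
mean_f0 = float(np.nanmean(f0_voiced_only))              # statistics skip the NaNs
```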

formants builtin

formants(audio, *, frame_size_seconds=0.025, hop_seconds=0.01, n_formants=5, pre_emphasis=0.97, lpc_order=None, method='burg', max_bandwidth_hz=1000.0, min_frequency_hz=50.0)

Computes per-frame formants over an Audio via LPC + polynomial root-finding. Returns a list of FormantFrames; each frame has variable-length frequencies / bandwidths (honestly empty for frames where the root-finder didn't return enough valid roots).

method selects the LPC estimator: "burg" (default; Praat convention) or "autocorrelation". n_formants is the maximum kept per frame after filtering; lpc_order = 2 · n_formants + 2 by default.
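
The root-finding half of the pipeline can be sketched with the standard pole-to-formant conversion: a root z of the LPC polynomial gives frequency angle(z)·sr/(2π) and bandwidth -ln|z|·sr/π. How sadda filters candidates is assumed here (keep roots with positive imaginary part, frequency >= min_frequency_hz, bandwidth <= max_bandwidth_hz); `roots_to_formants` and `resonator` are hypothetical helpers, and the LPC estimation step itself (Burg or autocorrelation) is omitted.

```python
import numpy as np


def roots_to_formants(lpc_coeffs, sample_rate, n_formants=5,
                      min_frequency_hz=50.0, max_bandwidth_hz=1000.0):
    """Convert LPC polynomial roots to (frequencies, bandwidths) in Hz, ascending."""
    roots = np.roots(lpc_coeffs)
    roots = roots[np.imag(roots) > 0]                  # one of each conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * sample_rate / np.pi
    keep = (freqs >= min_frequency_hz) & (bws <= max_bandwidth_hz)
    order = np.argsort(freqs[keep])
    return freqs[keep][order][:n_formants], bws[keep][order][:n_formants]


def resonator(freq_hz, bw_hz, sample_rate):
    """Second-order polynomial whose roots encode one formant."""
    r = np.exp(-np.pi * bw_hz / sample_rate)
    theta = 2.0 * np.pi * freq_hz / sample_rate
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


sr = 10000
# Build a polynomial with known formants at 500 Hz and 1500 Hz, then recover them.
a = np.convolve(resonator(500.0, 80.0, sr), resonator(1500.0, 120.0, sr))
freqs, bws = roots_to_formants(a, sr)

n_formants = 5
default_lpc_order = 2 * n_formants + 2                 # the documented default rule
```

Each formant needs a conjugate pole pair, which is where the 2 · n_formants term in the default order comes from; the extra 2 leaves room for spectral tilt.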

mfcc builtin

mfcc(audio, *, frame_size_seconds=0.025, hop_seconds=0.01, n_mels=40, n_mfcc=13, f_min=0.0, f_max=None)

Computes Mel-Frequency Cepstral Coefficients over an Audio. Returns a 2-D float32 NumPy array of shape (n_frames, n_mfcc), frames-first.

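Defaults: Slaney mel scale, n_mels=40, n_mfcc=13, f_min=0, f_max=sr/2, 25 ms frame, 10 ms hop. The mel scale matches the default of librosa.feature.mfcc (Slaney), but librosa's own size defaults differ (n_mfcc=20, n_mels=128, n_fft=2048, hop_length=512), so pass matching values explicitly when comparing outputs.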
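
For orientation, the Slaney mel scale is linear below 1 kHz and logarithmic above it. The sketch below reproduces the standard Slaney constants (as used by librosa with htk=False) and derives filter edge frequencies from them. It covers only the mel mapping, not the filterbank, log, or DCT stages, and the function names are illustrative.

```python
import numpy as np

F_SP = 200.0 / 3.0                  # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0                 # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP     # = 15 mel
LOGSTEP = np.log(6.4) / 27.0        # 27 log-spaced mels span 1 kHz to 6.4 kHz


def hz_to_mel(hz):
    hz = np.asarray(hz, dtype=np.float64)
    linear = hz / F_SP
    log_part = MIN_LOG_MEL + np.log(np.maximum(hz, MIN_LOG_HZ) / MIN_LOG_HZ) / LOGSTEP
    return np.where(hz >= MIN_LOG_HZ, log_part, linear)


def mel_to_hz(mel):
    mel = np.asarray(mel, dtype=np.float64)
    linear = mel * F_SP
    log_part = MIN_LOG_HZ * np.exp(LOGSTEP * (np.maximum(mel, MIN_LOG_MEL) - MIN_LOG_MEL))
    return np.where(mel >= MIN_LOG_MEL, log_part, linear)


# n_mels triangular filters need n_mels + 2 edge frequencies, evenly spaced in mel.
sr, n_mels = 16000, 40
edges_hz = mel_to_hz(
    np.linspace(float(hz_to_mel(0.0)), float(hz_to_mel(sr / 2)), n_mels + 2)
)
```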

log_mel_whisper builtin

log_mel_whisper(audio, *, n_fft=400, hop_length=160, n_mels=80, target_frames=None)

Whisper-exact log-mel spectrogram, shape (n_frames, n_mels).

Byte-faithful to OpenAI Whisper's encoder front end (Slaney mel, power STFT with a periodic Hann window, log10 + clamp, global dynamic-range floor, (+4)/4 normalisation). Expects 16 kHz mono for Whisper fidelity. target_frames pads/trims the audio so the result has exactly that many frames (Whisper uses 3000 for 30 s); None keeps the natural length.
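
The final "log10 + clamp, dynamic-range floor, (+4)/4" stage can be sketched in NumPy as below, following the steps of Whisper's public reference front end (1e-10 clamp, 8.0 floor below the global maximum). The mel filterbank and STFT are omitted, the input is random stand-in data, and both function names are hypothetical rather than sadda's.

```python
import numpy as np


def whisper_log_compress(mel_power):
    """Log-compress a mel power spectrogram the way Whisper's front end does."""
    log_spec = np.log10(np.maximum(mel_power, 1e-10))        # log10 + clamp
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)    # global dynamic-range floor
    return ((log_spec + 4.0) / 4.0).astype(np.float32)       # (+4)/4 normalisation


def samples_for_frames(target_frames, hop_length=160):
    """Number of audio samples that yields exactly target_frames hops."""
    return target_frames * hop_length


rng = np.random.default_rng(0)
mel_power = rng.uniform(0.0, 1.0, size=(3000, 80)) ** 8      # stand-in data, frames-first
log_mel = whisper_log_compress(mel_power)
n_samples = samples_for_frames(3000)                         # 480000 = 30 s at 16 kHz
```

Because the floor sits 8.0 below the maximum and the result is divided by 4, the output never spans more than 2.0 units.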