sadda.dsp¶
Pure-function DSP toolkit. Every function takes NumPy float32 audio
+ a sample rate and returns NumPy or dataclass results. No corpus
dependency. STABLE tier.
sadda.dsp — foundational DSP toolkit.
Pure-function API over NumPy float32 arrays. Window functions, STFT,
spectrogram, intensity, and the relocated f0 from Phase 0 all live here.
Stability tier: STABLE (per the 2026-05-18 Python API surface DEVLOG entry).
The top-level sadda.f0 stays as a Phase-0 back-compat alias for the same
function.
Source: python/sadda/dsp/__init__.py:1
FormantFrame ¶
One frame of formant output. Variable-length frequencies /
bandwidths per frame — frames where the LPC root-finder didn't return
enough valid roots in the formant range are honestly empty rather than
NaN-padded.
Source: crates/python/src/lib.rs:3618
__sadda_stability__
class-attribute
¶
hann
builtin
¶
Hann window: 0.5 * (1 - cos(2π n / (N-1))).
Source: crates/python/src/lib.rs:3441
hamming
builtin
¶
Hamming window: 0.54 - 0.46 * cos(2π n / (N-1)).
Source: crates/python/src/lib.rs:3448
blackman
builtin
¶
Blackman window:
0.42 - 0.5*cos(2π n / (N-1)) + 0.08*cos(4π n / (N-1)).
Source: crates/python/src/lib.rs:3456
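The three cosine-sum formulas above (Hann, Hamming, Blackman) can be reproduced in plain NumPy. This is a minimal sketch, not the library's implementation: `cosine_sum_window` is a hypothetical helper name, and the symmetric `(N-1)` denominator is taken directly from the formulas as documented.

```python
import numpy as np


def cosine_sum_window(n: int, coeffs: tuple) -> np.ndarray:
    """Symmetric cosine-sum window: sum_k (-1)^k * a_k * cos(2*pi*k*i / (n - 1))."""
    if n < 2:
        raise ValueError("n must be at least 2 for the (n - 1) denominator")
    phase = 2.0 * np.pi * np.arange(n, dtype=np.float64) / (n - 1)
    window = np.zeros(n, dtype=np.float64)
    for k, a_k in enumerate(coeffs):
        # Alternating signs give a0 - a1*cos(phase) + a2*cos(2*phase) - ...
        window += (-1.0) ** k * a_k * np.cos(k * phase)
    return window.astype(np.float32)


hann_w = cosine_sum_window(8, (0.5, 0.5))             # 0.5 * (1 - cos)
hamming_w = cosine_sum_window(8, (0.54, 0.46))        # 0.54 - 0.46 * cos
blackman_w = cosine_sum_window(8, (0.42, 0.5, 0.08))  # 0.42 - 0.5*cos + 0.08*cos(2x)
```

With these coefficients the results agree with NumPy's own symmetric `np.hanning`, `np.hamming`, and `np.blackman`, which is a convenient cross-check when comparing against the toolkit's output.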
gaussian
builtin
¶
Gaussian window of length n with standard deviation sigma (in samples).
Source: crates/python/src/lib.rs:3463
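For orientation, a Gaussian window with sigma in samples might be computed as below. The sketch assumes the window is centred at `(n - 1) / 2` and has unit peak height; the documentation above does not state either point, so treat both as assumptions. `gaussian_window` is a hypothetical name.

```python
import numpy as np


def gaussian_window(n: int, sigma: float) -> np.ndarray:
    """w[i] = exp(-0.5 * ((i - centre) / sigma)^2), with centre = (n - 1) / 2 (assumed)."""
    centre = (n - 1) / 2.0
    offsets = np.arange(n, dtype=np.float64) - centre
    return np.exp(-0.5 * (offsets / sigma) ** 2).astype(np.float32)


gauss_w = gaussian_window(9, 2.0)  # 9 samples, sigma of 2 samples
```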
kaiser
builtin
¶
Kaiser window of length n with shape parameter beta.
Source: crates/python/src/lib.rs:3470
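The Kaiser window is conventionally defined through the zeroth-order modified Bessel function I0. The sketch below uses that textbook definition with NumPy's `np.i0`; whether the toolkit uses the same symmetric convention is an assumption. `kaiser_window` is a hypothetical name.

```python
import numpy as np


def kaiser_window(n: int, beta: float) -> np.ndarray:
    """w[i] = I0(beta * sqrt(1 - x_i^2)) / I0(beta), with x_i running from -1 to 1."""
    x = 2.0 * np.arange(n, dtype=np.float64) / (n - 1) - 1.0
    # Guard against tiny negative values from floating-point rounding.
    inside = np.sqrt(np.maximum(0.0, 1.0 - x ** 2))
    return (np.i0(beta * inside) / np.i0(beta)).astype(np.float32)


kaiser_w = kaiser_window(16, 8.6)
```

Larger `beta` narrows the window in time and lowers its spectral side lobes; `beta = 0` degenerates to a rectangular window.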
stft
builtin
¶
Short-time Fourier transform of a real-valued 1-D float32 signal.
Returns the complex spectrogram with shape (n_frames, n_freq_bins) where
n_freq_bins = frame_size / 2 + 1 (the unique half of the spectrum for
real input). If window is omitted, a Hann window of length frame_size
is used (matches scipy.signal.stft's default).
Source: crates/python/src/lib.rs:3483
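The shape contract above can be illustrated with a NumPy-only sketch. It is not the library's implementation: the `hop_size` parameter name, the absence of padding, and the frame count `1 + (len - frame_size) // hop_size` are all assumptions made for the example.

```python
import numpy as np


def stft_sketch(signal, frame_size, hop_size, window=None):
    """Complex STFT with shape (n_frames, frame_size // 2 + 1); no padding (assumed)."""
    signal = np.asarray(signal, dtype=np.float32)
    if window is None:
        window = np.hanning(frame_size).astype(np.float32)  # default Hann window
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack(
        [signal[i * hop_size : i * hop_size + frame_size] for i in range(n_frames)]
    )
    # rfft keeps only the unique half of the spectrum for real input.
    return np.fft.rfft(frames * window, axis=1)


sample_rate = 16_000
t = np.arange(4096) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t).astype(np.float32)
spec = stft_sketch(tone, frame_size=512, hop_size=256)
```

A 1 kHz tone at 16 kHz with `frame_size=512` falls exactly on bin `1000 * 512 / 16000 = 32`, so every frame's magnitude peaks there.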
spectrogram
builtin
¶
Power spectrogram of a real-valued signal: |X|² of the STFT, in shape
(n_freq_bins, n_frames). If window is omitted, a Hann window of
length frame_size is used.
Source: crates/python/src/lib.rs:3533
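Going from a complex STFT to the power spectrogram is a squared magnitude plus a transpose, given the two documented shapes. A minimal sketch, assuming the input is laid out frames-first as in the `stft` entry; `power_from_stft` is a hypothetical helper.

```python
import numpy as np


def power_from_stft(stft_matrix: np.ndarray) -> np.ndarray:
    """|X|^2 of a (n_frames, n_freq_bins) STFT, returned as (n_freq_bins, n_frames)."""
    power = stft_matrix.real ** 2 + stft_matrix.imag ** 2
    return power.T.astype(np.float32)


# Two frames, two frequency bins, chosen so the squares are easy to check by hand.
tiny_stft = np.array([[3 + 4j, 0 + 0j],
                      [0 + 1j, 2 + 0j]])
tiny_power = power_from_stft(tiny_stft)
```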
intensity
builtin
¶
Per-frame intensity over an Audio: returns (times, rms, db_fs) as
three NumPy arrays. times is float64 seconds at frame centres; rms is
float32 linear amplitude; db_fs is float32 dB relative to digital
full-scale (clamped to -200 dB on silence). dB-SPL (Praat convention)
arrives in a later slice once microphone calibration is wired through.
Source: crates/python/src/lib.rs:3585
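The relationship between the three returned arrays can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the frame and hop sizes in samples, the lack of padding, and a full-scale reference amplitude of 1.0 are all assumed. Only the -200 dB clamp and the dtypes come from the description above.

```python
import numpy as np


def intensity_sketch(samples, sample_rate, frame_size, hop_size, floor_db=-200.0):
    """Return (times, rms, db_fs) per frame; dBFS is 20*log10(rms) with a floor."""
    samples = np.asarray(samples, dtype=np.float64)
    n_frames = 1 + (len(samples) - frame_size) // hop_size
    starts = np.arange(n_frames) * hop_size
    times = (starts + frame_size / 2.0) / sample_rate  # frame centres, float64
    rms = np.array(
        [np.sqrt(np.mean(samples[s : s + frame_size] ** 2)) for s in starts]
    )
    with np.errstate(divide="ignore"):  # log10(0) -> -inf, clamped on the next line
        db = 20.0 * np.log10(rms)
    db_fs = np.maximum(db, floor_db)
    return times, rms.astype(np.float32), db_fs.astype(np.float32)


sr = 8_000
sine = np.sin(2.0 * np.pi * 1000.0 * np.arange(1600) / sr)
t_sine, rms_sine, db_sine = intensity_sketch(sine, sr, frame_size=400, hop_size=200)
_, _, db_silence = intensity_sketch(np.zeros(1600), sr, frame_size=400, hop_size=200)
```

Under the RMS convention a full-scale sine reads about -3.01 dBFS rather than 0, because its RMS is 1/√2 of its peak.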
f0
builtin
¶
Estimates f0 over an Audio via time-domain autocorrelation.
Returns (times, frequencies) as a 2-tuple of NumPy arrays:
times is float64 in seconds, frequencies is float32 in Hz.
Source: crates/python/src/lib.rs:3417 · impl: crates/engine/src/pitch.rs:253
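To make "time-domain autocorrelation" concrete, here is a naive single-frame estimator. It is a sketch of the general technique only: the search range of 75 to 500 Hz, the mean removal, and the simple peak pick are assumptions, and the engine's actual tracker in `pitch.rs` may differ in every detail.

```python
import numpy as np


def autocorr_f0(frame, sample_rate, min_freq_hz=75.0, max_freq_hz=500.0):
    """Pick the lag with the largest autocorrelation inside the allowed pitch range."""
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - frame.mean()
    # Keep non-negative lags only: ac[k] = sum_n frame[n] * frame[n + k].
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    min_lag = int(sample_rate / max_freq_hz)
    max_lag = min(int(sample_rate / min_freq_hz), len(ac) - 1)
    lag = min_lag + int(np.argmax(ac[min_lag : max_lag + 1]))
    return sample_rate / lag


sr = 16_000
frame = np.sin(2.0 * np.pi * 200.0 * np.arange(640) / sr)  # 40 ms of a 200 Hz tone
estimate_hz = autocorr_f0(frame, sr)
```

Because this picks a single peak with no octave cost, it is the kind of tracker that can lock onto a subharmonic on less clean input.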
voiced_pitch
builtin
¶
voiced_pitch(audio, *, frame_size_seconds=0.03, hop_size_seconds=0.01, min_freq_hz=75.0, max_freq_hz=500.0, method='boersma', voicing_threshold=0.45)
Estimates f0 with a voicing decision and returns (times, frequencies,
voicing) as three NumPy arrays. times is float64 seconds at frame
centres; frequencies is float32 Hz; voicing is float32 in [0, 1].
method selects the pitch tracker. The options span three algorithmic
families (autocorrelation, cumulative-mean-normalized-difference, and
spectral), covering both Praat-faithful and librosa-faithful expectations:
Autocorrelation family:
- "boersma" (default) — faithful Boersma 1993 / Praat Sound:
To Pitch (ac)… with very_accurate = false. Multi-candidate
per-frame detection + Viterbi path-finder with octave-cost /
octave-jump-cost / voiced-unvoiced-cost terms. Robust to halving /
doubling / transient errors; Praat-validated. The default because it
does not latch onto subharmonics of clean tones the way the simpler
trackers below do (e.g. 150→75, 250→83.3).
- "windowed_autocorrelation" — adopts Boersma 1993's window-correction
idea (divides windowed-signal autocorrelation by window
autocorrelation); fast single-peak tracker, but prone to
subharmonic / octave-down errors (no octave cost or path-finding).
- "autocorrelation" — naive time-domain autocorrelation (Phase-0
tracker; what sadda.dsp.f0(...) calls).
Cumulative-mean-normalized-difference family:
- "yin" — de Cheveigné & Kawahara 2002. Difference function +
CMNDF + absolute threshold. Simple and fast; independent
algorithmic family from autocorrelation, useful for
cross-validation against "boersma".
- "pyin" — Mauch & Dixon 2014, librosa's default. Probabilistic
YIN with a beta-prior distribution over thresholds plus an HMM
smoothing pass. librosa-validated.
Spectral family:
- "swipe" — Camacho & Harris 2008 SWIPE' (prime variant). Spectral
method (a third algorithmic family): matches the sqrt-loudness
ERB-scale spectrum against prime-harmonic cosine kernels. Validated
against the author's own MATLAB run under Octave.
voicing_threshold is informational here: the function returns voicing
values for every frame so callers can apply their own threshold.
Source: crates/python/src/lib.rs:3726
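Since voicing values are returned for every frame, callers typically apply their own cut-off. The sketch below uses hand-written stand-in arrays in place of a real `voiced_pitch` call, so the numbers are illustrative only; `mask_unvoiced` is a hypothetical helper, and marking unvoiced frames with NaN is one choice among several.

```python
import numpy as np

# Stand-ins for the (times, frequencies, voicing) arrays described above.
times = np.array([0.015, 0.025, 0.035, 0.045], dtype=np.float64)
frequencies = np.array([88.0, 121.5, 122.0, 240.0], dtype=np.float32)
voicing = np.array([0.10, 0.92, 0.88, 0.30], dtype=np.float32)


def mask_unvoiced(frequencies, voicing, threshold=0.45):
    """Replace f0 with NaN wherever the voicing value falls below the threshold."""
    return np.where(voicing >= threshold, frequencies, np.nan).astype(np.float32)


voiced_f0 = mask_unvoiced(frequencies, voicing)
voiced_times = times[voicing >= 0.45]  # timestamps of the frames that were kept
```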
formants
builtin
¶
formants(audio, *, frame_size_seconds=0.025, hop_seconds=0.01, n_formants=5, pre_emphasis=0.97, lpc_order=None, method='burg', max_bandwidth_hz=1000.0, min_frequency_hz=50.0)
Computes per-frame formants over an Audio via LPC + polynomial
root-finding. Returns a list of FormantFrames; each frame has
variable-length frequencies / bandwidths (honestly empty for frames
where the root-finder didn't return enough valid roots).
method selects the LPC estimator: "burg" (default; Praat
convention) or "autocorrelation". n_formants is the maximum kept per
frame after filtering; lpc_order = 2 · n_formants + 2 by default.
Source: crates/python/src/lib.rs:3778
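The root-finding step can be sketched with the standard pole-to-formant conversion: a root at angle θ and radius r maps to frequency θ·sr/(2π) and bandwidth −ln(r)·sr/π. Everything below is illustrative: `resonator` and `roots_to_formants` are hypothetical helpers, and the exact filtering rules (inclusive bounds, sort order) are assumptions about how `max_bandwidth_hz` and `min_frequency_hz` might be applied.

```python
import numpy as np


def resonator(freq_hz, bandwidth_hz, sample_rate):
    """Second-order polynomial whose conjugate roots encode one resonance."""
    radius = np.exp(-np.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * np.pi * freq_hz / sample_rate
    return np.array([1.0, -2.0 * radius * np.cos(theta), radius ** 2])


def roots_to_formants(lpc_poly, sample_rate, n_formants=5,
                      max_bandwidth_hz=1000.0, min_frequency_hz=50.0):
    """Convert LPC polynomial roots to (frequencies, bandwidths), filtered and sorted."""
    roots = np.roots(lpc_poly)
    roots = roots[np.imag(roots) > 0.0]  # one root from each conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    bws = -np.log(np.abs(roots)) * sample_rate / np.pi
    keep = (freqs >= min_frequency_hz) & (bws <= max_bandwidth_hz)
    order = np.argsort(freqs[keep])
    return freqs[keep][order][:n_formants], bws[keep][order][:n_formants]


sample_rate = 10_000.0
# One narrow resonance (kept) and one 2000 Hz-wide resonance (filtered out).
lpc_poly = np.convolve(resonator(500.0, 100.0, sample_rate),
                       resonator(1500.0, 2000.0, sample_rate))
freqs, bws = roots_to_formants(lpc_poly, sample_rate)
default_lpc_order = 2 * 5 + 2  # 2 * n_formants + 2 with n_formants = 5
```

The filtering is why a frame can come back with fewer entries than `n_formants`, or with none at all.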
mfcc
builtin
¶
mfcc(audio, *, frame_size_seconds=0.025, hop_seconds=0.01, n_mels=40, n_mfcc=13, f_min=0.0, f_max=None)
Computes Mel-Frequency Cepstral Coefficients over an Audio. Returns
a 2-D float32 NumPy array of shape (n_frames, n_mfcc), frames-first.
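Uses the Slaney mel scale (librosa's default) with n_mels=40,
n_mfcc=13, f_min=0, f_max=sr/2, a 25 ms frame, and a 10 ms hop. Note that
librosa.feature.mfcc itself defaults to n_mfcc=20 and n_mels=128, so pass
these values explicitly when comparing outputs.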
Source: crates/python/src/lib.rs:3822
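The Slaney mel scale mentioned above is linear below 1 kHz and logarithmic above it. The sketch below writes out that mapping and derives the `n_mels + 2` band-edge frequencies that a triangular filterbank would use. The constants follow the commonly used Slaney formulation; that the toolkit spaces its filters exactly this way is an assumption, and all function names are hypothetical.

```python
import numpy as np

F_SP = 200.0 / 3.0               # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0              # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP  # 15 mels
LOG_STEP = np.log(6.4) / 27.0    # log-region step size


def hz_to_mel_slaney(hz):
    hz = np.asarray(hz, dtype=np.float64)
    # np.maximum keeps the log argument valid for inputs in the linear region.
    log_part = MIN_LOG_MEL + np.log(np.maximum(hz, MIN_LOG_HZ) / MIN_LOG_HZ) / LOG_STEP
    return np.where(hz >= MIN_LOG_HZ, log_part, hz / F_SP)


def mel_to_hz_slaney(mel):
    mel = np.asarray(mel, dtype=np.float64)
    log_part = MIN_LOG_HZ * np.exp(LOG_STEP * (mel - MIN_LOG_MEL))
    return np.where(mel >= MIN_LOG_MEL, log_part, mel * F_SP)


def mel_band_edges(n_mels, f_min, f_max):
    """Frequencies (Hz) of n_mels + 2 points equally spaced on the mel axis."""
    mels = np.linspace(float(hz_to_mel_slaney(f_min)),
                       float(hz_to_mel_slaney(f_max)), n_mels + 2)
    return mel_to_hz_slaney(mels)


edges = mel_band_edges(n_mels=40, f_min=0.0, f_max=8000.0)  # f_max = sr / 2 at 16 kHz
```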
log_mel_whisper
builtin
¶
Whisper-exact log-mel spectrogram, shape (n_frames, n_mels).
Byte-faithful to OpenAI Whisper's encoder front end (Slaney mel,
power STFT with a periodic Hann window, log10 + clamp, global
dynamic-range floor, (+4)/4 normalisation). Expects 16 kHz mono
for Whisper fidelity. target_frames pads/trims the audio so the
result has exactly that many frames (Whisper uses 3000 for 30 s);
None keeps the natural length.
Source: crates/python/src/lib.rs:3858
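The last three steps in the list above (log10 + clamp, global dynamic-range floor, (+4)/4 normalisation) can be sketched on an already-computed mel power matrix. The clamp value of 1e-10 and the 8.0 range below the global maximum are taken from OpenAI's reference front end as commonly cited; they are not stated in this page, so treat them as assumptions. `whisper_log_compress` is a hypothetical name.

```python
import numpy as np


def whisper_log_compress(mel_power):
    """log10 with clamp, floor at (global max - 8), then (x + 4) / 4."""
    mel_power = np.asarray(mel_power, dtype=np.float64)
    log_spec = np.log10(np.maximum(mel_power, 1e-10))      # clamp avoids log10(0)
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # global dynamic-range floor
    return ((log_spec + 4.0) / 4.0).astype(np.float32)


# Tiny hand-checkable input: log10 values are 0, -10 (after clamp), 2, and -3.
mel_power = np.array([[1.0, 1e-12],
                      [100.0, 1e-3]])
normalised = whisper_log_compress(mel_power)
```

With a global maximum of 2 the floor sits at -6, so the clamped -10 entry is raised to -6 before normalisation.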