sadda.ml¶
ONNX-model inference over audio — bundled Silero VAD plus a generic embedding-extraction harness for wav2vec2-style (waveform) and Whisper-style (log-mel) encoders. PROVISIONAL tier.
ONNX Runtime is loaded at runtime, not linked into the wheel. With
pip install "sadda[ml]" the wheel auto-discovers the installed
onnxruntime package at import time and sets ORT_DYLIB_PATH; the
desktop-app bundles ship the runtime as a sidecar so it just works.
Without ORT available, these calls raise a clear "ONNX Runtime not
available" error rather than crashing — see the
2026-05-28 ORT-sidecar packaging DEVLOG entry.
Downloading models (hf://)¶
load_model("hf://<org>/<name>/<file>[@<rev>]") fetches a model from
HuggingFace into the local cache and loads it. This is an unverified
passthrough, so prefer a curated sadda/… id when one exists. pip install
"sadda[download]" is the convenient install: it also pulls in ONNX Runtime, so a
downloaded model is immediately runnable.
sadda never touches the network unless you opt in. The fetch is
compiled into the wheel but stays dormant until you set the environment
variable SADDA_ALLOW_NETWORK=1; without it, an hf:// cache miss
raises a clear "network access is disabled" error. Cached models and
local:// / sadda/… ids always work offline. Authenticate to private
or gated repos with HF_TOKEN. The desktop app does not compile this
in — the GUI is network-free by construction.
import os
import sadda.ml

os.environ["SADDA_ALLOW_NETWORK"] = "1"  # explicit opt-in
m = sadda.ml.load_model("hf://onnx-community/silero-vad/onnx/model.onnx")
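The opt-in rule described above (cache hits always work, cache misses need SADDA_ALLOW_NETWORK=1) can be pictured with a small stand-alone sketch. This is not sadda's implementation: `resolve`, `NetworkDisabledError`, the dict-based cache, and the `fetch` callback are hypothetical names used only to illustrate the order of the checks.

```python
import os
from typing import Callable, Dict, Mapping, Optional


class NetworkDisabledError(RuntimeError):
    """Hypothetical error type standing in for sadda's 'network access is disabled'."""


def network_allowed(environ: Optional[Mapping[str, str]] = None) -> bool:
    # Only the exact value "1" counts as opting in (assumption based on the text above).
    env = os.environ if environ is None else environ
    return env.get("SADDA_ALLOW_NETWORK") == "1"


def resolve(
    model_id: str,
    cache: Dict[str, str],
    fetch: Callable[[str], str],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a local path for model_id, fetching only on an opted-in cache miss."""
    if model_id in cache:
        return cache[model_id]  # cache hit: works offline, no opt-in needed
    if not network_allowed(environ):
        raise NetworkDisabledError(
            "network access is disabled; set SADDA_ALLOW_NETWORK=1 to allow downloads"
        )
    path = fetch(model_id)  # the download step is supplied by the caller in this sketch
    cache[model_id] = path
    return path
```

Checking the cache before the environment variable is what keeps previously downloaded models usable offline.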
Voice activity detection (bundled)¶
vad ¶
Run Silero VAD over audio.
Returns (times, speech_probs) as NumPy arrays — one entry per
~32 ms window (the audio is mono-mixed and resampled to 16 kHz).
Uses the bundled model unless model_path points at another ONNX
VAD model. Raises if ONNX Runtime isn't available.
Source: python/sadda/ml/__init__.py:131
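To make the "~32 ms window" figure concrete: at 16 kHz, a 512-sample window lasts 512 / 16000 = 0.032 s. The sketch below computes the time axis such windowing would produce. The 512-sample window size, the choice of window-start timestamps, and dropping a trailing partial window are assumptions for illustration; the actual values returned by vad may follow a different convention.

```python
from typing import List


def window_times(n_samples: int, sample_rate: int = 16_000, window: int = 512) -> List[float]:
    """Start time in seconds of each full window over n_samples of audio."""
    n_windows = n_samples // window  # a trailing partial window is ignored here
    return [i * window / sample_rate for i in range(n_windows)]


# One second of 16 kHz audio holds 31 full 512-sample windows.
times = window_times(16_000)
```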
speech_segments ¶
Speech regions in audio as (start_seconds, end_seconds).
Runs vad, then merges consecutive windows whose probability is
>= threshold. Uses the bundled model unless model_path is given.
Source: python/sadda/ml/__init__.py:143
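The merge step can be sketched in plain Python as follows. This is an illustration of the idea rather than sadda's code: the default threshold of 0.5, the 0.032 s window length, and the choice to end a segment at the end of its last speech window are all assumptions.

```python
from typing import List, Sequence, Tuple


def merge_segments(
    times: Sequence[float],
    probs: Sequence[float],
    threshold: float = 0.5,   # assumed default, for illustration only
    window_s: float = 0.032,  # assumed window length (512 samples at 16 kHz)
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive windows with prob >= threshold into (start, end)."""
    segments: List[Tuple[float, float]] = []
    start = None
    end = 0.0
    for t, p in zip(times, probs):
        if p >= threshold:
            if start is None:
                start = t          # first speech window opens a segment
            end = t + window_s     # extend the segment to cover this window
        elif start is not None:
            segments.append((start, end))  # a non-speech window closes the run
            start = None
    if start is not None:
        segments.append((start, end))      # audio ended while still in speech
    return segments
```

For example, probabilities `[0.1, 0.9, 0.8, 0.2, 0.7]` over five consecutive windows yield two segments: one covering the second and third windows, and one covering the last.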
Model resolution + embeddings¶
load_model ¶
Resolve a model by id, returning a Model.
id is one of: "sadda/<name>[@version]" (curated registry,
falling back to the bundled set), "local://<path>" (a model
directory with a model.toml, or a bare model file), or
"hf://<org>/<name>/<file>[@<rev>]" (HuggingFace passthrough; see
Downloading models above). The returned model exposes .vad(audio) and
.embeddings(audio) plus .id / .version / .kind / .weights_checksum metadata.
Source: python/sadda/ml/__init__.py:102
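The three id forms can be told apart by prefix. The sketch below parses them into a scheme, a target, and an optional version, using the hf://<org>/<name>/<file>[@<rev>] shape documented under Downloading models. `ModelRef` and `parse_model_id` are hypothetical helpers, not part of sadda, and the id "sadda/example-vad@1.0" in the usage note is a made-up name; treating everything after the last "@" as the version is also an assumption.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    scheme: str             # "sadda", "local", or "hf"
    target: str             # registry name, filesystem path, or <org>/<name>/<file>
    version: Optional[str]  # @version / @rev suffix, or None when absent


def parse_model_id(model_id: str) -> ModelRef:
    if model_id.startswith("local://"):
        path = model_id[len("local://"):]
        if not path:
            raise ValueError(f"empty local path in {model_id!r}")
        # Paths are kept verbatim; an '@' in a path is not treated as a version.
        return ModelRef("local", path, None)
    for scheme, prefix in (("hf", "hf://"), ("sadda", "sadda/")):
        if model_id.startswith(prefix):
            rest = model_id[len(prefix):]
            # Assumption: the version/revision is whatever follows the last '@'.
            target, sep, version = rest.rpartition("@")
            if not sep:
                target, version = rest, None
            if not target or version == "":
                raise ValueError(f"malformed model id: {model_id!r}")
            if scheme == "hf":
                parts = target.split("/", 2)  # org, name, file (file may contain '/')
                if len(parts) != 3 or not all(parts):
                    raise ValueError("hf:// ids need <org>/<name>/<file>")
            return ModelRef(scheme, target, version)
    raise ValueError(f"unrecognised model id: {model_id!r}")
```

With this sketch, `parse_model_id("sadda/example-vad@1.0")` gives scheme "sadda", target "example-vad", and version "1.0", while an hf:// id without a revision suffix gives a version of None.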
install_model ¶
Install a model directory (a model.toml + its files) into the
store by copying it in. This is how the bundled set seeds the cache and where
a fetched model lands. Returns the installed Model.
Source: python/sadda/ml/__init__.py:116
get_model ¶
The model with this id + version in the store (the per-user
cache by default, or an explicit root), or None.
Source: python/sadda/ml/__init__.py:124
Model ¶
A model resolved from the registry by load_model.
Source: crates/python/src/ml.rs:59
weights_checksum
property
¶
Weights checksum (sha256:…), if declared.
Source: crates/python/src/ml.rs:92
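A declared checksum of the form sha256:… can be compared against a weights file with the standard library. The sketch below assumes the checksum is the hex SHA-256 digest of the raw file bytes; the page does not state exactly what is hashed, so treat that as an assumption. `sha256_checksum` and `verify_weights` are hypothetical helper names.

```python
import hashlib
from typing import Optional


def sha256_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return 'sha256:<hex digest>' of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify_weights(path: str, declared: Optional[str]) -> Optional[bool]:
    """True/False if a checksum is declared, None when there is nothing to check."""
    if declared is None:
        return None
    return sha256_checksum(path) == declared.lower()
```

Returning None for an undeclared checksum keeps "not verified" distinct from "verified and failed".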
embeddings
method descriptor
¶
Runs this model as an embedding extractor over audio, returning a
(frames, dims) float64 NumPy array. The input is shaped per the
model's declared representation (waveform / log_mel). Errors
unless ONNX Runtime is available.
Source: crates/python/src/ml.rs:114
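A common next step with a (frames, dims) array is to pool it into one fixed-size vector per clip. The sketch below shows mean pooling followed by L2 normalisation using NumPy only. It is a downstream usage illustration, not something sadda provides: `mean_pool` is a hypothetical helper, and the input is a stand-in array rather than real model output.

```python
import numpy as np


def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Average a (frames, dims) array over time and L2-normalise the result."""
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, dims) array")
    vec = frames.mean(axis=0)    # shape (dims,)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Stand-in for model output: two frames of a 2-dimensional embedding.
pooled = mean_pool(np.array([[3.0, 0.0], [3.0, 8.0]]))
```

Normalising the pooled vector makes cosine similarity between clips a plain dot product.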
vad
method descriptor
¶
Runs this model as a VAD over audio → (times, speech_probs).
Errors unless it's a vad model and ONNX Runtime is available.
Source: crates/python/src/ml.rs:99