
sadda.ml

ONNX-model inference over audio — bundled Silero VAD plus a generic embedding-extraction harness for wav2vec2-style (waveform) and Whisper-style (log-mel) encoders. PROVISIONAL tier.
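The two input representations differ roughly as follows. A wav2vec2-style encoder consumes the raw 16 kHz waveform, while a Whisper-style encoder consumes a log-mel spectrogram. The sketch below computes a simplified Whisper-like log-mel in NumPy: a 25 ms Hann window, a 10 ms hop, 80 mel bands, and log10 with an 8-decade floor and (x + 4) / 4 scaling. It is an illustration only. It uses an HTK-style mel filterbank and no centre padding, so it will not match Whisper's or sadda's preprocessing bit-for-bit, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK mel scale (Whisper itself uses the Slaney scale; this is a simplification)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)  # rfft bin frequencies
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(x, sr: int = 16_000, n_fft: int = 400, hop: int = 160, n_mels: int = 80) -> np.ndarray:
    """Simplified Whisper-style log-mel spectrogram, shape (n_mels, n_frames)."""
    x = np.asarray(x, dtype=np.float64)
    if x.size < n_fft:
        x = np.pad(x, (0, n_fft - x.size))  # guarantee at least one full frame
    n_frames = 1 + (x.size - n_fft) // hop
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(x[idx] * window, axis=1)) ** 2  # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    log_spec = np.log10(np.maximum(mel, 1e-10))
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # keep 8 decades of range
    return (log_spec + 4.0) / 4.0
```

One second of 16 kHz audio yields 98 frames here; Whisper's centred STFT yields 100, which is one of the intentional simplifications.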

ONNX Runtime (ORT) is loaded at runtime, not linked into the wheel. After pip install "sadda[ml]", the wheel discovers the installed onnxruntime package at import time and sets ORT_DYLIB_PATH; the desktop app bundles ship the runtime as a sidecar, so no setup is needed there. If ORT is not available, these calls raise a clear "ONNX Runtime not available" error instead of crashing (see the 2026-05-28 ORT-sidecar packaging DEVLOG entry).
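A script that wants to fail early, rather than at its first sadda.ml call, can check for the onnxruntime package up front. This is a minimal standard-library sketch; require_onnxruntime, ort_available and OrtUnavailableError are hypothetical names, not part of sadda, and the check only covers the pip-installed runtime, not the desktop sidecar.

```python
import importlib.util

class OrtUnavailableError(RuntimeError):
    """Hypothetical error type for this sketch; sadda's own exception may differ."""

def require_onnxruntime() -> None:
    # find_spec returns None when the package is not installed, without importing it
    if importlib.util.find_spec("onnxruntime") is None:
        raise OrtUnavailableError(
            'ONNX Runtime not available: install it with pip install "sadda[ml]"'
        )

def ort_available() -> bool:
    try:
        require_onnxruntime()
    except OrtUnavailableError:
        return False
    return True
```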

Downloading models (hf://)

load_model("hf://<org>/<name>/<file>[@<rev>]") fetches a model from HuggingFace into the local cache and loads it. This is an unverified passthrough, so prefer a curated sadda/… id when one exists. The most convenient install is pip install "sadda[download]", which also pulls in ONNX Runtime so a downloaded model is immediately runnable.
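An hf:// id splits into an organisation, a repository name, a file path inside the repo, and an optional revision. A minimal parsing sketch, assuming the revision is everything after the last @ and that the file path may itself contain slashes (as in onnx/model.onnx); parse_hf_id and HfRef are hypothetical helpers, not sadda's parser.

```python
from typing import NamedTuple, Optional

class HfRef(NamedTuple):
    org: str
    name: str
    file: str
    rev: Optional[str]

def parse_hf_id(model_id: str) -> HfRef:
    prefix = "hf://"
    if not model_id.startswith(prefix):
        raise ValueError(f"not an hf:// id: {model_id!r}")
    body = model_id[len(prefix):]
    rev = None
    if "@" in body:
        body, rev = body.rsplit("@", 1)  # revision is the text after the last '@'
        if not rev:
            raise ValueError("empty revision after '@'")
    parts = body.split("/", 2)  # org, name, then the rest is the file path
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<name>/<file>[@<rev>], got {model_id!r}")
    org, name, file = parts
    return HfRef(org, name, file, rev)
```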

sadda never touches the network unless you opt in. The fetch code is compiled into the wheel but stays dormant until you set the environment variable SADDA_ALLOW_NETWORK=1; without it, an hf:// cache miss raises a clear "network access is disabled" error. Cached models, local:// ids, and sadda/… ids always work offline. Authenticate to private or gated repos with HF_TOKEN. The desktop app does not compile the fetch in at all, so the GUI is network-free by construction.

import os
os.environ["SADDA_ALLOW_NETWORK"] = "1"      # explicit opt-in, set before using sadda.ml
import sadda.ml
m = sadda.ml.load_model("hf://onnx-community/silero-vad/onnx/model.onnx")
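The opt-in rule reduces to a small decision: a cache hit is served offline, and a miss is fetched only when SADDA_ALLOW_NETWORK is set. A sketch of that logic, assuming the flag must equal exactly "1" (other truthy spellings are not documented); resolve_source and NetworkDisabledError are hypothetical names.

```python
import os
from typing import Mapping, Optional

class NetworkDisabledError(RuntimeError):
    """Hypothetical stand-in for sadda's "network access is disabled" error."""

def resolve_source(is_cached: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "cache" or "network", or raise if a fetch is needed but not allowed."""
    env = os.environ if env is None else env
    if is_cached:
        return "cache"  # cached models always work offline
    if env.get("SADDA_ALLOW_NETWORK") != "1":  # assumption: only "1" counts as opt-in
        raise NetworkDisabledError(
            "network access is disabled; set SADDA_ALLOW_NETWORK=1 to fetch hf:// models"
        )
    return "network"
```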

Voice activity detection (bundled)

vad

vad(audio, *, model_path: Optional[str] = None)

Run Silero VAD over audio.

Returns (times, speech_probs) as NumPy arrays, with one entry per ~32 ms window (the audio is first mixed to mono and resampled to 16 kHz). Uses the bundled model unless model_path points at another ONNX VAD model. Raises an error if ONNX Runtime is not available.
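The mono mix and the window time axis can be approximated in NumPy. This sketch assumes 512-sample windows (32 ms at 16 kHz, Silero's usual size), that times holds each window's start, and plain linear interpolation for resampling; sadda's actual resampler and timestamp convention are not documented here, and both function names are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000
WINDOW = 512  # assumed window size: 512 samples / 16 kHz = 32 ms

def to_mono_16k(samples: np.ndarray, sr: int) -> np.ndarray:
    """samples: shape (n,) or (n, channels). Linear interpolation is a simplification."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 2:
        x = x.mean(axis=1)  # mono mix by averaging channels
    if sr == TARGET_SR:
        return x
    n_out = int(round(x.size * TARGET_SR / sr))
    t_in = np.arange(x.size) / sr
    t_out = np.arange(n_out) / TARGET_SR
    return np.interp(t_out, t_in, x)

def window_times(n_samples: int) -> np.ndarray:
    """Start time in seconds of each full window (a trailing partial window is dropped)."""
    return np.arange(n_samples // WINDOW) * (WINDOW / TARGET_SR)
```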

speech_segments

speech_segments(audio, *, threshold: float = 0.5, model_path: Optional[str] = None)

Return the speech regions in audio as (start_seconds, end_seconds) pairs.

Runs vad, then merges consecutive windows whose speech probability is >= threshold into a single region. Uses the bundled model unless model_path is given.
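The merge step is a single pass over the windows. A pure-Python sketch, assuming times are window starts and each window lasts 32 ms, so a region ends one window after its last speech window. merge_speech_windows is a hypothetical name, and this sketch applies no padding or minimum-duration filtering.

```python
from typing import List, Optional, Sequence, Tuple

def merge_speech_windows(
    times: Sequence[float],
    probs: Sequence[float],
    threshold: float = 0.5,
    window_s: float = 0.032,  # assumed window length in seconds
) -> List[Tuple[float, float]]:
    segments: List[Tuple[float, float]] = []
    start: Optional[float] = None  # start of the region currently open, if any
    last: float = 0.0              # start time of the most recent speech window
    for t, p in zip(times, probs):
        if p >= threshold:
            if start is None:
                start = t  # open a new region
            last = t
        elif start is not None:
            segments.append((start, last + window_s))  # close the region
            start = None
    if start is not None:
        segments.append((start, last + window_s))  # region running to the end
    return segments
```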

Model resolution + embeddings

load_model

load_model(id)

Resolve a model by id, returning a Model.

id is one of: "sadda/<name>[@version]" (curated registry, falling back to the bundled set), "local://<path>" (a model directory with a model.toml, or a bare model file), or "hf://<org>/<name>/<file>[@<rev>]" (HuggingFace passthrough; see Downloading models (hf://) above). The returned model exposes .vad(audio) and .embeddings(audio), plus .id / .version / .kind / .license / .title / .weights_checksum metadata.
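Resolution starts by dispatching on the id's scheme. A minimal sketch of that dispatch, returning (scheme, target, version) tuples; classify_model_id is hypothetical, performs no lookup or download, and treats the text after @ as the version or revision per the patterns above.

```python
from typing import Optional, Tuple

def classify_model_id(model_id: str) -> Tuple[str, str, Optional[str]]:
    if model_id.startswith("local://"):
        path = model_id[len("local://"):]
        if not path:
            raise ValueError("local:// id needs a path")
        return ("local", path, None)  # paths are not split on '@'
    if model_id.startswith("hf://"):
        body = model_id[len("hf://"):]
        target, sep, rev = body.rpartition("@")
        if not sep:  # no '@': rpartition puts everything in the last slot
            target, rev = body, ""
        return ("hf", target, rev or None)
    if model_id.startswith("sadda/"):
        name, _, version = model_id[len("sadda/"):].partition("@")
        if not name:
            raise ValueError("sadda/ id needs a name")
        return ("sadda", name, version or None)
    raise ValueError(f"unrecognised model id: {model_id!r}")
```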

install_model

install_model(src_dir, *, root=None)

Install a model directory (a model.toml plus its files) into the store by copying it in. This is how the bundled set seeds the cache, and it is where a fetched model lands. Returns the installed Model.

get_model

get_model(id, version, *, root=None)

Return the model with this id and version from the store (the per-user cache by default, or root if given), or None if it is not installed.
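install_model and get_model together behave like a copy-in, look-up store. The sketch below mimics that with the standard library, assuming a hypothetical <root>/<id>/<version>/ layout and taking the id and version as arguments instead of reading them from model.toml (whose schema is not documented here). sadda's real store layout and default per-user cache location are its own, and both helpers are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]

def install_model_dir(src_dir: PathLike, model_id: str, version: str, *, root: PathLike) -> Path:
    src = Path(src_dir)
    if not (src / "model.toml").is_file():
        raise FileNotFoundError(f"{src} has no model.toml")
    dest = Path(root) / model_id / version  # hypothetical layout: root/<id>/<version>
    shutil.copytree(src, dest, dirs_exist_ok=True)  # copy in; the source is left untouched
    return dest

def find_installed(model_id: str, version: str, *, root: PathLike) -> Optional[Path]:
    """Path of an installed model, or None (mirroring get_model's None)."""
    candidate = Path(root) / model_id / version
    return candidate if (candidate / "model.toml").is_file() else None

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        src.mkdir()
        (src / "model.toml").write_text("")  # placeholder manifest
        store = Path(tmp, "store")
        dest = install_model_dir(src, "sadda/demo", "1.0", root=store)
        print(find_installed("sadda/demo", "1.0", root=store) == dest)
```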

Model

A model resolved from the registry by load_model.

__module__ class-attribute

__module__ = 'sadda._native.ml'

id property

id

Resolvable id (e.g. "sadda/silero-vad").

kind property

kind

Model kind (vad, embedding, …).

license property

license

SPDX license id, if declared.

title property

title

Human-readable title.

version property

version

Model version.

weights_checksum property

weights_checksum

Weights checksum (sha256:…), if declared.
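To check weights against the declared value yourself, compute their SHA-256 in the same sha256:<hex> form. A standard-library sketch; which file or files the checksum covers is an assumption (a single weights file here), and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union

def checksum_bytes(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def checksum_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large weight files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def checksum_matches(declared: str, actual: str) -> bool:
    """Compare two 'sha256:<hex>' strings, ignoring hex case."""
    algo, _, digest = declared.partition(":")
    if algo.lower() != "sha256" or not digest:
        raise ValueError(f"unsupported checksum format: {declared!r}")
    return digest.lower() == actual.partition(":")[2].lower()
```

A caller would skip the comparison when weights_checksum is None, since nothing was declared.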

__repr__ method descriptor

__repr__()

Return repr(self).

embeddings method descriptor

embeddings(audio)

Runs this model as an embedding extractor over audio, returning a (frames, dims) float64 NumPy array. The input is prepared according to the model's declared representation (waveform or log_mel). Raises an error if ONNX Runtime is not available.
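A common next step with a (frames, dims) embedding matrix is to mean-pool it into one vector per clip and compare clips by cosine similarity. The sketch below is plain NumPy and not part of sadda; in practice the inputs would come from embeddings(audio) on two clips, stood in for here by small arrays.

```python
import numpy as np

def pool_embeddings(frames) -> np.ndarray:
    """Mean over the frame axis: (frames, dims) -> (dims,)."""
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, dims) array")
    return frames.mean(axis=0)

def cosine_similarity(a, b) -> float:
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(np.dot(a, b) / denom)  # 0.0 for zero vectors

# Usage, assuming `model` is an embedding model returned by sadda.ml.load_model(...):
# sim = cosine_similarity(pool_embeddings(model.embeddings(clip_a)),
#                         pool_embeddings(model.embeddings(clip_b)))
```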

vad method descriptor

vad(audio)

Runs this model as a VAD over audio, returning (times, speech_probs). Raises an error unless the model's kind is vad and ONNX Runtime is available.