sadda.refdist

Reference distributions: install and query curated populations (observed corpora, normative ranges, target zones), pin them per project, and publish your own back to the registry. PROVISIONAL tier.

Design rationale lives in the 2026-05-18 "Reference distribution governance" DEVLOG entry. The three-tier registry, the refdist.toml schema, and the GUI overlay encoding for observed vs normative vs target are all covered there.

Resolving distributions

list_all

list_all(*, root: Optional[str] = None) -> list[RefDist]

Every distribution in the store (default: the per-user cache).

query

query(*, parameter: Optional[str] = None, language: Optional[str] = None, variety: Optional[str] = None, sex: Optional[str] = None, age_band: Optional[str] = None, phone: Optional[str] = None, kind: Optional[str] = None, root: Optional[str] = None) -> list[RefDist]

Distributions matching the given facets. Any omitted facet matches anything; string matches are case-insensitive. kind is one of observed_distribution | summary_normative_range | target_zone.
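The matching rule described above can be sketched in plain Python. This is only an illustration of the stated semantics (an omitted facet matches anything, string comparison ignores case), not sadda's implementation; the `matches` helper, the dict-shaped manifests, and the way list-valued fields are compared are all assumptions made for the example.

```python
from typing import Optional


def matches(manifest: dict, **facets: Optional[str]) -> bool:
    """Hypothetical helper: True if every supplied facet matches, ignoring case."""
    for key, wanted in facets.items():
        if wanted is None:
            continue  # an omitted facet matches anything
        declared = manifest.get(key)
        # Assumption: a field may hold one string or a list of strings (e.g. sex).
        values = declared if isinstance(declared, list) else [declared]
        if not any(isinstance(v, str) and v.lower() == wanted.lower() for v in values):
            return False
    return True


# Toy manifests, invented for the example.
manifests = [
    {"id": "demo-a", "language": "eng", "sex": ["f", "m"], "kind": "observed_distribution"},
    {"id": "demo-b", "language": "ENG", "sex": ["f"], "kind": "target_zone"},
    {"id": "demo-c", "language": "deu", "sex": ["m"], "kind": "observed_distribution"},
]

english = [m["id"] for m in manifests if matches(m, language="eng")]
english_male = [m["id"] for m in manifests if matches(m, language="Eng", sex="M")]
everything = [m["id"] for m in manifests if matches(m)]
```

Calling the helper with no facets returns every manifest, which mirrors how query with no arguments would behave like list_all under this reading.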

get

get(id: str, version: str, *, root: Optional[str] = None) -> Optional[RefDist]

The distribution with this id and version, or None.

install

install(src_dir: str, *, root: Optional[str] = None) -> RefDist

Install a distribution directory (a refdist.toml + its data file) into the store by copying it in — how the bundled starter set seeds the user cache. Returns the installed distribution.

store_root

store_root(*, root: Optional[str] = None) -> str

Filesystem path of the active store (the per-user cache by default, created if missing).

Publishing your own

scaffold

scaffold(dest_dir: str, data: Any, *, id: str, version: str, kind: str, parameters: Optional[list[str]] = None, title: Optional[str] = None, doi: Optional[str] = None, license: Optional[str] = None, language: Optional[str] = None, variety: Optional[str] = None, sex: Optional[list[str]] = None, age_band: Optional[list[str]] = None, units: Optional[str] = None, phones: Optional[list[str]] = None, shareability: Optional[str] = None, min_n_per_subgroup: Optional[int] = None, authors: Optional[list[str]] = None, year: Optional[int] = None, provenance: Optional[str] = None) -> RefDist

Scaffold a publishable distribution directory from an analysis result (C9). Writes data.parquet from data (a polars.DataFrame), then refdist.toml + provenance.md + a LICENSE stub from the metadata. schema.columns is taken from the DataFrame, and n_speakers is inferred from a speaker_id column if present.

The result is immediately resolvable. It passes the registry validator once you (a) replace the LICENSE stub with the full license text and (b) fill in real provenance. To submit, fork the registry, copy the directory under tier3/<id>/, and open a pull request (authentication uses your GitHub credentials, not sadda's).

Data types

RefDist

One resolved reference distribution (its parsed manifest + on-disk location).

age_band property

age_band

Age bands represented.

authors property

authors

Citation authors.

bibtex property

bibtex

BibTeX entry, if declared.

doi property

doi

DOI, if any.

id property

id

Stable distribution id.

kind property

kind

Measure kind: observed_distribution | summary_normative_range | target_zone.

language property

language

ISO 639-3 language code, if declared.

n_speakers property

n_speakers

Number of speakers, if declared.

parameters property

parameters

Measured parameters (e.g. ["F1", "F2"]).

phones property

phones

Phones covered, if applicable.

sex property

sex

Sexes represented.

shareability property

shareability

Shareability declaration (raw_samples | summary_only).

title property

title

Human-readable title.

units property

units

Parameter units (e.g. "Hz"), if declared.

variety property

variety

Variety / dialect, if declared.

version property

version

Semantic version.

year property

year

Publication year, if declared.

__repr__ method descriptor

__repr__()

Return repr(self).

column method descriptor

column(name, *, filter=None)

Reads a numeric data-file column as a list of floats, keeping only rows where every filter entry matches (string columns, case-insensitive). E.g. column("F1", filter={"phone": "iy"}). (D10, provisional.)
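As a rough illustration of the filter semantics, the sketch below applies the same rule to a list of row dicts: a row is kept only if every filter entry matches, and string comparison ignores case. The `column_values` function and the toy rows are hypothetical stand-ins; the real method reads the distribution's data file.

```python
def column_values(rows, name, *, filter=None):
    """Hypothetical stand-in: numeric column as floats, rows restricted by `filter`."""
    wanted = filter or {}
    out = []
    for row in rows:
        # Keep the row only if every filter entry matches, ignoring case.
        if all(str(row[key]).lower() == str(value).lower() for key, value in wanted.items()):
            out.append(float(row[name]))
    return out


# Toy rows, invented for the example (values in Hz).
rows = [
    {"phone": "iy", "sex": "f", "F1": 310, "F2": 2790},
    {"phone": "IY", "sex": "m", "F1": 270, "F2": 2290},
    {"phone": "aa", "sex": "m", "F1": 730, "F2": 1090},
]

iy_f1 = column_values(rows, "F1", filter={"phone": "iy"})
iy_male_f1 = column_values(rows, "F1", filter={"phone": "iy", "sex": "M"})
all_f1 = column_values(rows, "F1")
```

With no filter, every row contributes a value; adding entries to the filter only ever narrows the result.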

data

data()

Read this distribution's data file into a polars.DataFrame.

Returns None if the manifest declares no data file.

data_path method descriptor

data_path()

Absolute path to the data file, if the manifest declares one.

histogram method descriptor

histogram(parameter, *, bins=20, filter=None)

Equal-width histogram of a parameter's raw samples. Errors on a summary_normative_range (no raw samples). (D10, provisional.)
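For readers unfamiliar with the term, here is a minimal sketch of equal-width binning in plain Python. It is not sadda's implementation; the `equal_width_histogram` name, the handling of a degenerate range, and the choice to place the maximum sample in the last bin are assumptions. It does reproduce the documented invariant that there is one more edge than there are counts.

```python
def equal_width_histogram(samples, bins=20):
    """Hypothetical sketch: split [min, max] into `bins` equal-width intervals."""
    if not samples:
        raise ValueError("no raw samples to bin")
    lo, hi = min(samples), max(samples)
    if lo == hi:
        hi = lo + 1.0  # assumption: widen a degenerate range so the width is non-zero
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for x in samples:
        # The maximum sample would index one past the end, so clamp it into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = equal_width_histogram([1.0, 2.0, 2.5, 3.0, 9.0], bins=4)
```

The empty-input error in the sketch plays the same role as the documented error on a summary_normative_range: without raw samples there is nothing to bin.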

points2d method descriptor

points2d(x_param, y_param, *, filter=None)

Reads two numeric columns as aligned (xs, ys) lists — e.g. points2d("F1", "F2", filter={"phone": "iy"}) for a vowel-space scatter. (D10, provisional.)

summary method descriptor

summary(parameter, *, filter=None)

Distribution summary (mean / SD / percentiles) of a 1-D parameter. Empirical for an observed distribution; a normal model of the published mean/SD for a summary_normative_range. filter subsets by subgroup (e.g. filter={"sex": "m"}). (D10, provisional.)
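To make the "normal model" case concrete, the sketch below derives summary fields from a published mean and SD using the standard library's `statistics.NormalDist`. The min and max follow the mean ± 2·SD convention documented on the Summary type; computing the percentiles from the normal quantile function is an assumption about what the model does, and the function name and input values are invented for the example.

```python
from statistics import NormalDist


def normal_model_summary(mean, sd):
    """Hypothetical sketch: summary fields implied by a normal model of mean/SD."""
    model = NormalDist(mu=mean, sigma=sd)
    return {
        "mean": mean,
        "sd": sd,
        "min": mean - 2 * sd,  # documented convention for a normative range
        "max": mean + 2 * sd,
        "p5": model.inv_cdf(0.05),  # assumption: percentiles from the normal quantile function
        "p25": model.inv_cdf(0.25),
        "median": model.inv_cdf(0.50),
        "p75": model.inv_cdf(0.75),
        "p95": model.inv_cdf(0.95),
    }


# Made-up published values: mean 120 Hz, SD 20 Hz.
summary = normal_model_summary(120.0, 20.0)
```

Under this model the median coincides with the mean and the percentiles are symmetric around it, which an empirical summary of raw samples generally would not be.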

Summary

Distribution summary of a 1-D measure (D10): mean, SD, and percentiles. Returned by RefDist.summary.

max property

max

Maximum (observed) / mean + 2·SD (normative range).

mean property

mean

Arithmetic mean.

median property

median

Median (50th percentile).

min property

min

Minimum (observed) / mean − 2·SD (normative range).

n property

n

Number of underlying values (raw samples, or declared speakers).

p25 property

p25

25th percentile.

p5 property

p5

5th percentile.

p75 property

p75

75th percentile.

p95 property

p95

95th percentile.

sd property

sd

Standard deviation.

__repr__ method descriptor

__repr__()

Return repr(self).

Histogram

Equal-width histogram of a 1-D measure (D10). Returned by RefDist.histogram; len(edges) == len(counts) + 1.

counts property

counts

Per-bin sample counts.

edges property

edges

Bin boundaries, ascending.

__repr__ method descriptor

__repr__()

Return repr(self).