sadda.refdist
Reference distributions: install and query curated populations (observed corpora, normative ranges, target zones), pin them per project, and publish your own back to the registry. PROVISIONAL tier.
Design rationale lives in the
2026-05-18 "Reference distribution governance" DEVLOG entry.
The three-tier registry, the refdist.toml schema, and the
GUI overlay encoding for observed vs normative vs target are all
covered there.
Resolving distributions
list_all
Every distribution in the store (default: the per-user cache).
query
query(*, parameter: Optional[str] = None, language: Optional[str] = None, variety: Optional[str] = None, sex: Optional[str] = None, age_band: Optional[str] = None, phone: Optional[str] = None, kind: Optional[str] = None, root: Optional[str] = None) -> list[RefDist]
Distributions matching the given facets. Any omitted facet matches
anything; string matches are case-insensitive. kind is one of
observed_distribution | summary_normative_range |
target_zone.
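The facet rules above (an omitted facet matches anything, and string comparison ignores case) can be modeled in a few lines of plain Python. This is a sketch, not sadda's implementation. `Manifest` is a hypothetical stand-in for a parsed refdist.toml, and the `root` argument (store selection) is left out. Two behaviors are assumptions the text does not pin down: a facet requested against an undeclared field does not match, and list-valued fields (parameters, sex, age_band, phones) match if any entry matches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Manifest:
    """Hypothetical stand-in for the facets a refdist.toml declares."""
    id: str
    kind: str
    parameters: list[str] = field(default_factory=list)
    language: Optional[str] = None
    variety: Optional[str] = None
    sex: list[str] = field(default_factory=list)
    age_band: list[str] = field(default_factory=list)
    phones: list[str] = field(default_factory=list)


def _eq(declared: Optional[str], wanted: str) -> bool:
    # Case-insensitive match; an undeclared value never matches (assumption).
    return declared is not None and declared.casefold() == wanted.casefold()


def _any_eq(declared: list[str], wanted: str) -> bool:
    # List-valued facet: matches if any declared entry matches (assumption).
    return any(_eq(d, wanted) for d in declared)


def query(manifests: list[Manifest], *, parameter=None, language=None,
          variety=None, sex=None, age_band=None, phone=None, kind=None):
    """Keep manifests matching every facet that was given; None matches anything."""
    checks = [
        (parameter, lambda m, v: _any_eq(m.parameters, v)),
        (language, lambda m, v: _eq(m.language, v)),
        (variety, lambda m, v: _eq(m.variety, v)),
        (sex, lambda m, v: _any_eq(m.sex, v)),
        (age_band, lambda m, v: _any_eq(m.age_band, v)),
        (phone, lambda m, v: _any_eq(m.phones, v)),
        (kind, lambda m, v: _eq(m.kind, v)),
    ]
    return [m for m in manifests
            if all(wanted is None or test(m, wanted) for wanted, test in checks)]


# Illustrative, made-up store contents.
store = [
    Manifest("demo-vowels", "observed_distribution", ["F1", "F2"], "eng", sex=["m", "f"]),
    Manifest("demo-vot", "summary_normative_range", ["VOT"], "spa"),
]
print([m.id for m in query(store, language="ENG", parameter="f1")])  # ['demo-vowels']
```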
get
The distribution with this id and version, or None.
install
Install a distribution directory (a refdist.toml plus its data
file) into the store by copying it in; this is how the bundled starter set
seeds the user cache. Returns the installed distribution.
store_root
Filesystem path of the active store (the per-user cache by default, created if missing).
Publishing your own
scaffold
scaffold(dest_dir: str, data: Any, *, id: str, version: str, kind: str, parameters: Optional[list[str]] = None, title: Optional[str] = None, doi: Optional[str] = None, license: Optional[str] = None, language: Optional[str] = None, variety: Optional[str] = None, sex: Optional[list[str]] = None, age_band: Optional[list[str]] = None, units: Optional[str] = None, phones: Optional[list[str]] = None, shareability: Optional[str] = None, min_n_per_subgroup: Optional[int] = None, authors: Optional[list[str]] = None, year: Optional[int] = None, provenance: Optional[str] = None) -> RefDist
Scaffold a publishable distribution directory from an analysis
result (C9). Writes data.parquet from data (a
polars.DataFrame), then refdist.toml + provenance.md + a
LICENSE stub from the metadata. schema.columns is taken from
the DataFrame, and n_speakers is inferred from a speaker_id
column if present.
The result is immediately resolvable. It passes the registry
validator once you (a) replace the LICENSE stub with the full license
text and (b) fill in real provenance. To submit, copy the directory
under the registry's tier3/<id>/ and open a pull request from your fork
(authentication uses your own GitHub credentials, not sadda's).
Data types
RefDist
One resolved reference distribution (its parsed manifest + on-disk location).
Source: crates/python/src/refdist.rs:30
kind (property)
Measure kind: observed_distribution | summary_normative_range
| target_zone.
Source: crates/python/src/refdist.rs:79
language (property)
ISO 639-3 language code, if declared.
Source: crates/python/src/refdist.rs:99
n_speakers (property)
Number of speakers, if declared.
Source: crates/python/src/refdist.rs:119
parameters (property)
Measured parameters (e.g. ["F1", "F2"]).
Source: crates/python/src/refdist.rs:84
shareability (property)
Shareability declaration (raw_samples | summary_only).
Source: crates/python/src/refdist.rs:139
units (property)
Parameter units (e.g. "Hz"), if declared.
Source: crates/python/src/refdist.rs:89
column (method)
Read a numeric data-file column as a list of floats, keeping only
rows where every filter entry matches (string columns compare
case-insensitively). E.g. column("F1", filter={"phone": "iy"}).
(D10, provisional.)
Source: crates/python/src/refdist.rs:154
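The filter rule for column (every entry must match, and string values compare case-insensitively) is sketched below over a list of row dicts rather than a real data file. The rows and helper names are illustrative. The text does not say how non-string filter values are compared, so the sketch assumes plain equality for those.

```python
from typing import Any, Optional


def _row_matches(row: dict[str, Any], flt: dict[str, Any]) -> bool:
    """True if every filter entry matches this row."""
    for key, wanted in flt.items():
        got = row.get(key)
        if isinstance(got, str) and isinstance(wanted, str):
            # String columns: case-insensitive comparison, as documented.
            if got.casefold() != wanted.casefold():
                return False
        elif got != wanted:
            # Non-string values: plain equality (an assumption of this sketch).
            return False
    return True


def column(rows: list[dict[str, Any]], name: str,
           filter: Optional[dict[str, Any]] = None) -> list[float]:
    """Numeric column `name` as floats, keeping only rows that pass `filter`.

    `filter` shadows the builtin on purpose, to mirror the documented keyword.
    """
    flt = filter or {}
    return [float(row[name]) for row in rows if _row_matches(row, flt)]


# Illustrative toy rows, not real measurements.
rows = [
    {"phone": "iy", "sex": "m", "F1": 280},
    {"phone": "IY", "sex": "f", "F1": 310},
    {"phone": "ae", "sex": "m", "F1": 660},
]
print(column(rows, "F1", filter={"phone": "iy"}))  # [280.0, 310.0]
```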
data
Read this distribution's data file into a polars.DataFrame.
Returns None if the manifest declares no data file.
data_path (method)
Absolute path to the data file, if the manifest declares one.
Source: crates/python/src/refdist.rs:143
histogram (method)
Equal-width histogram of a parameter's raw samples. Raises an error on
a summary_normative_range, which has no raw samples. (D10, provisional.)
Source: crates/python/src/refdist.rs:189
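Equal-width binning, and the len(edges) == len(counts) + 1 invariant of the Histogram type, are shown in the minimal sketch below. The `bins` parameter and its default are assumptions, since the method's signature is not listed here. The choices to close the last bin on the right and to fall back to unit width when every sample is equal are also this sketch's own, not sadda's documented behavior.

```python
def histogram(values: list[float], bins: int = 10) -> tuple[list[float], list[int]]:
    """Equal-width histogram returning (edges, counts), with len(edges) == len(counts) + 1."""
    if not values:
        # No raw samples to bin (cf. the documented error on a normative range).
        raise ValueError("no raw samples")
    if bins < 1:
        raise ValueError("bins must be >= 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bins are half-open [left, right); the last bin is also closed on the
        # right so the maximum sample lands inside it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [2, 2, 2, 2, 3]
```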
points2d (method)
Read two numeric columns as aligned (xs, ys) lists, e.g.
points2d("F1", "F2", filter={"phone": "iy"}) for a vowel-space
scatter. (D10, provisional.)
Source: crates/python/src/refdist.rs:212
summary (method)
Distribution summary (mean / SD / percentiles) of a 1-D parameter.
Empirical for an observed distribution; a normal model of the
published mean/SD for a summary_normative_range. filter subsets
by subgroup (e.g. filter={"sex": "m"}). (D10, provisional.)
Source: crates/python/src/refdist.rs:168
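The two summary modes can be sketched with the standard library's statistics module. Raw samples get empirical statistics, and a normative range gets a NormalDist built from the published mean and SD, with min/max at mean ± 2·SD as documented for the Summary properties. This is an illustration only. The percentile method, the use of sample SD, the default percentile set, and the dict return shape are all assumptions, not sadda's documented behavior.

```python
import statistics
from statistics import NormalDist


def summarize_observed(values: list[float], percentiles=(5, 50, 95)) -> dict:
    """Empirical summary of raw samples (observed_distribution); needs >= 2 values."""
    # 99 cut points, so cuts[p - 1] is the p-th percentile (method is an assumption).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "n": len(values),                  # number of raw samples
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),    # sample SD (assumption)
        "min": min(values),
        "max": max(values),
        "percentiles": {p: cuts[p - 1] for p in percentiles},
    }


def summarize_normative(mean: float, sd: float, n_speakers: int,
                        percentiles=(5, 50, 95)) -> dict:
    """Normal model of a published mean/SD (summary_normative_range)."""
    model = NormalDist(mean, sd)
    return {
        "n": n_speakers,          # declared speakers, not raw samples
        "mean": mean,
        "sd": sd,
        "min": mean - 2 * sd,     # documented: mean - 2·SD
        "max": mean + 2 * sd,     # documented: mean + 2·SD
        "percentiles": {p: model.inv_cdf(p / 100) for p in percentiles},
    }


print(summarize_normative(500.0, 50.0, n_speakers=20)["min"])  # 400.0
```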
Summary
Distribution summary of a 1-D measure (D10): mean, SD, and
percentiles. Returned by RefDist.summary.
Source: crates/python/src/refdist.rs:245
max (property)
Maximum value (observed distribution), or mean + 2·SD (normative range).
Source: crates/python/src/refdist.rs:298
min (property)
Minimum value (observed distribution), or mean − 2·SD (normative range).
Source: crates/python/src/refdist.rs:268
n (property)
Number of underlying values (raw samples, or declared speakers).
Source: crates/python/src/refdist.rs:253
Histogram
Equal-width histogram of a 1-D measure (D10). Returned by
RefDist.histogram; len(edges) == len(counts) + 1.
Source: crates/python/src/refdist.rs:313