Statistics

Classes for statistical distributions

class ionworkspipeline.data_fits.stats.Distribution(distribution=None)

Base class for sampling from probability distributions.

property argmin: float | ndarray

The point at which the normalized negative logpdf (evaluate_to_scalar) attains its minimum value of zero. The normalized negative logpdf is the negative logpdf shifted so that its minimum value is zero. For most distributions, this point is the mode of the distribution.

Returns

float or np.ndarray

The location of the minimum of the normalized negative logpdf.

Raises

NotImplementedError

Raised if the subclass does not implement this property.

cdf(x: float | ndarray) -> float | ndarray

Cumulative distribution function of the distribution.

Parameters

x : float or array_like

Points at which to evaluate the CDF.

Returns

cdf : float or ndarray

Cumulative distribution function evaluated at x.

property distribution: Any

Returns the underlying scipy.stats distribution object.

Returns

distribution : scipy.stats distribution

The underlying distribution object.

evaluate_to_array(x: float | ndarray) -> ndarray

Compute the residuals whose sum of squares equals evaluate_to_scalar(x). That is, np.sum(evaluate_to_array(x)**2) == evaluate_to_scalar(x).

Parameters

x : float or np.ndarray

Point at which to evaluate.

Returns

np.ndarray

Residuals whose squared sum equals the normalized negative logpdf.

evaluate_to_scalar(x: float | ndarray) -> float

Compute the negative log probability density function (negative logpdf), normalized such that the minimum value is zero. This minimum occurs at the distribution’s argmin.

Parameters

x : float or np.ndarray

Point at which to evaluate.

Returns

float

Normalized negative logpdf, with minimum value of 0 at the distribution’s argmin.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

True if the distribution is multivariate, False otherwise.

pdf(x: float | ndarray) -> float | ndarray

Probability density function of the distribution.

Parameters

x : float or array_like

Points at which to evaluate the PDF.

Returns

pdf : float or ndarray

Probability density function evaluated at x.

ppf(U: ndarray) -> ndarray

Percent point function (inverse of CDF) - transform samples from the standard uniform distribution to the distribution of the class.

Parameters

U : np.ndarray

Samples from the standard uniform distribution.

Returns

samples : np.ndarray

Transformed samples from the target distribution.
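The inverse-transform idea behind ppf can be illustrated with the Python standard library alone. The sketch below does not call ionworkspipeline; the helper name normal_ppf is hypothetical, and a normal target distribution is assumed purely for illustration.

```python
import random
from statistics import NormalDist


def normal_ppf(u, mean=0.0, std=1.0):
    """Map uniform samples in (0, 1) to N(mean, std) via the inverse CDF."""
    dist = NormalDist(mu=mean, sigma=std)
    return [dist.inv_cdf(ui) for ui in u]


rng = random.Random(0)
# Keep samples strictly inside (0, 1); inv_cdf is undefined at 0 and 1.
u = [min(max(rng.random(), 1e-12), 1 - 1e-12) for _ in range(5)]
samples = normal_ppf(u, mean=2.0, std=0.5)

# u = 0.5 maps to the median, which for a normal distribution is the mean.
median = normal_ppf([0.5], mean=2.0, std=0.5)[0]
```

Because the mapping is monotone in each uniform sample, low-discrepancy or stratified uniform points keep their ordering after the transform.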

rand(n: int | None = None) -> ndarray

Draw random samples from the distribution.

Parameters

n : int, optional

Number of samples to draw.

Returns

samples : ndarray

Random samples from the distribution.

property univariate: bool

Whether the distribution is univariate.

property zero_variance: bool

Whether the distribution has zero variance.

Returns

has_zero_variance : bool

True if the distribution has zero variance, False otherwise.

class ionworkspipeline.data_fits.stats.Normal(mean: float, std: float)

Univariate normal distribution.

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property argmin: float

The point at which the normalized negative logpdf achieves its minimum value of zero. For the normal distribution, this occurs at the mean, where the negative logpdf is shifted to be zero.

Returns

float

The mean of the distribution.

evaluate_to_array(x: float) -> ndarray

Compute the residual for univariate normal: (x - μ) / σ / √2

When squared and summed, equals evaluate_to_scalar(x).

Parameters

x : float

Point at which to evaluate.

Returns

np.ndarray

Residual whose square equals the normalized negative logpdf.

evaluate_to_scalar(x: float) -> float

Compute the normalized negative logpdf for univariate normal: ((x - μ)² / σ²) / 2

The minimum value of 0 occurs at x = μ (the mean).

Parameters

x : float

Point at which to evaluate.

Returns

float

Normalized negative logpdf, equals 0 at x = μ.
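The relationship between the residual and the scalar forms for the univariate normal can be checked with a few lines of plain Python. This is a minimal sketch written from the formulas above, not the library's implementation; all function names are hypothetical.

```python
import math


def normal_residual(x, mean, std):
    """Residual (x - mean) / std / sqrt(2)."""
    return (x - mean) / std / math.sqrt(2.0)


def normal_scalar(x, mean, std):
    """Normalized negative logpdf ((x - mean)^2 / std^2) / 2."""
    return ((x - mean) ** 2 / std**2) / 2.0


def normal_neg_logpdf(x, mean, std):
    """Unshifted negative logpdf of N(mean, std)."""
    return 0.5 * math.log(2 * math.pi * std**2) + (x - mean) ** 2 / (2 * std**2)


mean, std, x = 1.0, 2.0, 4.0
r = normal_residual(x, mean, std)
s = normal_scalar(x, mean, std)
# The constant removed by the normalization is the negative logpdf at the mean.
shift = normal_neg_logpdf(mean, mean, std)
```

Here r squared equals s, and s equals the unshifted negative logpdf minus shift, which is what makes the residual form usable in least-squares solvers.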

property mean: float

Mean of the normal distribution.

Returns

mean : float

Mean of the distribution.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always False for univariate normal.

property std: float

Standard deviation of the normal distribution.

Returns

std : float

Standard deviation of the distribution.

class ionworkspipeline.data_fits.stats.MultivariateNormal(mean: ndarray, cov: ndarray)

Multivariate normal distribution.

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property argmin: ndarray

The point at which the normalized negative logpdf achieves its minimum value of zero. For the multivariate normal distribution, this occurs at the mean vector, where the negative logpdf is shifted to be zero.

Returns

np.ndarray

The mean vector of the distribution.

property cholesky_cov: ndarray

Cholesky factor of the covariance matrix of the multivariate normal distribution.

property cholesky_inv_cov: ndarray

Cholesky factor of the inverse covariance matrix of the multivariate normal distribution.

property cov: ndarray

Covariance matrix of the multivariate normal distribution.

Returns

cov : np.ndarray

Covariance matrix of the distribution.

evaluate_to_array(x: ndarray) -> ndarray

Compute the residuals for multivariate normal: Σ^(-1/2) (x - μ) / √2

When squared and summed, equals evaluate_to_scalar(x).

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

np.ndarray

Residuals whose squared sum equals the normalized negative logpdf.

evaluate_to_scalar(x: ndarray) -> float

Compute the normalized negative logpdf for multivariate normal: (x - μ)ᵀ Σ⁻¹ (x - μ) / 2

The minimum value of 0 occurs at x = μ (the mean vector).

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

float

Normalized negative logpdf, equals 0 at x = μ.
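One way to realize the whitening in the residual formula is through the Cholesky factor L of Σ (Σ = L Lᵀ), solving L w = x - μ instead of forming a matrix square root. The sketch below assumes that choice and a 2x2 covariance so that it runs without NumPy; whether the library uses the same factorization is not stated here, and all names are hypothetical.

```python
import math


def cholesky_2x2(cov):
    """Lower-triangular entries (l11, l21, l22) with L @ L.T == cov."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21**2)
    return l11, l21, l22


def mvn_residuals(x, mean, cov):
    """Solve L w = (x - mean) by forward substitution, then scale by 1/sqrt(2)."""
    l11, l21, l22 = cholesky_2x2(cov)
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    w0 = d0 / l11
    w1 = (d1 - l21 * w0) / l22
    return [w0 / math.sqrt(2.0), w1 / math.sqrt(2.0)]


def mvn_scalar(x, mean, cov):
    """(x - mean)^T cov^{-1} (x - mean) / 2 using the explicit 2x2 inverse."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    quad = (c * d0 * d0 - 2 * b * d0 * d1 + a * d1 * d1) / det
    return quad / 2.0


mean = [1.0, -1.0]
cov = [[2.0, 0.6], [0.6, 1.0]]
x = [2.0, 0.5]
r = mvn_residuals(x, mean, cov)
s = mvn_scalar(x, mean, cov)
```

Any matrix W with Wᵀ W = Σ⁻¹ gives residuals with the same sum of squares, so the Cholesky-based residuals agree with the quadratic form even though they differ from a symmetric square root.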

property inv_cholesky_cov: ndarray

Inverse Cholesky factor of the covariance matrix of the multivariate normal distribution.

property inv_cov: ndarray

Inverse covariance matrix of the multivariate normal distribution.

property mean: ndarray

Mean vector of the multivariate normal distribution.

Returns

mean : np.ndarray

Mean vector of the distribution.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always True for multivariate normal.

ppf(U: ndarray) -> ndarray

Percent point function (inverse of CDF) - transform samples from the standard uniform distribution to the multivariate normal distribution. Vectorized Rosenblatt transform from Uniform(0,1) samples to a correlated multivariate normal N(μ, Σ).

Note: Unlike univariate distributions, the PPF for multivariate distributions is not unique. Many different points in the multivariate space can have the same cumulative probability. This implementation uses the Rosenblatt transformation to create a deterministic mapping from uniform samples to the multivariate normal distribution while preserving the correlation structure.

Parameters

U : np.ndarray

Samples from the standard uniform distribution.

Returns

samples : np.ndarray

Samples from the multivariate normal distribution.
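For a Gaussian target, the Rosenblatt transform reduces to mapping each uniform coordinate to a standard normal value and then applying the Cholesky factor of the covariance, x = μ + L z. The sketch below illustrates that for a single 2-D point using only the standard library; it assumes a lower-triangular Cholesky factor, and the function name is hypothetical rather than part of the documented API.

```python
import math
from statistics import NormalDist


def mvn_ppf_2d(u, mean, cov):
    """Map one point u in (0, 1)^2 to N(mean, cov) via x = mean + L z."""
    (a, b), (_, c) = cov
    # Lower-triangular Cholesky factor of the 2x2 covariance.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21**2)
    std_normal = NormalDist()
    z0, z1 = std_normal.inv_cdf(u[0]), std_normal.inv_cdf(u[1])
    # First coordinate is marginal; second is conditional on the first.
    return [mean[0] + l11 * z0, mean[1] + l21 * z0 + l22 * z1]


mean = [1.0, -1.0]
cov = [[2.0, 0.6], [0.6, 1.0]]
center = mvn_ppf_2d([0.5, 0.5], mean, cov)
shifted = mvn_ppf_2d([0.9, 0.5], mean, cov)
```

With a positive covariance term, raising only the first uniform coordinate moves both output coordinates upward, which is how the mapping preserves correlation.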

class ionworkspipeline.data_fits.stats.Uniform(lb: float, ub: float)

Uniform distribution.

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property argmin: float

The point at which the normalized negative logpdf achieves its minimum value of zero. For the uniform distribution, the negative logpdf equals zero for all points in [lb, ub] and infinity outside this interval. By convention, we return the midpoint of the interval.

Returns

float

(lb + ub) / 2, one of the points where the normalized negative logpdf equals zero.

evaluate_to_array(x: float) -> ndarray

Return a single-element array containing the square root of evaluate_to_scalar(x). Returns 0 for x in [lb, ub], infinity otherwise.

When squared, equals evaluate_to_scalar(x).

Parameters

x : float

Point at which to evaluate.

Returns

np.ndarray

Single-element array containing 0 if x is in [lb, ub], infinity otherwise.

evaluate_to_scalar(x: float) -> float

Compute the normalized negative logpdf for uniform distribution. Returns 0 for x in [lb, ub], infinity otherwise.

The minimum value of 0 occurs for all x in [lb, ub].

Parameters

x : float

Point at which to evaluate.

Returns

float

0 if x is in [lb, ub], infinity otherwise.
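The uniform case acts as a hard bound: zero cost inside the interval and infinite cost outside. A minimal plain-Python sketch of the described behavior follows; the function names are hypothetical and the list return type stands in for a single-element array.

```python
import math


def uniform_scalar(x, lb, ub):
    """0 inside [lb, ub], infinity outside."""
    return 0.0 if lb <= x <= ub else math.inf


def uniform_residual(x, lb, ub):
    """Single-element residual: the square root of the scalar value."""
    return [math.sqrt(uniform_scalar(x, lb, ub))]


lb, ub = 0.0, 1.0
inside = uniform_scalar(0.25, lb, ub)
outside = uniform_scalar(1.5, lb, ub)
# Midpoint convention described for argmin.
midpoint = (lb + ub) / 2
```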

property lb: float

Lower bound of the uniform distribution.

Returns

lb : float

Lower bound of the distribution.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always False for univariate uniform.

property ub: float

Upper bound of the uniform distribution.

Returns

ub : float

Upper bound of the distribution.

class ionworkspipeline.data_fits.stats.LogNormal(mean: float, std: float)

Univariate lognormal distribution.

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property argmin: float

The point at which the normalized negative logpdf achieves its minimum value of zero. For the lognormal distribution, this occurs at exp(μ - σ²), where μ and σ are the mean and standard deviation of the underlying normal distribution. The negative logpdf is shifted to be zero at this point.

Returns

float

exp(mean - std²), where the normalized negative logpdf equals zero.

evaluate_to_array(x: ndarray) -> ndarray

Compute the single residual for univariate lognormal. For x > 0, computes: ((log(x) - μ) / σ + σ) / √2

When squared, equals evaluate_to_scalar(x).

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

np.ndarray

Single residual whose square equals the normalized negative logpdf. Returns infinity for x ≤ 0.

evaluate_to_scalar(x: ndarray) -> float

Compute the normalized negative logpdf for univariate lognormal. For x > 0, computes: ((log(x) - μ) / σ + σ)² / 2

The minimum value of 0 occurs at x = exp(μ - σ²).

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

float

Normalized negative logpdf, equals 0 at x = exp(μ - σ²). Returns infinity for x ≤ 0.
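The lognormal formula comes from completing the square in log(x): the negative logpdf is log(x) + (log(x) - μ)² / (2σ²) plus a constant, and subtracting its value at the mode exp(μ - σ²) leaves ((log(x) - μ) / σ + σ)² / 2. The sketch below checks this numerically in plain Python; the function names are hypothetical.

```python
import math


def lognormal_scalar(x, mean, std):
    """Normalized negative logpdf; infinity for x <= 0."""
    if x <= 0:
        return math.inf
    return ((math.log(x) - mean) / std + std) ** 2 / 2.0


def lognormal_neg_logpdf(x, mean, std):
    """Unshifted negative logpdf of a lognormal with underlying N(mean, std)."""
    y = math.log(x)
    return y + 0.5 * math.log(2 * math.pi * std**2) + (y - mean) ** 2 / (2 * std**2)


mean, std = 0.3, 0.5
mode = math.exp(mean - std**2)
# Constant removed by the normalization: the negative logpdf at the mode.
offset = lognormal_neg_logpdf(mode, mean, std)
```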

property mean: float

Mean of the underlying normal distribution.

Returns

mean : float

Mean of the underlying normal distribution.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always False for univariate lognormal.

ppf(U: ndarray) -> ndarray

Percent point function (inverse of CDF) - transform samples from the standard uniform distribution to the lognormal distribution.

Parameters

U : np.ndarray

Samples from the standard uniform distribution.

Returns

samples : np.ndarray

Samples from the lognormal distribution.
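Since a lognormal variable is the exponential of a normal one, its quantiles are the exponentials of the normal quantiles. A small standard-library sketch under that identity, with a hypothetical function name:

```python
import math
from statistics import NormalDist


def lognormal_ppf(u, mean, std):
    """Quantiles of exp(N(mean, std)) at probabilities u in (0, 1)."""
    normal = NormalDist(mu=mean, sigma=std)
    return [math.exp(normal.inv_cdf(ui)) for ui in u]


quantiles = lognormal_ppf([0.1, 0.5, 0.9], mean=0.3, std=0.5)
```

The median is exp(mean), which differs from the mode exp(mean - std²) used by argmin.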

property std: float

Standard deviation of the underlying normal distribution.

Returns

std : float

Standard deviation of the underlying normal distribution.

class ionworkspipeline.data_fits.stats.MultivariateLogNormal(mean: ndarray, cov: ndarray)

Multivariate lognormal distribution.

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property argmin: ndarray

The point at which the normalized negative logpdf achieves its minimum value of zero. For the multivariate lognormal distribution, this occurs at exp(μ - Σ1), where μ is the mean vector, Σ is the covariance matrix of the underlying multivariate normal distribution, and Σ1 is the vector of row sums of Σ. The negative logpdf is shifted to be zero at this point.

Returns

np.ndarray

exp(mean - rowsum(cov)), where the normalized negative logpdf equals zero.

cdf(x: ndarray) -> float

Cumulative distribution function of the multivariate lognormal distribution.

Parameters

x : array_like

Points at which to evaluate the CDF.

Returns

cdf : float

Cumulative distribution function evaluated at x.

property cov: ndarray

Covariance matrix of the underlying multivariate normal distribution.

Returns

cov : np.ndarray

Covariance matrix of the underlying multivariate normal distribution.

evaluate_to_array(x: ndarray) -> ndarray

Compute the residuals for multivariate lognormal. For x > 0, computes: Σ^(-1/2) (log(x) - μ + Σ1) / √2, where Σ1 is the vector of row sums of the covariance matrix.

When squared and summed, equals evaluate_to_scalar(x).

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

np.ndarray

Residuals whose squared sum equals the normalized negative logpdf. Returns infinity for components where x ≤ 0.

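evaluate_to_scalar(x: ndarray) -> float

Compute the normalized negative logpdf for multivariate lognormal. For x > 0, computes: ((log(x) - μ)ᵀ Σ⁻¹ (log(x) - μ)) / 2 + sum(log(x)), shifted by a constant so that its minimum is zero. Equivalently: (log(x) - μ + Σ1)ᵀ Σ⁻¹ (log(x) - μ + Σ1) / 2, where Σ1 is the vector of row sums of the covariance matrix.

The minimum value of 0 occurs at x = exp(μ - Σ1).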

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

float

Normalized negative logpdf, equals 0 at x = exp(μ - Σ1), where Σ1 is the vector of row sums of Σ. Returns infinity if any component of x ≤ 0.
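Setting the gradient of (log(x) - μ)ᵀ Σ⁻¹ (log(x) - μ) / 2 + sum(log(x)) with respect to log(x) to zero gives log(x) = μ - Σ1, with Σ1 the row sums used in evaluate_to_array. The sketch below checks this for a 2x2 covariance in plain Python and also shows that subtracting only the diagonal of Σ does not reach the minimum when off-diagonal terms are nonzero. It is an illustration derived from the formulas, not the library's code; all names are hypothetical.

```python
import math


def quad_form_inv(d, cov):
    """d^T cov^{-1} d for a symmetric 2x2 covariance matrix."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    return (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det


def raw_objective(x, mean, cov):
    """Unshifted form: quadratic term / 2 plus sum(log(x))."""
    y = [math.log(xi) for xi in x]
    d = [y[0] - mean[0], y[1] - mean[1]]
    return quad_form_inv(d, cov) / 2.0 + sum(y)


def mvlognormal_scalar(x, mean, cov):
    """Completed-square form, zero at x = exp(mean - row_sums(cov))."""
    if min(x) <= 0:
        return math.inf
    row_sums = [sum(row) for row in cov]
    d = [math.log(x[i]) - mean[i] + row_sums[i] for i in range(2)]
    return quad_form_inv(d, cov) / 2.0


mean = [0.2, -0.1]
cov = [[0.30, 0.10], [0.10, 0.20]]
mode = [math.exp(mean[i] - sum(cov[i])) for i in range(2)]
offset = raw_objective(mode, mean, cov)
# Point obtained by subtracting only the diagonal of cov, for comparison.
diag_point = [math.exp(mean[i] - cov[i][i]) for i in range(2)]
```

For a diagonal covariance the row sums and the diagonal coincide, so the two candidate points agree in that special case.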

property mean: ndarray

Mean vector of the underlying multivariate normal distribution.

Returns

mean : np.ndarray

Mean vector of the underlying multivariate normal distribution.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always True for multivariate lognormal.

pdf(x: ndarray) -> float

Probability density function of the multivariate lognormal distribution.

Parameters

x : array_like

Points at which to evaluate the PDF.

Returns

pdf : float

Probability density function evaluated at x.

ppf(U: ndarray) -> ndarray

Percent point function (inverse of CDF) - transform samples from the standard uniform distribution to the multivariate lognormal distribution.

Parameters

U : np.ndarray

Samples from the standard uniform distribution.

Returns

samples : np.ndarray

Samples from the multivariate lognormal distribution.

rand(n: int | None = None) -> ndarray

Draw random samples from the distribution.

Parameters

n : int, optional

Number of samples to draw.

Returns

samples : np.ndarray

Random samples from the multivariate lognormal distribution.

class ionworkspipeline.data_fits.stats.PointMass(value: float)

PointMass distribution (constant value).

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property argmin: float

The point at which the normalized negative logpdf achieves its minimum value of zero. For a point mass distribution, the negative logpdf equals zero only at the point mass value and infinity elsewhere.

Returns

float

The point mass value, where the normalized negative logpdf equals zero.

cdf(x: float | ndarray) -> float | ndarray

Cumulative distribution function of the point mass distribution.

Parameters

x : float or array_like

Points at which to evaluate the CDF.

Returns

cdf : float or ndarray

Cumulative distribution function evaluated at x (0 for x < value, 1 for x >= value).

evaluate_to_array(x: float) -> ndarray

Return a single-element array containing the square root of evaluate_to_scalar(x). Returns 0 if x equals the point mass value, infinity otherwise.

When squared, equals evaluate_to_scalar(x).

Parameters

x : float

Point at which to evaluate.

Returns

np.ndarray

Single-element array containing 0 if x equals the point mass value, infinity otherwise.

evaluate_to_scalar(x: float) -> float

Compute the normalized negative logpdf for point mass distribution. Returns 0 if x equals the point mass value, infinity otherwise.

The minimum value of 0 occurs only at x = value.

Parameters

x : float

Point at which to evaluate.

Returns

float

0 if x equals the point mass value, infinity otherwise.

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always False for point mass.

pdf(x: float | ndarray) -> float | ndarray

Probability density function of the point mass distribution.

Parameters

x : float or array_like

Points at which to evaluate the PDF.

Returns

pdf : float or ndarray

Probability density function evaluated at x (infinity at the constant value, 0 elsewhere).

ppf(U: ndarray) -> ndarray

Percent point function (inverse of CDF) - transform samples from the standard uniform distribution to the point mass distribution.

Parameters

U : np.ndarray

Samples from the standard uniform distribution.

Returns

samples : np.ndarray

Samples from the point mass distribution (all equal to the constant value).
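The point mass behavior described for cdf and ppf can be summarized in a few lines. This is a plain-Python sketch with hypothetical function names, using lists in place of arrays.

```python
def point_mass_cdf(x, value):
    """Step function: 0 below the point mass, 1 at or above it."""
    return 1.0 if x >= value else 0.0


def point_mass_ppf(u, value):
    """Every uniform sample maps to the same constant value."""
    return [value for _ in u]


value = 3.5
samples = point_mass_ppf([0.1, 0.5, 0.9], value)
```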

rand(n: int | None = None) -> ndarray

Draw random samples from the distribution.

Parameters

n : int, optional

Number of samples to draw.

Returns

samples : ndarray

Samples from the point mass distribution (all equal to the constant value).

property value: float

The constant value of the point mass distribution.

Returns

value : float

The constant value.

property zero_variance: bool

Whether the distribution has zero variance.

Returns

has_zero_variance : bool

Always True for point mass.

class ionworkspipeline.data_fits.stats.Dirichlet(alpha: ndarray)

Dirichlet distribution.

Extends: ionworkspipeline.data_fits.stats.stats.Distribution

property alpha: ndarray

Concentration parameters of the Dirichlet distribution.

Returns

alpha : np.ndarray

Concentration parameters.

property argmin: ndarray

The point at which the normalized negative logpdf achieves its minimum value of zero. For the Dirichlet distribution, this occurs at the mode.

Returns

np.ndarray

The mode of the distribution, (alpha_i - 1) / (sum(alpha) - K) for K components, which is defined when all alpha_i > 1.

cdf(x: float | ndarray) -> float | ndarray

Cumulative distribution function of the distribution.

Parameters

x : float or array_like

Points at which to evaluate the CDF.

Returns

cdf : float or ndarray

Cumulative distribution function evaluated at x.

evaluate_to_array(x: ndarray) -> ndarray

Compute the residuals for Dirichlet distribution.

When squared and summed, equals evaluate_to_scalar(x).

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

np.ndarray

Residuals whose squared sum equals the normalized negative logpdf.

evaluate_to_scalar(x: ndarray) -> float

Compute the normalized negative logpdf for Dirichlet distribution. For x on the simplex (x_i > 0 and sum(x) = 1), computes: -sum((alpha_i - 1) * log(x_i)), shifted by its value at the mode so that the minimum is zero.

The minimum value of 0 occurs at the mode of the distribution.

Parameters

x : np.ndarray

Point at which to evaluate.

Returns

float

Normalized negative logpdf, equals 0 at the mode. Returns infinity if any x_i <= 0 or sum(x) != 1.
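For the value to be zero at the mode, the term -sum((alpha_i - 1) * log(x_i)) has to be offset by its value there. The sketch below shows one way to do that in plain Python, assuming all alpha_i > 1 so that the mode (alpha_i - 1) / (sum(alpha) - K) exists. The function names and the simplex tolerance tol are assumptions made for illustration, not documented behavior.

```python
import math


def dirichlet_mode(alpha):
    """Mode (alpha_i - 1) / (sum(alpha) - K); requires all alpha_i > 1."""
    k = len(alpha)
    total = sum(alpha)
    return [(a - 1.0) / (total - k) for a in alpha]


def dirichlet_raw(x, alpha):
    """-sum((alpha_i - 1) * log(x_i)), the x-dependent part of the negative logpdf."""
    return -sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def dirichlet_scalar(x, alpha, tol=1e-9):
    """Raw term shifted by its value at the mode; infinity off the simplex."""
    if min(x) <= 0 or abs(sum(x) - 1.0) > tol:
        return math.inf
    return dirichlet_raw(x, alpha) - dirichlet_raw(dirichlet_mode(alpha), alpha)


alpha = [2.0, 3.0, 5.0]
mode = dirichlet_mode(alpha)
at_mode = dirichlet_scalar(mode, alpha)
elsewhere = dirichlet_scalar([0.2, 0.3, 0.5], alpha)
```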

property multivariate: bool

Whether the distribution is multivariate.

Returns

multivariate : bool

Always True for Dirichlet distribution.

pdf(x: float | ndarray) -> float | ndarray

Probability density function of the Dirichlet distribution.

Parameters

x : np.ndarray

Points at which to evaluate the PDF. Must be on the simplex (all x_i > 0 and sum(x) = 1).

Returns

pdf : float | np.ndarray

Probability density function evaluated at x. Returns 0 if any x_i <= 0 or sum(x) != 1.

ppf(U: ndarray) -> ndarray

Percent point function (inverse of CDF) for Dirichlet distribution.

For Dirichlet, we use the property that if X_i ~ Gamma(alpha_i, 1) independently, then [X_1, X_2, …, X_k] / sum(X) ~ Dirichlet(alpha_1, alpha_2, …, alpha_k).

Parameters

U : np.ndarray

Samples from the standard uniform distribution.

Returns

samples : np.ndarray

Samples from the Dirichlet distribution.
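The Gamma-normalization property can be demonstrated with the standard library. Note the simplification: this sketch draws Gamma variates directly with random.gammavariate rather than inverting the Gamma CDF on uniform inputs, so it illustrates the property and not the ppf mapping itself. The function name is hypothetical.

```python
import random


def dirichlet_sample(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)
alpha = [2.0, 3.0, 5.0]
draws = [dirichlet_sample(alpha, rng) for _ in range(5000)]
# Sample means should approach alpha_i / sum(alpha).
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
```

Each draw lies on the simplex by construction, since the components are positive and divided by their own sum.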