Data Fits

class ionworkspipeline.data_fits.DataFit(objectives: Objective | dict[str, Objective], source: str = '', parameters: dict | None = None, cost: Cost | None = None, initial_guesses: dict | Result | DataFrame | None = None, optimizer: Optimizer | Sampler | None = None, cost_logger: CostLogger | None = None, multistarts: int | None = None, num_workers: int | None = None, max_batch_size: int | None = None, initial_guess_sampler: DistributionSampler | None = None, priors: Prior | list[Prior] | None = None, options: dict | None = None)

A pipeline element that fits a model to data.

Parameters

objectivesionworkspipeline.objectives.Objective or dict[str, ionworkspipeline.objectives.Objective]

The objective(s) to use for the fit. This can be a single objective, or a dictionary of objectives. Each objective can be any class that implements the method build, which creates a function Objective.run() that takes a dictionary of parameters and returns a scalar or vector cost. In general, we subclass ionworkspipeline.objectives.Objective to implement a particular objective.

sourcestr

A string describing the source of the data.

parametersdict, optional
A dictionary of parameters to fit. The values can be:
  • an iwp.Parameter object, e.g. iwp.Parameter("x")

  • a pybamm expression, in which case the other parameters should also be explicitly provided as iwp.Parameter objects, e.g.

    {
        "param": 2 * pybamm.Parameter("half-param"),
        "half-param": iwp.Parameter("half-param")
    }

    works, but

    {"param": 2 * iwp.Parameter("half-param")} would not work.

  • a function of other parameters, in which case those parameters must again be explicitly provided as iwp.Parameter objects, e.g.

    {
        "main parameter": lambda x: pybamm.Parameter("other parameter") * x**2,
        "other parameter": iwp.Parameter("other parameter")
    }

The name of the input parameter does not need to match the name of the parameter. In all cases, the DataFit class will automatically process this input to fit for "x".

costionworkspipeline.costs.Cost, optional

The cost function to use when constructing the objective. If None, uses the optimizer’s default cost function.

initial_guessesdict or list of dicts, optional

Initial guesses for the parameters. If a single dictionary, then this is used as the initial guess for all optimization jobs in each batch. If a list of dictionaries, then each dictionary is used as the initial guess for a single job.

optimizerionworkspipeline.optimizers.Optimizer or ionworkspipeline.samplers.Sampler, optional

The optimizer to use for the fit. Default is set by the DataFit subclasses.

cost_loggerionworkspipeline.data_fits.CostLogger, optional

A cost logger to use for logging the cost and parameters during the fit. Default is iwp.data_fits.CostLogger with default options.

multistartsint, optional

Number of times to run the optimization from different initial guesses. If None, only runs once from the provided initial guess.

num_workersint, optional

Number of worker processes to use for parallel batch processing. Not supported on Windows. If num_workers = 1, then multiprocessing is disabled. If num_workers = None, then the number of workers is set to the number of CPU cores.

max_batch_sizeint, optional

Maximum number of optimization jobs to include in a single batch. If None, defaults to the largest possible batch size, which is the ceiling of the total number of jobs divided by the number of workers.
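As a rough illustration of the default described above, the sketch below computes the largest batch size from a job count and a worker count. The helper name default_max_batch_size is hypothetical and is not part of ionworkspipeline.

```python
import math


def default_max_batch_size(num_jobs: int, num_workers: int) -> int:
    """Ceiling of num_jobs / num_workers (hypothetical helper, for illustration only)."""
    if num_jobs < 1 or num_workers < 1:
        raise ValueError("num_jobs and num_workers must be positive")
    return math.ceil(num_jobs / num_workers)


# 10 jobs spread over 4 workers: no batch needs more than 3 jobs
print(default_max_batch_size(10, 4))
```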

initial_guess_samplerionworkspipeline.data_fits.distribution_samplers.DistributionSampler, optional

Sampler to use for generating initial guesses of multistarted parameter estimations. Default is ionworkspipeline.data_fits.distribution_samplers.LatinHypercube.
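To give a feel for what Latin hypercube sampling does, here is a minimal standard-library sketch. It is not the library's LatinHypercube implementation; the function name, the bounds dictionary, and the seed argument are assumptions made for illustration.

```python
import random


def latin_hypercube(num_samples, bounds, seed=None):
    """Draw num_samples points so that, for every parameter, each of the
    num_samples equal-width strata of its range is used exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (lower, upper) in bounds.items():
        # One uniform draw inside each stratum [i/n, (i+1)/n)
        unit = [(i + rng.random()) / num_samples for i in range(num_samples)]
        # Shuffle so strata are paired at random across parameters
        rng.shuffle(unit)
        columns[name] = [lower + u * (upper - lower) for u in unit]
    # One dictionary of parameter values per multistart job
    return [{name: columns[name][i] for name in bounds} for i in range(num_samples)]


guesses = latin_hypercube(4, {"x": (0.0, 1.0), "y": (10.0, 20.0)}, seed=0)
```

Compared with independent uniform draws, this spreads a small number of multistart guesses more evenly over each parameter's range.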

priorsionworkspipeline.priors.Prior or list[ionworkspipeline.priors.Prior], optional

Priors to use for the fit.

optionsdict, optional

A dictionary of options to pass to the data fit. By default:

options = {
    # Random seed for reproducibility. Defaults to a seed generated from
    # the current time.
    "seed": iwutil.random.generate_seed(),
}

Note: These options only have an effect if model.convert_to_format == 'casadi'.

Extends: ionworkspipeline.data_fits.data_fit._DataFitBase

approximate_model_distribution(parameters: dict | None = None, *args, remove_priors=None, **kwargs) MultivariateNormal

Approximate the optimized model results as a linearized multivariate normal distribution.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the approximation.

*argstuple, optional

Additional positional arguments to pass to the compute_parameter_covariance method.

remove_priorsbool, optional

Whether to remove the contribution of the priors. Default is False.

**kwargsdict, optional

Additional keyword arguments to pass to the compute_parameter_covariance method.

Returns

ionworkspipeline.stats.MultivariateNormal

The multivariate normal approximation of the optimized model results.

property batch_ids: list[int]

Get the batch IDs.

property batches: list[_DataFitBatch]

Get the list of optimization jobs.

check_initial_guesses()

Check that the initial parameter guesses are different for multistarts.

This method verifies that initial guesses differ across jobs, but ignores parameters with zero variance distributions (like PointMass distributions) when checking for uniqueness.

Raises

ValueError

If initial guesses are not unique or number of guesses does not match jobs.
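The check described above might look roughly like the following sketch. The function name and the fixed_names argument are hypothetical; the library's actual logic may differ.

```python
def check_unique_guesses(guesses, num_jobs, fixed_names=()):
    """Raise ValueError if the guesses do not match the job count or are not unique.

    fixed_names lists parameters drawn from zero-variance distributions; they
    are skipped because they are identical across jobs by construction.
    """
    if len(guesses) != num_jobs:
        raise ValueError(f"expected {num_jobs} guesses, got {len(guesses)}")
    seen = set()
    for guess in guesses:
        key = tuple(sorted((k, v) for k, v in guess.items() if k not in fixed_names))
        if not key:
            # Every parameter is fixed, so there is nothing to compare
            continue
        if key in seen:
            raise ValueError(f"duplicate initial guess: {guess}")
        seen.add(key)
```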

compute_gradient(parameters: dict | None = None, cost: Cost | None = None, options: dict | None = None) ndarray

Compute the gradient of the objective function.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the gradient calculation.

costiwp.costs.Cost, optional

The cost to use for the gradient calculation. If None, uses the current cost.

optionsdict, optional

Additional keyword arguments to pass to the numdifftools finite difference method.

Returns

np.ndarray

The gradient of the objective function.
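For intuition about what a finite-difference gradient computes, the sketch below uses a plain central difference on a dictionary of parameters. It does not call numdifftools and returns a dictionary rather than an array; the function name and the rel_step argument are assumptions.

```python
def finite_difference_gradient(func, parameters, rel_step=1e-6):
    """Central-difference gradient of a scalar func(dict) with respect to each entry."""
    gradient = {}
    for name, value in parameters.items():
        # Scale the step with the parameter magnitude, with a floor of rel_step
        step = rel_step * max(abs(value), 1.0)
        upper = {**parameters, name: value + step}
        lower = {**parameters, name: value - step}
        gradient[name] = (func(upper) - func(lower)) / (2 * step)
    return gradient


def cost(p):
    # Toy quadratic cost with known gradient (2 * (a - 1), 6 * b)
    return (p["a"] - 1.0) ** 2 + 3.0 * p["b"] ** 2


grad = finite_difference_gradient(cost, {"a": 2.0, "b": 1.0})
```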

compute_hessian(parameters: dict | None = None, cost: Cost | None = None, gauss_newton: bool | None = None, options: dict | None = None) ndarray

Compute the Hessian matrix of the objective function.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the Hessian calculation. If None, uses DataFit.results.

costiwp.costs.Cost, optional

The cost to use for the Hessian calculation. If None, uses the current cost.

gauss_newtonbool, optional

Whether to use the Gauss-Newton method to compute the Hessian. If None, this method is enabled if the cost supports array output.

optionsdict, optional

Additional keyword arguments to pass to the numdifftools finite difference method.

Returns

np.ndarray

The Hessian matrix of the objective function.

compute_inverse_parameter_covariance(parameters: dict | None = None, *args, **kwargs) ndarray

Compute the inverse parameter covariance matrix of the sum of squared residuals objective function, V_p^-1 = (residuals(p)^T * V_epsilon^-1 * residuals(p)) where V_epsilon is the model error covariance matrix.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the Hessian calculation.

*argstuple, optional

Additional positional arguments to pass to the compute_hessian method.

**kwargsdict, optional

Additional keyword arguments to pass to the compute_hessian method.

Returns

np.ndarray

The inverse parameter covariance matrix of the objective function.

compute_jacobian(parameters: dict | None = None, cost: Cost | None = None, options: dict | None = None) ndarray

Compute the Jacobian matrix of the objective function.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the Jacobian calculation.

costiwp.costs.Cost, optional

The cost to use for the Jacobian calculation. If None, uses the current cost.

optionsdict, optional

Additional keyword arguments to pass to the numdifftools finite difference method.

Returns

np.ndarray

The Jacobian matrix of the objective function.

compute_parameter_covariance(*args, **kwargs) ndarray

Compute the parameter covariance matrix of the sum of squared residuals objective function, (residuals(p)^T * V_epsilon^-1 * residuals(p))^-1 where V_epsilon is the model error covariance matrix.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the Hessian calculation.

*argstuple, optional

Additional positional arguments to pass to the compute_hessian method.

**kwargsdict, optional

Additional keyword arguments to pass to the compute_hessian method.

Returns

np.ndarray

The parameter covariance matrix of the objective function.
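As a worked example, the sketch below evaluates a parameter covariance for a straight-line fit y = a + b*x. It assumes the common Gauss-Newton reading of the formula, in which the Jacobian J of the residuals appears in place of residuals(p), and independent errors with V_epsilon = sigma^2 * I. Whether the library evaluates the expression exactly this way is not confirmed by this page, and the function name is hypothetical.

```python
def line_fit_covariance(x_values, sigma):
    """Covariance of (intercept, slope) for y = a + b*x with i.i.d. noise sigma.

    The residual Jacobian has rows [1, x_i], so the inverse covariance
    J^T V_epsilon^-1 J is the 2x2 matrix assembled below.
    """
    n = len(x_values)
    sx = sum(x_values)
    sxx = sum(x * x for x in x_values)
    var = sigma**2
    inv_cov = [[n / var, sx / var], [sx / var, sxx / var]]
    det = inv_cov[0][0] * inv_cov[1][1] - inv_cov[0][1] * inv_cov[1][0]
    if det == 0:
        raise ValueError("parameters are not identifiable from these x values")
    # Closed-form inverse of a 2x2 matrix
    return [
        [inv_cov[1][1] / det, -inv_cov[0][1] / det],
        [-inv_cov[1][0] / det, inv_cov[0][0] / det],
    ]


cov = line_fit_covariance([0.0, 1.0, 2.0, 3.0], sigma=0.1)
```

The diagonal entries are the variances of the intercept and slope; the off-diagonal entry is their covariance.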

compute_residuals(parameters: dict | None = None, cost: Cost | None = None) ndarray

Compute the residuals of the objective function.

Parameters

parametersdict, optional

Dictionary of parameter values to use for the residuals calculation.

costiwp.costs.Cost, optional

The cost to use for the residuals calculation. If None, uses the current cost.

Returns

np.ndarray

The residuals of the objective function.

property data_fit_runner: _DataFitRunner | None

Get the data fit runner.

estimate_variable_standard_deviations(parameters: dict | None = None) dict[str, float]

Estimate the standard deviations of the variables.

property explicit_initial_guesses: bool

Get the initial guesses flag.

get_batch(batch_id: int) _DataFitBatch

Get the batch object for a given batch ID.

Each batch represents a group of optimization jobs that are processed together for improved efficiency. This method retrieves or creates a _DataFitBatch object that manages these jobs.

Parameters

batch_idint | None

Unique identifier for the batch to retrieve

Returns

_DataFitBatch

The batch object for the given ID

get_fit_results()

Get the results of the fit.

Returns

dict

Dictionary containing fit results for each objective.

property initial_guess_distributions: dict[str, Distribution]

Get the initial guess distributions for the parameters being fit.

property initial_guess_sampler: Sampler

Get the initial guess sampler.

property is_parent: bool

Whether the DataFit is a parent.

property job_ids: list[ndarray]

Get the job IDs.

linear_confidence_intervals(parameters: dict | None = None, confidence_level: float | None = None, variable_standard_deviations: dict | None = None, use_parameter_bounds: bool | None = None, cost: Cost | None = None, gauss_newton: bool | None = None, options: dict | None = None)

Get the linear confidence intervals for the parameters based on the chi-square distribution.

Parameters

parametersdict, optional

A dictionary mapping parameter names to their values. If provided, the parameters will be used to compute the confidence intervals. If not provided, the parameters from the DataFit will be used.

confidence_levelfloat, optional

The confidence level for the intervals, between 0 and 1. Default is 0.95.

variable_standard_deviationsdict, optional

A dictionary mapping variable names to their standard deviations. If provided, the cost function will be ignored.

use_parameter_boundsbool, optional

Whether to use the parameter bounds to clip the confidence intervals. Default is True.

costionworkspipeline.costs.Cost, optional

The cost function to use for the confidence intervals. If not provided, the cost function from the DataFit will be used.

gauss_newtonbool, optional

Whether to use the Gauss-Newton method to compute the confidence intervals. Default is True.

optionsdict, optional

Additional options for the numdifftools function.
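The sketch below shows one way a linearized interval for a single parameter can be built from its variance and then clipped to the parameter bounds. It assumes one degree of freedom per parameter, for which the chi-square quantile equals the squared two-sided normal quantile. The library may use a different number of degrees of freedom, and the function name is hypothetical.

```python
from statistics import NormalDist


def linear_interval(value, variance, confidence_level=0.95, bounds=None):
    """Symmetric interval value +/- z * sqrt(variance), optionally clipped to bounds."""
    # Two-sided normal quantile; z**2 is the chi-square quantile for 1 degree of freedom
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half_width = z * variance**0.5
    lower, upper = value - half_width, value + half_width
    if bounds is not None:
        # Mirror the use_parameter_bounds behaviour by clipping to [min, max]
        lower, upper = max(lower, bounds[0]), min(upper, bounds[1])
    return lower, upper


unclipped = linear_interval(1.0, 0.04)
clipped = linear_interval(1.0, 0.04, bounds=(0.8, 2.0))
```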

property max_batch_size: int

Get the maximum batch size.

property multistarts: int | None

Get the number of multistarts.

property num_batches: int

Get the number of batches.

property num_workers: int

Get the number of worker processes.

property objective_function: ObjectiveFunction

Get the objective function.

plot_fit_results()

Plot the results of the fit by calling the plot_fit_results method of each internal callback in each objective. Any user-defined callbacks should be called manually instead of using this method.

Returns

dict

Dictionary containing figure and axes objects for each objective.

plot_sampler_results(confidence_level=None, chi2_minimum=None, burnin_iterations=None, show_bounds=None, bins=None)

Produces a pairwise plot of MCMC samples with histograms on the diagonal and scatter plots below.

Parameters

confidence_levelfloat, optional

The confidence level for filtering samples, between 0 and 1. Default is 0.95.

chi2_minimumfloat, optional

The minimum chi-square value to use as reference. If None, uses the minimum value from the sampler.

burnin_iterationsint, optional

Number of initial iterations to discard as burn-in. If None, uses the value set in the sampler.

show_boundsbool, optional

Whether to show parameter bounds on the plots. If None, defaults to True.

binsint, optional

Number of bins to use for histograms. If None, defaults to 20.

Returns

tuple

A tuple (fig, axes) containing the matplotlib Figure and array of Axes objects for the pairwise plots. The diagonal shows histograms of each parameter’s marginal distribution, while off-diagonal plots show 2D scatter plots of parameter pairs.

Raises

ValueError

If the fit hasn’t been run yet or if this method is called on an optimizer rather than a sampler.

plot_trace()

Plot the cost and each parameter as a function of iteration number.

Returns

tuple

Tuple containing figure and axes objects.

process_cost(cost: Cost | None, objectives: dict[str, Objective], optimizer: Optimizer) Cost

Process the cost function.

Parameters

costiwp.costs.Cost | None

The cost function to process.

objectivesdict[str, Objective]

The objectives to process.

optimizeriwp.optimizers.Optimizer

The optimizer to process.

Returns

iwp.costs.Cost

The processed cost function.

process_initial_guess_distributions()

Process the initial guess distributions for the parameters being fit.

process_objectives(objectives: Objective | dict[str, Objective]) dict[str, Objective]

Set up the objectives.

Parameters

objectivesObjective or dict[str, Objective]

The objectives to process.

Returns

dict[str, Objective]

The processed objectives.

process_optimizer(optimizer: Optimizer | None, objectives: dict[str, Objective]) Optimizer

Process the optimizer.

Parameters

optimizeriwp.optimizers.Optimizer | None

The optimizer to process.

objectivesdict[str, Objective]

The objectives to process.

Returns

iwp.optimizers.Optimizer

The processed optimizer.

run(parameter_values) Result

Run the optimization to fit the model to data.

Parameters

parameter_valuesdict

Dictionary of parameter values to use for the fit.

Returns

iwp.Result

Results object containing fitted parameters and optimization results.

sampler_confidence_intervals(confidence_level=None, chi2_minimum=None)

Calculate confidence intervals for sampled parameters based on chi-square thresholds.

Parameters

confidence_levelfloat, optional

The confidence level for the intervals, between 0 and 1. Default is 0.95.

chi2_minimumfloat, optional

The minimum chi-square value to use as reference. Default is the minimum value in results.

Returns

dict

Dictionary mapping parameter names to tuples of (lower_bound, upper_bound) confidence intervals.

Raises

ValueError

If the fit hasn’t been run, if using an optimizer instead of sampler, or if not using a chi-square cost function.
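A chi-square threshold interval can be sketched as follows: keep the samples whose chi-square value lies within a threshold of the minimum, then report the range of each parameter over the kept samples. The threshold here uses one degree of freedom, which is an assumption of this sketch, as are the function name and the data layout (a list of parameter dictionaries plus a matching list of chi-square values).

```python
from statistics import NormalDist


def sampler_intervals(samples, chi2_values, confidence_level=0.95, chi2_minimum=None):
    """Per-parameter (lower, upper) range over samples within the chi-square threshold."""
    if chi2_minimum is None:
        chi2_minimum = min(chi2_values)
    # Chi-square quantile for 1 degree of freedom, via the squared normal quantile
    delta = NormalDist().inv_cdf(0.5 + confidence_level / 2) ** 2
    kept = [s for s, c in zip(samples, chi2_values) if c <= chi2_minimum + delta]
    if not kept:
        raise ValueError("no samples fall below the chi-square threshold")
    return {
        name: (min(s[name] for s in kept), max(s[name] for s in kept))
        for name in kept[0]
    }


samples = [{"x": 1.0}, {"x": 1.1}, {"x": 1.5}, {"x": 2.0}]
chi2_values = [10.0, 11.0, 13.5, 20.0]
intervals = sampler_intervals(samples, chi2_values)
```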

set_initial_guesses(initial_guesses: list[dict], initial_guess_sampler: iwp.stats.DistributionSampler | None = None)

Set up and validate initial parameter guesses for optimization.

This method:

  1. Stores the provided initial guesses or sampler configuration
  2. Validates that the initial guesses match the parameters being fit
  3. Sets up sampling distributions if no explicit guesses are provided
  4. Ensures uniqueness of guesses when using multistart optimization

Parameters

initial_guessesdict or list[dict] or None

User-provided initial guesses. If None, guesses will be sampled using the initial_guess_sampler. Each guess should be a dictionary mapping parameter names to their initial values.

initial_guess_samplerDistributionSampler or None

Sampler to generate initial guesses when not provided directly. If None and no initial_guesses provided, defaults to LatinHypercube.

Raises

ValueError

If initial guesses are invalid, non-unique for multistart optimization, or don’t match the parameters being fit.

timeseries_preprocessing()

Set up time stepping for the fit.

class ionworkspipeline.data_fits.ArrayDataFit(objectives, **kwargs)

A pipeline element that fits a model to data for multiple independent variable values. The data for each independent variable value is fitted separately. The independent variable values should be given as the keys of the objectives dictionary. The value of each key should be an ionworkspipeline.objectives.Objective object. This objective will be used to fit the data for the corresponding independent variable value.

The user-supplied objectives should assign the independent variable value to the custom_parameters attribute of the objective as appropriate. This class simply calls a separate ionworkspipeline.DataFit for each provided objective. It does not pass the independent variable value to the objective, so the user must ensure that the objective is set up to use the independent variable value correctly.

For example, this can be used to fit a model to data at multiple temperatures, or fit each pulse of a GITT experiment separately, with post-processing to extract functional relationships between parameters and the independent variable.

The rest of the parameters are the same as for ionworkspipeline.DataFit.

Extends: ionworkspipeline.data_fits.data_fit.DataFit

check_initial_guesses()

Check that initial guesses can be sampled for all data fits.

property data_fits

Get the dictionary of DataFit objects.

Returns

dict

Dictionary mapping independent variable values to DataFit objects

run(parameter_values)

Run the optimization for each independent variable value.

Parameters

parameter_valuesdict

Dictionary of parameter values to use for the fit

Returns

iwp.Result

Results object containing fitted parameters and optimization results

class ionworkspipeline.data_fits.CostLogger(plot_every=None, print_every=None, checkpoint_func=None)

A class to log the cost and parameters during a fit, and plot the results.

Parameters

plot_everyfloat, optional

The number of seconds between each plot of the cost and parameters. If None, no plots are generated during the fit, but the final plot can be generated using the plot() method.

print_everyfloat, optional

The number of seconds between each print of the cost and parameters. If None, no prints are generated during the fit, but the final print can be generated using the print() method.

checkpoint_funccallable, optional

A function to call to checkpoint the fit. The function should take a single argument, which will be a dictionary containing the current state of the fit.
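A checkpoint function could be as simple as the sketch below, which writes the state dictionary to a JSON file. The class name JsonCheckpoint is hypothetical, and the contents of the state dictionary are not specified here, so values that JSON cannot encode are converted to strings.

```python
import json
import os
import tempfile


class JsonCheckpoint:
    """Callable that saves the fit state to a JSON file (illustrative only)."""

    def __init__(self, path):
        self.path = os.path.abspath(path)

    def __call__(self, state):
        # Write to a temporary file first, then rename, so a crash part-way
        # through never leaves a truncated checkpoint behind
        directory = os.path.dirname(self.path)
        with tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False, suffix=".tmp"
        ) as handle:
            json.dump(state, handle, default=str)
            temporary_path = handle.name
        os.replace(temporary_path, self.path)
```

An instance would then be passed as checkpoint_func, for example CostLogger(checkpoint_func=JsonCheckpoint("fit_state.json")).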

argmin_costs()

Get the index of the job with minimum cost. If all costs are NaN, return 0.

Returns

int

Index of minimum cost
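The NaN handling described above can be pictured with this small standard-library sketch. It is not the library's implementation, and the function name is hypothetical.

```python
import math


def argmin_ignoring_nan(costs):
    """Index of the smallest non-NaN cost, or 0 if every cost is NaN."""
    best_index, best_cost = 0, math.inf
    for index, cost in enumerate(costs):
        if not math.isnan(cost) and cost < best_cost:
            best_index, best_cost = index, cost
    return best_index
```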

argsort_costs()

Sort the jobs in ascending order of cost.

Returns

numpy.ndarray

Indices that would sort the costs

property children

List of child CostLoggerJob objects.

clear_axes()

Clear the plot axes.

property cost

List of logged costs.

property fig_axes

Get or create figure and axes objects.

Returns

tuple

Figure and axes objects, and whether they existed

finish()

Finish logging.

property finished

Whether logging is finished.

get_log()

Get the logged data.

Returns

dict

Dictionary of logged values

property is_parent

Whether this is a parent logger.

property iteration

List of iteration numbers.

log(logs=None, job_id=None)

Log cost and parameter values.

Parameters

logsdict, optional

Dictionary of values to log

job_idint, optional

ID of the job being logged

property multiprocessing

Whether multiprocessing is enabled.

property num_jobs

Number of jobs being logged.

property parent

Parent CostLogger object.

plot()

Plot the cost and parameters.

Returns

tuple

Figure and axes objects

property plot_every

Seconds between plot updates.

property plot_flag

Plot update flag.

plot_refresh(force_plot=False)

Update the plot of cost and parameters.

Parameters

force_plotbool, optional

If True, force a plot update regardless of timing

Returns

tuple

Figure and axes objects

property plot_variables

List of variables to plot.

property print_every

Seconds between print updates.

property probabilistic

Whether probabilistic sampling is enabled.

reset()

Reset all logging data and counters.

set_datafit_attributes(datafit)

Set the DataFit attributes.

set_multiprocessing(multiprocessing)

Set whether multiprocessing is being used.

Parameters

multiprocessingbool

Whether multiprocessing is enabled

set_parameters(parameters)

Set the parameters to be logged.

Parameters

parameterslist

List of Parameter objects

set_probabilistic(probabilistic)

Set whether probabilistic sampling is enabled.

property show_plot_iterative

Whether to show plots during optimization.

property show_print_iterative

Whether to print updates during optimization.

spawn_children(num_jobs)

Create child CostLoggerJob objects.

Parameters

num_jobsint

Number of child loggers to create

Returns

list

List of CostLoggerJob objects

start()

Start logging.

property timer

Timer object.