Data Fits

class ionworkspipeline.data_fits.DataFit(objectives: Objective | dict[str, Objective], source: str = '', parameters: dict | None = None, cost: Cost | None = None, initial_guesses: dict | Result | DataFrame | None = None, optimizer: Optimizer | Sampler | None = None, cost_logger: CostLogger | None = None, multistarts: int | None = None, num_workers: int | None = None, max_batch_size: int | None = None, initial_guess_sampler: DistributionSampler | None = None, priors: Prior | list[Prior] | None = None, options: dict | None = None)

A pipeline element that fits a model to data.

Parameters

objectives : ionworkspipeline.objectives.Objective or dict[str, ionworkspipeline.objectives.Objective]

The objective(s) to use for the fit. This can be a single objective, or a dictionary of objectives. Each objective can be any class that implements the method build, which creates a function Objective.run() that takes a dictionary of parameters and returns a scalar or vector cost. In general, we subclass ionworkspipeline.objectives.Objective to implement a particular objective.

source : str

A string describing the source of the data.

parameters : dict, optional

A dictionary of parameters to fit. The values can be:
  • an iwp.Parameter object, e.g. iwp.Parameter("x")

  • a pybamm expression, in which case the other parameters should also be explicitly provided as iwp.Parameter objects, e.g.

    {
        "param": 2 * pybamm.Parameter("half-param"),
        "half-param": iwp.Parameter("half-param")
    }

    works, but

    {"param": 2 * iwp.Parameter("half-param")}

    would not work.

  • a function, containing other parameters, in which case the other parameters should again also be explicitly provided as iwp.Parameter objects, e.g.

    {
        "main parameter": lambda x: pybamm.Parameter("other parameter") * x**2,
        "other parameter": iwp.Parameter("other parameter")
    }

The name of the input parameter does not need to match the name of the parameter. In all cases, the DataFit class will automatically process this input to fit for "x".

cost : ionworkspipeline.costs.Cost, optional

The cost function to use when constructing the objective. If None, uses the optimizer’s default cost function.

initial_guesses : dict or list of dicts, optional

Initial guesses for the parameters. If a single dictionary, then this is used as the initial guess for all optimization jobs in each batch. If a list of dictionaries, then each dictionary is used as the initial guess for a single job.
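The two accepted shapes can be illustrated with a minimal sketch in plain Python. The helper name expand_initial_guesses and the num_jobs argument are hypothetical and are not part of ionworkspipeline; the sketch only shows the idea of reusing one dictionary for every job versus pairing each dictionary with one job.

```python
from copy import deepcopy


def expand_initial_guesses(initial_guesses, num_jobs):
    """Return one initial-guess dict per job (hypothetical helper)."""
    if isinstance(initial_guesses, dict):
        # A single dictionary is reused as the starting point of every job.
        return [deepcopy(initial_guesses) for _ in range(num_jobs)]
    guesses = list(initial_guesses)
    if len(guesses) != num_jobs:
        # Assumption: a list must supply exactly one dictionary per job.
        raise ValueError(f"expected {num_jobs} initial guesses, got {len(guesses)}")
    return guesses


single = expand_initial_guesses({"x": 1.0}, num_jobs=3)
per_job = expand_initial_guesses([{"x": 0.5}, {"x": 2.0}], num_jobs=2)
```

Here single holds three copies of the same guess, while per_job keeps the two distinct guesses in order.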

optimizer : ionworkspipeline.optimizers.Optimizer or ionworkspipeline.samplers.Sampler, optional

The optimizer to use for the fit. Default is set by the DataFit subclasses.

cost_logger : ionworkspipeline.data_fits.CostLogger, optional

A cost logger to use for logging the cost and parameters during the fit. Default is iwp.data_fits.CostLogger with default options.

multistarts : int, optional

Number of times to run the optimization from different initial guesses. If None, only runs once from the provided initial guess.

num_workers : int, optional

Number of worker processes to use for parallel batch processing. Not supported on Windows. If num_workers = 1, then multiprocessing is disabled. If num_workers = None, then the number of workers is set to the number of CPU cores.

max_batch_size : int, optional

Maximum number of optimization jobs to include in a single batch. If None, defaults to the largest possible batch size, which is the ceiling of the total number of jobs divided by the number of workers.
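The default arithmetic described for num_workers and max_batch_size can be sketched in plain Python as follows. The function names are hypothetical and the way jobs are assigned to batches is an assumption; only the "number of CPU cores" default and the ceiling formula come from the descriptions above.

```python
import math
import os


def resolve_num_workers(num_workers=None):
    # None means one worker per CPU core; os.cpu_count() may return None,
    # in which case this sketch falls back to a single worker.
    if num_workers is None:
        return os.cpu_count() or 1
    return num_workers


def default_batch_size(num_jobs, num_workers):
    # Largest possible batch size: ceiling of jobs divided by workers.
    return math.ceil(num_jobs / num_workers)


def split_into_batches(job_ids, batch_size):
    # Assumption: consecutive job IDs are grouped into the same batch.
    return [job_ids[i:i + batch_size] for i in range(0, len(job_ids), batch_size)]


jobs = list(range(10))
size = default_batch_size(len(jobs), num_workers=4)  # ceil(10 / 4) = 3
batches = split_into_batches(jobs, size)
```

With 10 jobs and 4 workers the sketch yields batches of at most 3 jobs each, so no worker receives more than one batch.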

initial_guess_sampler : ionworkspipeline.data_fits.distribution_samplers.DistributionSampler, optional

Sampler to use for generating initial guesses of multistarted parameter estimations. Default is ionworkspipeline.data_fits.distribution_samplers.LatinHypercube.
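For readers unfamiliar with Latin hypercube sampling, the sketch below shows the general technique in plain Python. It is not the library's LatinHypercube class: the function name, the bounds dictionary and the seed argument are all illustrative assumptions, and only uniform distributions are handled.

```python
import random


def latin_hypercube(bounds, num_samples, seed=None):
    """Draw num_samples points; bounds maps name -> (low, high)."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # Draw one point uniformly inside each of num_samples equal strata
        # of the unit interval, so every stratum is used exactly once.
        points = [(i + rng.random()) / num_samples for i in range(num_samples)]
        # Shuffle independently per parameter so strata are paired at random.
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [
        {name: columns[name][i] for name in bounds}
        for i in range(num_samples)
    ]


guesses = latin_hypercube({"x": (0.0, 1.0), "y": (10.0, 20.0)}, num_samples=4, seed=0)
```

Each of the four guesses falls in a different quarter of the range of every parameter, which spreads multistart initial guesses more evenly than independent uniform draws.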

priors : ionworkspipeline.priors.Prior or list[ionworkspipeline.priors.Prior], optional

Priors to use for the fit.

options : dict, optional

A dictionary of options to pass to the data fit. By default:

options = {
    # Random seed for reproducibility. Defaults to a random seed
    # determined by the current time.
    "seed": iwutil.random.generate_seed(),
}

Note: These options only have an effect if model.convert_to_format == 'casadi'

Extends: ionworkspipeline.data_fits.data_fit._DataFitBase

approximate_model_distribution()

Get the multivariate normal approximation of the parameter space.

property batch_ids: list[int]

Get the batch IDs.

property batches: list[_DataFitBatch]

Get the list of batches of optimization jobs.

check_initial_guesses()

Check that the initial parameter guesses are different for multistarts.

This method verifies that initial guesses differ across jobs, but ignores parameters with zero variance distributions (like PointMass distributions) when checking for uniqueness.

Raises

ValueError

If initial guesses are not unique or number of guesses does not match jobs.
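The check described above can be sketched in plain Python. This is an illustration only: the function name and the fixed_names argument (standing in for parameters whose distribution has zero variance) are hypothetical, and returning early when every parameter is fixed is an assumption of the sketch.

```python
def check_unique_guesses(guesses, num_jobs, fixed_names=()):
    """Raise ValueError if guesses are duplicated or miscounted (sketch)."""
    if len(guesses) != num_jobs:
        raise ValueError(f"expected {num_jobs} initial guesses, got {len(guesses)}")
    varying = [name for name in guesses[0] if name not in fixed_names]
    if not varying:
        # Assumption: with only zero-variance parameters there is nothing to compare.
        return
    seen = set()
    for guess in guesses:
        # Compare only the parameters that are allowed to vary between jobs.
        key = tuple(guess[name] for name in varying)
        if key in seen:
            raise ValueError("initial guesses are not unique across jobs")
        seen.add(key)


# "c" is treated as a zero-variance parameter, so only "x" must differ.
check_unique_guesses(
    [{"x": 1.0, "c": 5.0}, {"x": 2.0, "c": 5.0}],
    num_jobs=2,
    fixed_names=("c",),
)
```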

compute_hessian(cost=None)

Compute the Hessian matrix of the objective function.

property data_fit_runner: _DataFitRunner | None

Get the data fit runner.

estimate_variable_standard_deviations()

Estimate the standard deviations of the variables.

property explicit_initial_guesses: bool

Get the initial guesses flag.

get_batch(batch_id: int) -> _DataFitBatch

Get the batch object for a given batch ID.

Each batch represents a group of optimization jobs that are processed together for improved efficiency. This method retrieves or creates a _DataFitBatch object that manages these jobs.

Parameters

batch_id : int | None

Unique identifier for the batch to retrieve

Returns

_DataFitBatch

The batch object for the given ID

get_fit_results()

Get the results of the fit.

Returns

dict

Dictionary containing fit results for each objective.

property hessian

Get the Hessian matrix of the objective function.

property initial_guess_distributions: dict[str, Distribution]

Get the initial guess distributions for the parameters being fit.

property initial_guess_sampler: Sampler

Get the initial guess sampler.

property is_parent: bool

Whether the DataFit is a parent.

property job_ids: list[ndarray]

Get the job IDs.

linear_confidence_intervals(confidence_level=None, variable_standard_deviations=None, use_parameter_bounds=None)

Get the linear confidence intervals for the parameters.

property max_batch_size: int

Get the maximum batch size.

property multistarts: int | None

Get the number of multistarts.

property num_batches: int

Get the number of batches.

property num_workers: int

Get the number of worker processes.

property objective_function: ObjectiveFunction

Get the objective function.

plot_fit_results()

Plot the results of the fit by calling the plot_fit_results method of each internal callback in each objective. Any user-defined callbacks should be called manually instead of using this method.

Returns

dict

Dictionary containing figure and axes objects for each objective.

plot_sampler_results(confidence_level=None, chi2_minimum=None, burnin_iterations=None, show_bounds=None, bins=None)

Produces a pairwise plot of MCMC samples with histograms on the diagonal and scatter plots below.

Parameters

confidence_level : float, optional

The confidence level for filtering samples, between 0 and 1. Default is 0.95.

chi2_minimum : float, optional

The minimum chi-square value to use as reference. If None, uses the minimum value from the sampler.

burnin_iterations : int, optional

Number of initial iterations to discard as burn-in. If None, uses the value set in the sampler.

show_bounds : bool, optional

Whether to show parameter bounds on the plots. If None, defaults to True.

bins : int, optional

Number of bins to use for histograms. If None, defaults to 20.

Returns

tuple

A tuple (fig, axes) containing the matplotlib Figure and array of Axes objects for the pairwise plots. The diagonal shows histograms of each parameter’s marginal distribution, while off-diagonal plots show 2D scatter plots of parameter pairs.

Raises

ValueError

If the fit hasn’t been run yet or if this method is called on an optimizer rather than a sampler.

plot_trace()

Plot the cost and each parameter as a function of iteration number.

Returns

tuple

Tuple containing figure and axes objects.

process_initial_guess_distributions()

Process the initial guess distributions for the parameters being fit.

process_objectives(objectives: Objective | dict[str, Objective])

Set up the objectives.

run(parameter_values)

Run the optimization to fit the model to data.

Parameters

parameter_values : dict

Dictionary of parameter values to use for the fit.

Returns

iwp.Result

Results object containing fitted parameters and optimization results.

sampler_confidence_intervals(confidence_level=None, chi2_minimum=None)

Calculate confidence intervals for sampled parameters based on chi-square thresholds.

Parameters

confidence_level : float, optional

The confidence level for the intervals, between 0 and 1. Default is 0.95.

chi2_minimum : float, optional

The minimum chi-square value to use as reference. Default is the minimum value in results.

Returns

dict

Dictionary mapping parameter names to tuples of (lower_bound, upper_bound) confidence intervals.

Raises

ValueError

If the fit hasn’t been run, if using an optimizer instead of sampler, or if not using a chi-square cost function.
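One way to picture a chi-square threshold interval is the plain-Python sketch below. It is an assumption-laden illustration, not the library's implementation: the function name is hypothetical, and the allowed increase delta above the minimum chi-square is passed in directly because this page does not state how it is derived from confidence_level. For reference, 3.84 is approximately the 95% quantile of a chi-square distribution with one degree of freedom.

```python
def chi2_threshold_intervals(samples, chi2_values, delta, chi2_minimum=None):
    """Per-parameter (lower, upper) range of samples within delta of the minimum."""
    if chi2_minimum is None:
        # Default reference: the smallest chi-square among the samples.
        chi2_minimum = min(chi2_values)
    kept = [
        sample
        for sample, chi2 in zip(samples, chi2_values)
        if chi2 - chi2_minimum <= delta
    ]
    if not kept:
        raise ValueError("no samples fall within the chi-square threshold")
    return {
        name: (min(s[name] for s in kept), max(s[name] for s in kept))
        for name in samples[0]
    }


samples = [{"x": 0.8}, {"x": 1.0}, {"x": 1.2}, {"x": 2.0}]
chi2_values = [2.0, 0.5, 3.0, 10.0]
intervals = chi2_threshold_intervals(samples, chi2_values, delta=3.84)
```

In this example the last sample lies 9.5 above the minimum chi-square and is excluded, so the interval for "x" spans the remaining three samples.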

set_initial_guesses(initial_guesses: list[dict], initial_guess_sampler: iwp.stats.DistributionSampler | None = None)

Set up and validate initial parameter guesses for optimization.

This method:

1. Stores the provided initial guesses or sampler configuration
2. Validates that the initial guesses match the parameters being fit
3. Sets up sampling distributions if no explicit guesses are provided
4. Ensures uniqueness of guesses when using multistart optimization

Parameters

initial_guesses : dict or list[dict] or None

User-provided initial guesses. If None, guesses will be sampled using the initial_guess_sampler. Each guess should be a dictionary mapping parameter names to their initial values.

initial_guess_sampler : DistributionSampler or None

Sampler to generate initial guesses when not provided directly. If None and no initial_guesses provided, defaults to LatinHypercube.

Raises

ValueError

If initial guesses are invalid, non-unique for multistart optimization, or don’t match the parameters being fit.

timeseries_preprocessing()

Set up time stepping for the fit.

class ionworkspipeline.data_fits.ArrayDataFit(objectives, **kwargs)

A pipeline element that fits a model to data for multiple independent variable values. The data for each independent variable value is fitted separately. The independent variable values should be given as the keys of the objectives dictionary. The value of each key should be an ionworkspipeline.objectives.Objective object. This objective will be used to fit the data for the corresponding independent variable value.

The user-supplied objectives should assign the independent variable value to the custom_parameters attribute of the objective as appropriate. This class simply calls a separate ionworkspipeline.DataFit for each provided objective. It does not pass the independent variable value to the objective, so the user must ensure that the objective is set up to use the independent variable value correctly.

For example, this can be used to fit a model to data at multiple temperatures, or fit each pulse of a GITT experiment separately, with post-processing to extract functional relationships between parameters and the independent variable.
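The "one independent fit per key" pattern can be sketched without the library. In the example below the keys are temperatures in kelvin, and a closed-form least-squares slope stands in for a real objective; fit_slope and fit_each are hypothetical names chosen for illustration and are not part of ionworkspipeline.

```python
def fit_slope(points):
    # Closed-form least-squares estimate of k in the model y = k * x.
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


def fit_each(datasets, fit):
    # Run a separate, independent fit for every independent-variable value.
    return {key: fit(data) for key, data in datasets.items()}


datasets = {
    298.15: [(1.0, 2.0), (2.0, 4.0)],
    313.15: [(1.0, 3.0), (2.0, 6.0)],
}
slopes = fit_each(datasets, fit_slope)
```

The resulting dictionary maps each temperature to its fitted slope, which is the kind of table that post-processing could then turn into a functional relationship between the parameter and the independent variable.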

The rest of the parameters are the same as for ionworkspipeline.DataFit.

Extends: ionworkspipeline.data_fits.data_fit.DataFit

check_initial_guesses()

Check that initial guesses can be sampled for all data fits.

property data_fits

Get the dictionary of DataFit objects.

Returns

dict

Dictionary mapping independent variable values to DataFit objects

run(parameter_values)

Run the optimization for each independent variable value.

Parameters

parameter_values : dict

Dictionary of parameter values to use for the fit

Returns

iwp.Result

Results object containing fitted parameters and optimization results

class ionworkspipeline.data_fits.CostLogger(plot_every=None, print_every=None, checkpoint_func=None)

A class to log the cost and parameters during a fit, and plot the results.

Parameters

plot_every : float, optional

The number of seconds between each plot of the cost and parameters. If None, no plots are generated during the fit, but the final plot can be generated using the plot() method.

print_every : float, optional

The number of seconds between each print of the cost and parameters. If None, no prints are generated during the fit, but the final print can be generated using the print() method.

checkpoint_func : callable, optional

A function to call to checkpoint the fit. The function should take a single argument, which will be a dictionary containing the current state of the fit.
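The time-based behaviour of print_every and the single-dictionary contract of checkpoint_func can be illustrated with a small stand-in class. ThrottledLogger is hypothetical and is not the library's CostLogger; in particular, the contents of the state dictionary, the injectable clock argument, and calling the checkpoint on every log entry are assumptions made for this sketch.

```python
import time


class ThrottledLogger:
    """Hypothetical stand-in that prints at most once every print_every seconds."""

    def __init__(self, print_every=None, checkpoint_func=None, clock=time.monotonic):
        self.print_every = print_every
        self.checkpoint_func = checkpoint_func
        self._clock = clock
        self._last_print = None
        self.history = []
        self.printed = []

    def log(self, iteration, cost):
        self.history.append({"iteration": iteration, "cost": cost})
        if self.checkpoint_func is not None:
            # The checkpoint callable receives one dict describing the state.
            self.checkpoint_func({"history": list(self.history)})
        if self.print_every is None:
            return  # No printing during the fit.
        now = self._clock()
        if self._last_print is None or now - self._last_print >= self.print_every:
            print(f"iteration {iteration}: cost = {cost:.4g}")
            self.printed.append(iteration)
            self._last_print = now


checkpoints = []
logger = ThrottledLogger(print_every=5.0, checkpoint_func=checkpoints.append)
for i, cost in enumerate([3.0, 1.5, 0.7]):
    logger.log(i, cost)
```

Because the interval is measured in seconds rather than iterations, a fast loop like this one prints only its first entry, while the checkpoint callable still sees every state.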

argmin_costs()

Get the index of the job with minimum cost. If all costs are NaN, return 0.

Returns

int

Index of minimum cost

argsort_costs()

Sort the jobs in ascending order of cost.

Returns

numpy.ndarray

Indices that would sort the costs
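The NaN handling described for argmin_costs can be sketched in plain Python, together with an ascending sort. The function names are hypothetical, and placing NaN costs last in the sort order is an assumption of this sketch (it matches how numpy.argsort orders NaN values) rather than something stated on this page.

```python
import math


def argmin_ignoring_nan(costs):
    # Consider only jobs whose cost is not NaN.
    finite = [(cost, index) for index, cost in enumerate(costs) if not math.isnan(cost)]
    if not finite:
        # Every cost is NaN: fall back to the first job.
        return 0
    return min(finite)[1]


def argsort_nan_last(costs):
    # Ascending order of cost; NaN entries are pushed to the end (assumption).
    def key(index):
        cost = costs[index]
        return (math.isnan(cost), 0.0 if math.isnan(cost) else cost)

    return sorted(range(len(costs)), key=key)


costs = [3.0, float("nan"), 1.0]
best = argmin_ignoring_nan(costs)
order = argsort_nan_last(costs)
```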

property children

List of child CostLoggerJob objects.

clear_axes()

Clear the plot axes.

property cost

List of logged costs.

property fig_axes

Get or create figure and axes objects.

Returns

tuple

Figure and axes objects, and whether they existed

finish()

Finish logging.

property finished

Whether logging is finished.

get_log()

Get the logged data.

Returns

dict

Dictionary of logged values

property is_parent

Whether this is a parent logger.

property iteration

List of iteration numbers.

log(logs=None, job_id=None)

Log cost and parameter values.

Parameters

logs : dict, optional

Dictionary of values to log

job_id : int, optional

ID of the job being logged

property multiprocessing

Whether multiprocessing is enabled.

property num_jobs

Number of jobs being logged.

property parent

Parent CostLogger object.

plot()

Plot the cost and parameters.

Returns

tuple

Figure and axes objects

property plot_every

Seconds between plot updates.

property plot_flag

Plot update flag.

plot_refresh(force_plot=False)

Update the plot of cost and parameters.

Parameters

force_plot : bool, optional

If True, force a plot update regardless of timing

Returns

tuple

Figure and axes objects

property plot_variables

List of variables to plot.

property print_every

Seconds between print updates.

property probabilistic

Whether probabilistic sampling is enabled.

reset()

Reset all logging data and counters.

set_datafit_attributes(datafit)

Set the DataFit attributes.

set_multiprocessing(multiprocessing)

Set whether multiprocessing is being used.

Parameters

multiprocessing : bool

Whether multiprocessing is enabled

set_parameters(parameters)

Set the parameters to be logged.

Parameters

parameters : list

List of Parameter objects

set_probabilistic(probabilistic)

Set whether probabilistic sampling is enabled.

property show_plot_iterative

Whether to show plots during optimization.

property show_print_iterative

Whether to print updates during optimization.

spawn_children(num_jobs)

Create child CostLoggerJob objects.

Parameters

num_jobs : int

Number of child loggers to create

Returns

list

List of CostLoggerJob objects

start()

Start logging.

property timer

Timer object.