Data Fits

A pipeline element that fits a model to data.

Parameters

objectives : ionworkspipeline.objectives.Objective or dict[str, ionworkspipeline.objectives.Objective]

The objective(s) to use for the fit. This can be a single objective or a dictionary of objectives. Each objective can be any class that implements the method build, which creates a function Objective.run() that takes a dictionary of parameters and returns a scalar or vector cost. In general, we subclass ionworkspipeline.objectives.Objective to implement a particular objective.

source : str

A string describing the source of the data.

parameters : dict, optional

A dictionary of parameters to fit. The values can be:

- an iwp.Parameter object, e.g. iwp.Parameter("x")
- a pybamm expression, in which case the other parameters should also be explicitly provided as iwp.Parameter objects. For example,

      {
          "param": 2 * pybamm.Parameter("half-param"),
          "half-param": iwp.Parameter("half-param"),
      }

  works, but {"param": 2 * iwp.Parameter("half-param")} would not work.
- a function containing other parameters, in which case the other parameters should again be explicitly provided as iwp.Parameter objects. For example:

      {
          "main parameter": lambda x: pybamm.Parameter("other parameter") * x**2,
          "other parameter": iwp.Parameter("other parameter"),
      }

The dictionary key does not need to match the name of the parameter being fitted. In all cases, DataFit automatically processes this input so that it fits for the underlying iwp.Parameter (for example, "x" in iwp.Parameter("x")).
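The dependency rule for function entries can be illustrated without the library. The sketch below assumes that named parameters are resolved by looking up fitted values by name. FitParameter and resolve are hypothetical stand-ins, not the ionworkspipeline or pybamm API. The function entry can only be evaluated because "other parameter" is also declared as a fitted parameter. The last test case shows that the dictionary key need not match the parameter name.

```python
# Hypothetical sketch: FitParameter and resolve() are illustrative
# stand-ins, NOT part of ionworkspipeline or pybamm.


class FitParameter:
    """Stand-in for iwp.Parameter: a quantity the optimizer varies."""

    def __init__(self, name):
        self.name = name


def resolve(parameters, fitted):
    """Replace every FitParameter entry with its fitted value.

    The value is looked up by the FitParameter's name, which does not have
    to match the dictionary key. Function entries are returned unchanged.
    """
    return {
        key: fitted[entry.name] if isinstance(entry, FitParameter) else entry
        for key, entry in parameters.items()
    }


parameters = {
    # Function entry: looks up "other parameter" by name when evaluated,
    # much as pybamm.Parameter("other parameter") does in the docstring example.
    "main parameter": lambda x, values: values["other parameter"] * x**2,
    # ...so "other parameter" must also be declared as a fitted parameter.
    "other parameter": FitParameter("other parameter"),
}

values = resolve(parameters, fitted={"other parameter": 3.0})
print(values["main parameter"](2.0, values))  # 12.0
```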

cost : ionworkspipeline.costs.Cost, optional

The cost function to use when constructing the objective. If None, uses the optimizer’s default cost function.

initial_guesses : dict or list of dicts, optional

Initial guesses for the parameters. If a single dictionary is given, it is used as the initial guess for every optimization job in each batch. If a list of dictionaries is given, each dictionary is used as the initial guess for a single job.
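The two accepted forms of initial_guesses can be sketched as follows. This is a minimal illustration and assumes one guess is needed per optimization job. expand_initial_guesses is a hypothetical helper, not a library function. The length check mirrors the ValueError that check_initial_guesses documents when the number of guesses does not match the number of jobs.

```python
# Hypothetical helper illustrating the documented meaning of
# `initial_guesses`; it is not part of ionworkspipeline.


def expand_initial_guesses(initial_guesses, num_jobs):
    """Return one initial-guess dictionary per optimization job."""
    if isinstance(initial_guesses, dict):
        # A single dictionary is reused as the guess for every job.
        return [dict(initial_guesses) for _ in range(num_jobs)]
    guesses = [dict(guess) for guess in initial_guesses]
    if len(guesses) != num_jobs:
        # A list must provide exactly one dictionary per job.
        raise ValueError(f"Expected {num_jobs} initial guesses, got {len(guesses)}")
    return guesses


print(expand_initial_guesses({"x": 1.0}, num_jobs=3))
# [{'x': 1.0}, {'x': 1.0}, {'x': 1.0}]
print(expand_initial_guesses([{"x": 1.0}, {"x": 2.0}], num_jobs=2))
# [{'x': 1.0}, {'x': 2.0}]
```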

optimizer : ionworkspipeline.optimizers.Optimizer or ionworkspipeline.samplers.Sampler, optional

The optimizer to use for the fit. Default is set by the DataFit subclasses.

cost_logger : ionworkspipeline.data_fits.CostLogger, optional

A cost logger to use for logging the cost and parameters during the fit. Default is iwp.data_fits.CostLogger with default options.

multistarts : int, optional

Number of times to run the optimization from different initial guesses. If None, the optimization runs once from the provided initial guess.

num_workers : int, optional

Number of worker processes to use for parallel batch processing. Not supported on Windows. If num_workers is 1, multiprocessing is disabled. If num_workers is None, the number of workers is set to the number of CPU cores.

max_batch_size : int, optional

Maximum number of optimization jobs to include in a single batch. If None, defaults to the largest batch size that still spreads the jobs across all workers, which is the ceiling of the total number of jobs divided by the number of workers.
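The defaults for num_workers and max_batch_size reduce to simple arithmetic. The sketch below rests on one assumption: jobs are assigned to batches as consecutive chunks, which the docs do not state. All three helpers are hypothetical and not part of the library.

```python
import math
import os


def resolve_num_workers(num_workers):
    """None means one worker per CPU core (os.cpu_count() may return None)."""
    if num_workers is None:
        return os.cpu_count() or 1
    return num_workers


def default_max_batch_size(num_jobs, num_workers):
    """Ceiling of jobs / workers, as described for max_batch_size=None."""
    return math.ceil(num_jobs / num_workers)


def split_into_batches(job_ids, max_batch_size):
    """Chunk job IDs into consecutive batches of at most max_batch_size."""
    return [
        job_ids[i : i + max_batch_size]
        for i in range(0, len(job_ids), max_batch_size)
    ]


jobs = list(range(10))
size = default_max_batch_size(len(jobs), num_workers=4)  # ceil(10 / 4) = 3
print(split_into_batches(jobs, size))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```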

initial_guess_sampler : ionworkspipeline.data_fits.distribution_samplers.DistributionSampler, optional

Sampler to use for generating initial guesses of multistarted parameter estimations. Default is ionworkspipeline.data_fits.distribution_samplers.LatinHypercube.
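Latin hypercube sampling spreads multistart initial guesses evenly across each parameter's range. Below is a minimal standard-library sketch, assuming every parameter has finite (lower, upper) bounds. latin_hypercube is a hypothetical function, not the library's LatinHypercube implementation. Each range is split into as many equal strata as there are samples, and every stratum is used exactly once per parameter.

```python
import random


def latin_hypercube(bounds, num_samples, seed=None):
    """Draw num_samples points, one per stratum in every dimension.

    bounds maps parameter name -> (lower, upper). Strata are shuffled
    independently for each parameter so the dimensions are decorrelated.
    """
    rng = random.Random(seed)
    samples = [{} for _ in range(num_samples)]
    for name, (lower, upper) in bounds.items():
        width = (upper - lower) / num_samples
        strata = list(range(num_samples))
        rng.shuffle(strata)
        for sample, stratum in zip(samples, strata):
            # Uniform draw inside the assigned stratum.
            sample[name] = lower + (stratum + rng.random()) * width
    return samples


guesses = latin_hypercube({"x": (0.0, 1.0), "y": (10.0, 20.0)}, num_samples=4, seed=0)
```

Passing a fixed seed makes the guesses reproducible, which parallels the "seed" entry in options.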

priors : ionworkspipeline.priors.Prior or list[ionworkspipeline.priors.Prior], optional

Priors to use for the fit.

options : dict, optional

A dictionary of options to pass to the data fit. By default:

options = {
    # Random seed for reproducibility. Defaults to a random seed
    # determined by the current time.
    "seed": iwutil.random.generate_seed(),
}

Note: These options only have an effect if model.convert_to_format == 'casadi'.

Extends: ionworkspipeline.data_fits.data_fit._DataFitBase

approximate_model_distribution(): Get the multivariate normal approximation of the parameter space.

property batch_ids: list[int]: Get the batch IDs.

property batches: list[_DataFitBatch]: Get the list of batches of optimization jobs.

check_initial_guesses()

Check that the initial parameter guesses are different for multistarts.

This method verifies that initial guesses differ across jobs, but ignores parameters with zero-variance distributions (such as PointMass distributions) when checking for uniqueness.

Raises

ValueError: If the initial guesses are not unique or the number of guesses does not match the number of jobs.
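The uniqueness rule can be sketched as follows. This version assumes the caller already knows which parameters have zero-variance distributions and passes them in as fixed_parameters. check_unique_guesses is a hypothetical helper, not the library's method. Because fixed parameters are excluded from the comparison, two guesses that differ only in a fixed parameter still count as duplicates.

```python
def check_unique_guesses(guesses, num_jobs, fixed_parameters=()):
    """Raise ValueError if the initial guesses are not unique across jobs.

    Parameters in fixed_parameters (e.g. those with a point-mass
    distribution, which can only take one value) are ignored.
    """
    if len(guesses) != num_jobs:
        raise ValueError(f"Expected {num_jobs} initial guesses, got {len(guesses)}")
    seen = set()
    for guess in guesses:
        # Compare only the parameters that are allowed to vary.
        key = tuple(sorted((k, v) for k, v in guess.items() if k not in fixed_parameters))
        if key in seen:
            raise ValueError(f"Duplicate initial guess: {dict(key)}")
        seen.add(key)
```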

compute_hessian(cost=None): Compute the Hessian matrix of the objective function.

property data_fit_runner: _DataFitRunner | None: Get the data fit runner.

estimate_variable_standard_deviations(): Estimate the standard deviations of the variables.

property explicit_initial_guesses: bool: Whether the initial guesses were provided explicitly rather than sampled.

get_batch(batch_id: int) → _DataFitBatch

Get the batch object for a given batch ID.

Each batch represents a group of optimization jobs that are processed together for improved efficiency. This method retrieves or creates a _DataFitBatch object that manages these jobs.

Parameters

batch_id (int | None): Unique identifier for the batch to retrieve.

Returns

_DataFitBatch: The batch object for the given ID.

get_fit_results()

Get the results of the fit.

Returns

dict: Dictionary containing fit results for each objective.

property hessian: Get the Hessian matrix of the objective function.

property initial_guess_distributions: dict[str, Distribution]: Get the initial guess distributions for the parameters being fit.

property initial_guess_sampler: Sampler: Get the initial guess sampler.

property is_parent: bool: Whether the DataFit is a parent.

property job_ids: list[ndarray]: Get the job IDs.

linear_confidence_intervals(confidence_level=None, variable_standard_deviations=None, use_parameter_bounds=None): Get the linear confidence intervals for the parameters.
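One common way to build linear confidence intervals from a Hessian is the Gaussian (Laplace) approximation. Here the covariance is approximated by the inverse Hessian of a negative log-likelihood at its minimum, and each interval is the estimate plus or minus z times its standard deviation. The sketch below assumes exactly that. It does not model variable_standard_deviations or use_parameter_bounds, and the library's scaling and return format may differ.

```python
from statistics import NormalDist

import numpy as np


def linear_confidence_intervals(theta_hat, hessian, names, confidence_level=0.95):
    """Gaussian (Laplace) approximation of parameter confidence intervals.

    Assumes `hessian` is the Hessian of a negative log-likelihood at its
    minimum `theta_hat`, so the covariance is approximately its inverse.
    """
    covariance = np.linalg.inv(np.asarray(hessian, dtype=float))
    std = np.sqrt(np.diag(covariance))
    # Two-sided normal quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return {
        name: (mean - z * s, mean + z * s)
        for name, mean, s in zip(names, theta_hat, std)
    }


intervals = linear_confidence_intervals([1.0, 2.0], [[4.0, 0.0], [0.0, 100.0]], ["a", "b"])
```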

property max_batch_size: int: Get the maximum number of optimization jobs in a single batch.

property multistarts: int | None: Get the number of multistarts.

property num_batches: int: Get the number of batches.

property num_workers: int: Get the number of worker processes.

property objective_function: ObjectiveFunction: Get the objective function.

plot_fit_results()

Plot the results of the fit by calling the plot_fit_results method of each internal callback in each objective. Any user-defined callbacks should be called manually instead of using this method.

Returns

dict: Dictionary containing figure and axes objects for each objective.

plot_sampler_results(confidence_level=None, chi2_minimum=None, burnin_iterations=None, show_bounds=None, bins=None)

Produce a pairwise plot of MCMC samples, with histograms on the diagonal and scatter plots below it.

Parameters

confidence_level (float, optional): The confidence level for filtering samples, between 0 and 1. Default is 0.95.
chi2_minimum (float, optional): The minimum chi-square value to use as reference. If None, uses the minimum value from the sampler.
burnin_iterations (int, optional): Number of initial iterations to discard as burn-in. If None, uses the value set in the sampler.
show_bounds (bool, optional): Whether to show parameter bounds on the plots. If None, defaults to True.
bins (int, optional): Number of bins to use for histograms. If None, defaults to 20.

Returns

tuple: A tuple (fig, axes) containing the matplotlib Figure and the array of Axes objects for the pairwise plots. The diagonal shows histograms of each parameter’s marginal distribution, while the off-diagonal plots show 2D scatter plots of parameter pairs.

Raises

ValueError: If the fit hasn’t been run yet, or if the fit uses an optimizer rather than a sampler.

plot_trace()

Plot the cost and each parameter as a function of iteration number.

Returns

tuple: Tuple containing figure and axes objects.

process_initial_guess_distributions(): Process the initial guess distributions for the parameters being fit.

process_objectives(objectives: Objective | dict[str, Objective]): Set up the objectives.

run(parameter_values)

Run the optimization to fit the model to data.

Parameters

parameter_values (dict): Dictionary of parameter values to use for the fit.

Returns

iwp.Result: Results object containing fitted parameters and optimization results.

sampler_confidence_intervals(confidence_level=None, chi2_minimum=None)

Calculate confidence intervals for sampled parameters based on chi-square thresholds.

Parameters

confidence_level (float, optional): The confidence level for the intervals, between 0 and 1. Default is 0.95.
chi2_minimum (float, optional): The minimum chi-square value to use as reference. Default is the minimum value in the results.

Returns

dict: Dictionary mapping parameter names to tuples of (lower_bound, upper_bound) confidence intervals.

Raises

ValueError: If the fit hasn’t been run, if the fit uses an optimizer instead of a sampler, or if the cost function is not chi-square.
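A chi-square threshold interval keeps only the samples whose chi-square lies within some Δχ² of the minimum. It then reports the range of each parameter over those samples. The sketch below assumes a one-degree-of-freedom threshold, which equals the squared two-sided normal quantile (about 3.84 at 95%). The library may use a different number of degrees of freedom, and sampler_confidence_intervals here is a hypothetical standalone function.

```python
from statistics import NormalDist


def sampler_confidence_intervals(samples, chi2_values, confidence_level=0.95, chi2_minimum=None):
    """Bound each parameter over samples whose chi-square is below a threshold.

    Keeps samples with chi2 <= chi2_minimum + delta, where delta is the
    one-degree-of-freedom chi-square quantile, then takes the min and max
    of each parameter over the kept samples.
    """
    if chi2_minimum is None:
        chi2_minimum = min(chi2_values)
    # The chi-square(1 dof) quantile equals the squared two-sided normal quantile.
    delta = NormalDist().inv_cdf(0.5 + confidence_level / 2) ** 2
    kept = [s for s, c in zip(samples, chi2_values) if c <= chi2_minimum + delta]
    return {
        name: (min(s[name] for s in kept), max(s[name] for s in kept))
        for name in samples[0]
    }


samples = [{"k": 1.0}, {"k": 2.0}, {"k": 3.0}, {"k": 4.0}]
print(sampler_confidence_intervals(samples, [10.0, 11.0, 13.0, 20.0]))  # {'k': (1.0, 3.0)}
```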

set_initial_guesses(initial_guesses: list[dict], initial_guess_sampler: iwp.stats.DistributionSampler | None = None)

Set up and validate initial parameter guesses for optimization.

This method:

1. Stores the provided initial guesses or sampler configuration.
2. Validates that the initial guesses match the parameters being fit.
3. Sets up sampling distributions if no explicit guesses are provided.
4. Ensures uniqueness of guesses when using multistart optimization.

Parameters

initial_guesses (dict or list[dict] or None): User-provided initial guesses. If None, guesses are sampled using the initial_guess_sampler. Each guess should be a dictionary mapping parameter names to their initial values.
initial_guess_sampler (DistributionSampler or None): Sampler used to generate initial guesses when they are not provided directly. If None and no initial_guesses are provided, defaults to LatinHypercube.

Raises

ValueError: If the initial guesses are invalid, are not unique for multistart optimization, or don’t match the parameters being fit.

timeseries_preprocessing(): Set up time stepping for the fit.

class ionworkspipeline.data_fits.ArrayDataFit(objectives, **kwargs)

A pipeline element that fits a model to data for multiple independent variable values. The data for each independent variable value is fitted separately. The independent variable values should be given as the keys of the objectives dictionary, and the value for each key should be an ionworkspipeline.objectives.Objective object. That objective is used to fit the data for the corresponding independent variable value.

The user-supplied objectives should assign the independent variable value to the custom_parameters attribute of the objective as appropriate. This class simply calls a separate ionworkspipeline.DataFit for each provided objective. It does not pass the independent variable value to the objective, so the user must ensure that each objective is set up to use its independent variable value correctly.

For example, this can be used to fit a model to data at multiple temperatures, or to fit each pulse of a GITT experiment separately, with post-processing to extract functional relationships between the parameters and the independent variable.

The rest of the parameters are the same as for ionworkspipeline.DataFit.

Extends: ionworkspipeline.data_fits.data_fit.DataFit
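The "one independent fit per key" pattern can be shown with a toy stand-in. Here each "fit" is a closed-form least-squares slope for y = k * t rather than a DataFit. The temperatures and data points are synthetic values chosen purely for illustration. The resulting dictionary, keyed by independent variable value, is what a post-processing step would then use to relate k to temperature.

```python
def fit_slope(t, y):
    """Least-squares slope of y = k * t through the origin (toy 'fit')."""
    return sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)


# Synthetic data: one independent data set per temperature (the keys).
data = {
    298.15: ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    308.15: ([1.0, 2.0, 3.0], [3.0, 6.0, 9.0]),
}

# Each key is fitted separately, analogous to ArrayDataFit running one
# DataFit per objective; nothing is shared between the fits.
fitted = {temperature: fit_slope(t, y) for temperature, (t, y) in data.items()}
print(fitted)  # {298.15: 2.0, 308.15: 3.0}
```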

check_initial_guesses(): Check that initial guesses can be sampled for all data fits.

property data_fits

Get the dictionary of DataFit objects.

Returns

dict: Dictionary mapping independent variable values to DataFit objects.

run(parameter_values)

Run the optimization for each independent variable value.

Parameters

parameter_values (dict): Dictionary of parameter values to use for the fit.

Returns

iwp.Result: Results object containing fitted parameters and optimization results.

class ionworkspipeline.data_fits.CostLogger(plot_every=None, print_every=None, checkpoint_func=None)

A class to log the cost and parameters during a fit, and to plot the results.

Parameters

plot_every (float, optional): The number of seconds between plots of the cost and parameters. If None, no plots are generated during the fit, but the final plot can be generated using the plot() method.
print_every (float, optional): The number of seconds between prints of the cost and parameters. If None, nothing is printed during the fit, but the final print can be generated using the print() method.
checkpoint_func (callable, optional): A function to call to checkpoint the fit. The function should take a single argument: a dictionary containing the current state of the fit.
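Time-based throttling of the kind print_every describes can be sketched with an injectable clock. ThrottledLogger is hypothetical and not the CostLogger implementation. The state dictionary passed to checkpoint_func and the choice to checkpoint on every log call are assumptions, since the docs only say the function receives the current state of the fit.

```python
import time


class ThrottledLogger:
    """Hypothetical sketch of time-throttled logging, not iwp's CostLogger.

    Every log() call records the cost and calls checkpoint_func (if given),
    but a summary line is printed at most once per print_every seconds.
    """

    def __init__(self, print_every=None, checkpoint_func=None, clock=time.monotonic):
        self.print_every = print_every
        self.checkpoint_func = checkpoint_func
        self.clock = clock
        self.costs = []
        self.num_prints = 0
        self._last_print = None

    def log(self, cost, parameters):
        self.costs.append(cost)
        state = {"iteration": len(self.costs), "cost": cost, "parameters": dict(parameters)}
        if self.checkpoint_func is not None:
            self.checkpoint_func(state)
        now = self.clock()
        # Print on the first call, then only once print_every seconds have passed.
        if self.print_every is not None and (
            self._last_print is None or now - self._last_print >= self.print_every
        ):
            print(f"iter {state['iteration']}: cost={cost:.4g} {state['parameters']}")
            self._last_print = now
            self.num_prints += 1
        return state
```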

argmin_costs()

Get the index of the job with the minimum cost. If all costs are NaN, returns 0.

Returns

int: Index of the minimum cost.

argsort_costs()

Get the job indices sorted in ascending order of cost.

Returns

numpy.ndarray: Indices that would sort the costs.
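The documented NaN behaviour of argmin_costs matches a guarded call to NumPy's nanargmin, which itself raises ValueError when every entry is NaN. The sketch below is hypothetical and standalone. The library does not document how argsort_costs orders NaN costs. This version relies on np.argsort's default, which places NaNs last.

```python
import numpy as np


def argmin_costs(costs):
    """Index of the smallest non-NaN cost; 0 if every cost is NaN."""
    costs = np.asarray(costs, dtype=float)
    if np.all(np.isnan(costs)):
        # np.nanargmin would raise ValueError here, so fall back to 0.
        return 0
    return int(np.nanargmin(costs))


def argsort_costs(costs):
    """Indices that sort the costs in ascending order (NaNs sort last)."""
    return np.argsort(np.asarray(costs, dtype=float))


print(argmin_costs([3.0, np.nan, 1.0]))  # 2
print(argsort_costs([3.0, np.nan, 1.0]))  # [2 0 1]
```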

property children: List of child CostLoggerJob objects.

clear_axes(): Clear the plot axes.

property cost: List of logged costs.

property fig_axes

Get or create figure and axes objects.

Returns

tuple: The figure and axes objects, and whether they already existed.

finish(): Finish logging.

property finished: Whether logging is finished.

get_log()

Get the logged data.

Returns

dict: Dictionary of logged values.

property is_parent: Whether this is a parent logger.

property iteration: List of iteration numbers.

log(logs=None, job_id=None)

Log cost and parameter values.

Parameters

logs (dict, optional): Dictionary of values to log.
job_id (int, optional): ID of the job being logged.

property multiprocessing: Whether multiprocessing is enabled.

property num_jobs: Number of jobs being logged.

property parent: Parent CostLogger object.

plot()

Plot the cost and parameters.

Returns

tuple: Figure and axes objects.

property plot_every: Seconds between plot updates.

property plot_flag: Plot update flag.

plot_refresh(force_plot=False)

Update the plot of cost and parameters.

Parameters

force_plot (bool, optional): If True, force a plot update regardless of timing.

Returns

tuple: Figure and axes objects.

property plot_variables: List of variables to plot.

property print_every: Seconds between print updates.

property probabilistic: Whether probabilistic sampling is enabled.

reset(): Reset all logging data and counters.

set_datafit_attributes(datafit): Set the DataFit attributes.

set_multiprocessing(multiprocessing)

Set whether multiprocessing is being used.

Parameters

multiprocessing (bool): Whether multiprocessing is enabled.

set_parameters(parameters)

Set the parameters to be logged.

Parameters

parameters (list): List of Parameter objects.

set_probabilistic(probabilistic): Set whether probabilistic sampling is enabled.

property show_plot_iterative: Whether to show plots during optimization.

property show_print_iterative: Whether to print updates during optimization.

spawn_children(num_jobs)

Create child CostLoggerJob objects.

Parameters

num_jobs (int): Number of child loggers to create.

Returns

list: List of CostLoggerJob objects.
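The parent/child relationship behind spawn_children can be sketched as one child logger per optimization job, with the parent reading results back from its children. ParentLogger, JobLogger and best_costs are hypothetical names; they only illustrate the pattern and do not reproduce CostLogger or CostLoggerJob.

```python
class JobLogger:
    """Hypothetical per-job child logger (cf. CostLoggerJob)."""

    def __init__(self, parent, job_id):
        self.parent = parent
        self.job_id = job_id
        self.costs = []

    def log(self, cost):
        self.costs.append(cost)


class ParentLogger:
    """Hypothetical parent that spawns one child logger per job."""

    def __init__(self):
        self.children = []

    def spawn_children(self, num_jobs):
        self.children = [JobLogger(self, job_id) for job_id in range(num_jobs)]
        return self.children

    @property
    def num_jobs(self):
        return len(self.children)

    def best_costs(self):
        # Lowest logged cost per job; NaN for jobs that have logged nothing.
        return [min(c.costs) if c.costs else float("nan") for c in self.children]
```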

start(): Start logging.

property timer: Timer object.
