Data Fits¶

A pipeline element that fits a model to data.

Parameters¶

objectives (ionworkspipeline.objectives.Objective or dict[str, ionworkspipeline.objectives.Objective])

The objective(s) to use for the fit. This can be a single objective, or a dictionary of objectives. Each objective can be any class that implements the method build, which creates a function Objective.run() that takes a dictionary of parameters and returns a scalar or vector cost. In general, we subclass ionworkspipeline.objectives.Objective to implement a particular objective.

source (str)

A string describing the source of the data.

parameters (dict, optional)

A dictionary of parameters to fit. The values can be:

an iwp.Parameter object, e.g. iwp.Parameter("x")

a pybamm expression, in which case the other parameters should also be explicitly provided as iwp.Parameter objects, e.g.

    {
        "param": 2 * pybamm.Parameter("half-param"),
        "half-param": iwp.Parameter("half-param")
    }

works, but {"param": 2 * iwp.Parameter("half-param")} would not work.

a function, containing other parameters, in which case the other parameters should again also be explicitly provided as iwp.Parameter objects, e.g.

    {
        "main parameter": lambda x: pybamm.Parameter("other parameter") * x**2,
        "other parameter": iwp.Parameter("other parameter")
    }

The name of the input parameter does not need to match the name of the parameter. In all cases, the DataFit class will automatically process this input to fit for "x".
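The idea that only the iwp.Parameter entries are varied, while expressions and functions are evaluated from them, can be illustrated with a small pure-Python sketch. It does not use ionworkspipeline or pybamm: plain callables stand in for the expressions, and the helper name resolve_parameters is hypothetical.

```python
# Hypothetical stand-in: plain callables play the role of pybamm expressions.
def resolve_parameters(fitted_values, derived_rules):
    """Combine fitted values with entries derived from them."""
    resolved = dict(fitted_values)
    for name, rule in derived_rules.items():
        # Each rule only reads the fitted values, mirroring "param = 2 * half-param".
        resolved[name] = rule(fitted_values)
    return resolved


# Only "half-param" is a free variable; "param" is always twice its value.
derived_rules = {"param": lambda fitted: 2 * fitted["half-param"]}
full_parameters = resolve_parameters({"half-param": 1.5}, derived_rules)
print(full_parameters)
```

In this sketch an optimizer would only ever propose values for "half-param"; "param" follows from it on every evaluation.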

cost (ionworkspipeline.costs.Cost, optional)

The cost function to use when constructing the objective. If None, uses the optimizer’s default cost function.

initial_guesses (dict or list of dicts, optional)

Initial guesses for the parameters. If a single dictionary, then this is used as the initial guess for all optimization jobs in each batch. If a list of dictionaries, then each dictionary is used as the initial guess for a single job.

optimizer (ionworkspipeline.optimizers.Optimizer or ionworkspipeline.samplers.Sampler, optional)

The optimizer to use for the fit. Default is set by the DataFit subclasses.

cost_logger (ionworkspipeline.data_fits.CostLogger, optional)

A cost logger to use for logging the cost and parameters during the fit. Default is iwp.data_fits.CostLogger with default options.

multistarts (int, optional)

Number of times to run the optimization from different initial guesses. If None, only runs once from the provided initial guess.

num_workers (int, optional)

Number of worker processes to use for parallel batch processing. Not supported on Windows. If num_workers = 1, then multiprocessing is disabled. If num_workers = None, then the number of workers is set to the number of CPU cores.

max_batch_size (int, optional)

Maximum number of optimization jobs to include in a single batch. If None, defaults to the largest possible batch size, which is the ceiling of the total number of jobs divided by the number of workers.
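The default batch size described above is simple arithmetic, sketched here in plain Python. The helper names are hypothetical, and the way jobs are assigned to batches (consecutive slices) is an assumption made only for illustration.

```python
import math


def default_max_batch_size(num_jobs, num_workers):
    """Ceiling of the total number of jobs divided by the number of workers."""
    return math.ceil(num_jobs / num_workers)


def split_into_batches(job_ids, max_batch_size):
    """Assumed batching scheme: consecutive slices of at most max_batch_size jobs."""
    return [
        job_ids[start:start + max_batch_size]
        for start in range(0, len(job_ids), max_batch_size)
    ]


job_ids = list(range(10))
batch_size = default_max_batch_size(len(job_ids), num_workers=4)
batches = split_into_batches(job_ids, batch_size)
print(batch_size, batches)
```

With 10 jobs and 4 workers, the ceiling gives batches of at most 3 jobs, so one worker receives a shorter final batch.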

initial_guess_sampler (ionworkspipeline.data_fits.distribution_samplers.DistributionSampler, optional)

Sampler to use for generating initial guesses of multistarted parameter estimations. Default is ionworkspipeline.data_fits.distribution_samplers.LatinHypercube.
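For readers unfamiliar with Latin hypercube sampling, the following is a minimal standard-library sketch of the technique for uniform bounds. It is not the LatinHypercube class from ionworkspipeline; the function name, the bounds format, and the seed argument are assumptions for illustration.

```python
import random


def latin_hypercube(bounds, num_samples, seed=0):
    """Draw num_samples guesses so that each parameter's range is split into
    num_samples equal strata and every stratum is used exactly once."""
    rng = random.Random(seed)
    samples = [{} for _ in range(num_samples)]
    for name, (lower, upper) in bounds.items():
        # One uniformly random point inside each stratum [i/n, (i+1)/n).
        unit_points = [(i + rng.random()) / num_samples for i in range(num_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit_points)
        for sample, u in zip(samples, unit_points):
            sample[name] = lower + u * (upper - lower)
    return samples


guesses = latin_hypercube({"x": (0.0, 1.0), "y": (10.0, 20.0)}, num_samples=4, seed=1)
for guess in guesses:
    print(guess)
```

Compared with independent uniform draws, this guarantees that multistart guesses cover the whole range of each parameter.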

priors (ionworkspipeline.priors.Prior or list[ionworkspipeline.priors.Prior], optional)

Priors to use for the fit.

options (dict, optional)

A dictionary of options to pass to the data fit. By default:

options = {
    # Random seed for reproducibility. Defaults to a random seed
    # determined by the current time.
    "seed": iwutil.random.generate_seed(),
    # Whether to reduce the size of the log. If True, only append logs if the
    # cost is at least 0.1% better than the best cost so far. Defaults
    # to True if the optimizer is deterministic.
    "low_memory": True,
}

Note: These options only have an effect if model.convert_to_format == 'casadi'
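The low_memory rule (keep a log entry only if the cost is at least 0.1% better than the best so far) can be sketched as a simple filter. This is an illustration under stated assumptions, not the library's logging code: the function name is hypothetical and costs are assumed to be positive.

```python
def filter_log(costs, relative_improvement=1e-3):
    """Keep (iteration, cost) pairs that beat the best cost so far by at
    least relative_improvement (0.1% by default). Assumes positive costs."""
    kept = []
    best = float("inf")
    for iteration, cost in enumerate(costs):
        if cost < best * (1.0 - relative_improvement):
            kept.append((iteration, cost))
            best = cost
    return kept


costs = [10.0, 9.999, 9.5, 9.499, 9.0, 9.2]
kept_entries = filter_log(costs)
print(kept_entries)
```

Iterations 1 and 3 improve on the best cost by less than 0.1% and iteration 5 is worse, so only three of the six entries are stored.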

Extends: ionworkspipeline.data_fits.data_fit._DataFitBase

approximate_model_distribution(parameters: dict | None = None, *args, remove_priors=None, **kwargs) → MultivariateNormal¶

Approximate the optimized model results as a linearized multivariate normal distribution.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the approximation.
*args (tuple, optional): Additional positional arguments to pass to the compute_parameter_covariance method.
remove_priors (bool, optional): Whether to remove the contribution of the priors. Default is False.
**kwargs (dict, optional): Additional keyword arguments to pass to the compute_parameter_covariance method.

Returns¶

ionworkspipeline.stats.MultivariateNormal: The multivariate normal approximation of the optimized model results.

property batch_ids: list[int]¶: Get the batch IDs.

property batches: list[_DataFitBatch]¶: Get the list of optimization jobs.

check_initial_guesses()¶

Check that the initial parameter guesses are different for multistarts.

This method verifies that initial guesses differ across jobs, but ignores parameters with zero variance distributions (like PointMass distributions) when checking for uniqueness.

Raises¶

ValueError: If initial guesses are not unique or number of guesses does not match jobs.
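A minimal sketch of this kind of check is shown below. It is not the library's implementation: the function names and the zero_variance argument (the set of parameter names to ignore, such as those with PointMass distributions) are hypothetical.

```python
def has_duplicate_guesses(guesses, zero_variance=frozenset()):
    """Return True if two guesses agree on every parameter that can vary."""
    seen = set()
    for guess in guesses:
        key = tuple(sorted(
            (name, value) for name, value in guess.items()
            if name not in zero_variance
        ))
        # An empty key means nothing can vary, so there is nothing to compare.
        if key and key in seen:
            return True
        seen.add(key)
    return False


def check_initial_guesses(guesses, num_jobs, zero_variance=frozenset()):
    """Raise ValueError on a count mismatch or on repeated guesses."""
    if len(guesses) != num_jobs:
        raise ValueError("number of initial guesses does not match number of jobs")
    if has_duplicate_guesses(guesses, zero_variance):
        raise ValueError("initial guesses are not unique")


# "b" is fixed, so uniqueness is judged on "a" alone.
distinct = [{"a": 1.0, "b": 5.0}, {"a": 2.0, "b": 5.0}]
repeated = [{"a": 1.0, "b": 5.0}, {"a": 1.0, "b": 5.0}]
check_initial_guesses(distinct, num_jobs=2, zero_variance={"b"})
print(has_duplicate_guesses(repeated, zero_variance={"b"}))
```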

compute_gradient(parameters: dict | None = None, cost: Cost | None = None, options: dict | None = None) → ndarray¶

Compute the gradient of the objective function.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the gradient calculation.
cost (iwp.costs.Cost, optional): The cost to use for the gradient calculation. If None, uses the current cost.
options (dict, optional): Additional keyword arguments to pass to the numdifftools finite difference method.

Returns¶

np.ndarray: The gradient of the objective function.
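The method delegates to numdifftools; as a rough illustration of what a finite-difference gradient does, here is a standard-library sketch using central differences. The function name, the step-size rule, and the example cost are assumptions, and numdifftools uses more careful adaptive schemes than this.

```python
def finite_difference_gradient(cost, parameters, relative_step=1e-6):
    """Central-difference estimate of d(cost)/d(parameter) for each entry."""
    gradient = {}
    for name, value in parameters.items():
        # Scale the step with the parameter magnitude, with a floor of relative_step.
        h = relative_step * max(1.0, abs(value))
        upper = {**parameters, name: value + h}
        lower = {**parameters, name: value - h}
        gradient[name] = (cost(upper) - cost(lower)) / (2.0 * h)
    return gradient


def example_cost(p):
    # Hypothetical scalar cost: analytic gradient is (2 * (a - 1), 3).
    return (p["a"] - 1.0) ** 2 + 3.0 * p["b"]


gradient = finite_difference_gradient(example_cost, {"a": 2.0, "b": 0.0})
print(gradient)
```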

compute_hessian(parameters: dict | None = None, cost: Cost | None = None, gauss_newton: bool | None = None, options: dict | None = None) → ndarray¶

Compute the Hessian matrix of the objective function.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the Hessian calculation. If None, uses DataFit.results.
cost (iwp.costs.Cost, optional): The cost to use for the Hessian calculation. If None, uses the current cost.
gauss_newton (bool, optional): Whether to use the Gauss-Newton method to compute the Hessian. If None, this method is enabled if the cost supports array output.
options (dict, optional): Additional keyword arguments to pass to the numdifftools finite difference method.

Returns¶

np.ndarray: The Hessian matrix of the objective function.
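The Gauss-Newton option relies on the cost returning an array of residuals. For a sum-of-squares cost, sum(r_i^2), the Gauss-Newton approximation to the Hessian is 2 J^T J, where J is the Jacobian of the residuals. The sketch below illustrates this with the standard library only; the function names and the straight-line example are assumptions, and the scaling convention used by the library may differ.

```python
def residual_jacobian(residuals, p, relative_step=1e-6):
    """Central-difference Jacobian: rows are residuals, columns are parameters."""
    columns = []
    for j in range(len(p)):
        h = relative_step * max(1.0, abs(p[j]))
        up, down = list(p), list(p)
        up[j] += h
        down[j] -= h
        columns.append([(u - d) / (2.0 * h) for u, d in zip(residuals(up), residuals(down))])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


def gauss_newton_hessian(residuals, p):
    """Approximate the Hessian of sum(r_i**2) as 2 * J^T J."""
    jac = residual_jacobian(residuals, p)
    n = len(p)
    return [[2.0 * sum(row[i] * row[j] for row in jac) for j in range(n)] for i in range(n)]


# Hypothetical example: fit y = slope * t + intercept to three points.
times = [0.0, 1.0, 2.0]
data = [1.0, 3.1, 4.9]


def line_residuals(p):
    return [p[0] * t + p[1] - y for t, y in zip(times, data)]


hessian = gauss_newton_hessian(line_residuals, [2.0, 1.0])
print(hessian)
```

Because the residuals here are linear in the parameters, the Gauss-Newton result equals the exact Hessian; for nonlinear models it drops the second-derivative terms of the residuals.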

compute_inverse_parameter_covariance(parameters: dict | None = None, *args, **kwargs) → ndarray¶

Compute the inverse parameter covariance matrix of the sum of squared residuals objective function, V_p^-1 = (residuals(p)^T * V_epsilon^-1 * residuals(p)) where V_epsilon is the model error covariance matrix.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the Hessian calculation.
*args (tuple, optional): Additional positional arguments to pass to the compute_hessian method.
**kwargs (dict, optional): Additional keyword arguments to pass to the compute_hessian method.

Returns¶

np.ndarray: The inverse parameter covariance matrix of the objective function.

compute_jacobian(parameters: dict | None = None, cost: Cost | None = None, options: dict | None = None) → ndarray¶

Compute the Jacobian matrix of the objective function.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the Jacobian calculation.
cost (iwp.costs.Cost, optional): The cost to use for the Jacobian calculation. If None, uses the current cost.
options (dict, optional): Additional keyword arguments to pass to the numdifftools finite difference method.

Returns¶

np.ndarray: The Jacobian matrix of the objective function.

compute_parameter_covariance(*args, **kwargs) → ndarray¶

Compute the parameter covariance matrix of the sum of squared residuals objective function, (residuals(p)^T * V_epsilon^-1 * residuals(p))^-1 where V_epsilon is the model error covariance matrix.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the Hessian calculation.
*args (tuple, optional): Additional positional arguments to pass to the compute_hessian method.
**kwargs (dict, optional): Additional keyword arguments to pass to the compute_hessian method.

Returns¶

np.ndarray: The parameter covariance matrix of the objective function.

compute_residuals(parameters: dict | None = None, cost: Cost | None = None) → ndarray¶

Compute the residuals of the objective function.

Parameters¶

parameters (dict, optional): Dictionary of parameter values to use for the residuals calculation.
cost (iwp.costs.Cost, optional): The cost to use for the residuals calculation. If None, uses the current cost.

Returns¶

np.ndarray: The residuals of the objective function.

property data_fit_runner: _DataFitRunner | None¶: Get the data fit runner.

estimate_variable_standard_deviations(parameters: dict | None = None) → dict[str, float]¶: Estimate the standard deviations of the variables.

property explicit_initial_guesses: bool¶: Get the initial guesses flag.

get_batch(batch_id: int) → _DataFitBatch¶

Get the batch object for a given batch ID.

Each batch represents a group of optimization jobs that are processed together for improved efficiency. This method retrieves or creates a _DataFitBatch object that manages these jobs.

Parameters¶

batch_id (int | None): Unique identifier for the batch to retrieve

Returns¶

_DataFitBatch: The batch object for the given ID

get_fit_results()¶

Get the results of the fit.

Returns¶

dict: Dictionary containing fit results for each objective.

property initial_guess_distributions: dict[str, Distribution]¶: Get the initial guess distributions for the parameters being fit.

property initial_guess_sampler: Sampler¶: Get the initial guess sampler.

property is_parent: bool¶: Whether the DataFit is a parent.

property job_ids: list[ndarray]¶: Get the job IDs.

Get the linear confidence intervals for the parameters based on the Chi-square distribution.

Parameters¶

parameters (dict, optional): A dictionary mapping parameter names to their values. If provided, the parameters will be used to compute the confidence intervals. If not provided, the parameters from the DataFit will be used.
confidence_level (float, optional): The confidence level for the intervals, between 0 and 1. Default is 0.95.
variable_standard_deviations (dict, optional): A dictionary mapping variable names to their standard deviations. If provided, the cost function will be ignored.
use_parameter_bounds (bool, optional): Whether to use the parameter bounds to clip the confidence intervals. Default is True.
cost (ionworkspipeline.costs.Cost, optional): The cost function to use for the confidence intervals. If not provided, the cost function from the DataFit will be used.
gauss_newton (bool, optional): Whether to use the Gauss-Newton method to compute the confidence intervals. Default is True.
options (dict, optional): Additional options for the numdifftools function.
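As a rough illustration of a linear, chi-square based interval with bound clipping, the sketch below computes value ± sqrt(q) · sigma per parameter, where q is the chi-square quantile. It assumes one degree of freedom per parameter (so q is the square of the two-sided normal quantile) and takes parameter standard deviations as input. Both choices, and all names, are assumptions rather than the library's documented behavior.

```python
from statistics import NormalDist


def linear_confidence_intervals(values, standard_deviations, confidence_level=0.95, bounds=None):
    """Per-parameter intervals from a linearized (Gaussian) approximation."""
    # Chi-square quantile with 1 degree of freedom = squared normal quantile.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    chi2_quantile = z ** 2
    intervals = {}
    for name, value in values.items():
        half_width = (chi2_quantile ** 0.5) * standard_deviations[name]
        lower, upper = value - half_width, value + half_width
        if bounds is not None and name in bounds:
            # Clip the interval to the parameter bounds.
            lower = max(lower, bounds[name][0])
            upper = min(upper, bounds[name][1])
        intervals[name] = (lower, upper)
    return intervals


intervals = linear_confidence_intervals({"x": 1.0}, {"x": 0.1}, bounds={"x": (0.9, 2.0)})
print(intervals)
```

Here the unclipped lower limit (about 0.804) falls below the bound of 0.9, so the bound is reported instead.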

property max_batch_size: int¶: Get the maximum batch size.

property multistarts: int | None¶: Get the number of multistarts.

property num_batches: int¶: Get the number of batches.

property num_workers: int¶: Get the number of worker processes.

property objective_function: ObjectiveFunction¶: Get the objective function.

plot_fit_results()¶

Plot the results of the fit by calling the plot_fit_results method of each internal callback in each objective. Any user-defined callbacks should be called manually instead of using this method.

Returns¶

dict: Dictionary containing figure and axes objects for each objective.

plot_sampler_results(confidence_level=None, chi2_minimum=None, burnin_iterations=None, show_bounds=None, bins=None)¶

Produces a pairwise plot of MCMC samples with histograms on the diagonal and scatter plots below.

Parameters¶

confidence_level (float, optional): The confidence level for filtering samples, between 0 and 1. Default is 0.95.
chi2_minimum (float, optional): The minimum chi-square value to use as reference. If None, uses the minimum value from the sampler.
burnin_iterations (int, optional): Number of initial iterations to discard as burn-in. If None, uses the value set in the sampler.
show_bounds (bool, optional): Whether to show parameter bounds on the plots. If None, defaults to True.
bins (int, optional): Number of bins to use for histograms. If None, defaults to 20.

Returns¶

tuple: A tuple (fig, axes) containing the matplotlib Figure and array of Axes objects for the pairwise plots. The diagonal shows histograms of each parameter’s marginal distribution, while off-diagonal plots show 2D scatter plots of parameter pairs.

Raises¶

ValueError: If the fit hasn’t been run yet or if this method is called on an optimizer rather than a sampler.

plot_trace()¶

Plot the cost and each parameter as a function of iteration number.

Returns¶

tuple: Tuple containing figure and axes objects.

process_cost(cost: Cost | None, objectives: dict[str, Objective], optimizer: Optimizer) → Cost¶

Process the cost function.

Parameters¶

cost (iwp.costs.Cost | None): The cost function to process.
objectives (dict[str, Objective]): The objectives to process.
optimizer (iwp.optimizers.Optimizer): The optimizer to process.

Returns¶

iwp.costs.Cost: The processed cost function.

process_initial_guess_distributions()¶: Process the initial guess distributions for the parameters being fit.

process_objectives(objectives: Objective | dict[str, Objective]) → dict[str, Objective]¶

Set up the objectives.

Parameters¶

objectives (Objective or dict[str, Objective]): The objectives to process.

Returns¶

dict[str, Objective]: The processed objectives.

process_optimizer(optimizer: Optimizer | None, objectives: dict[str, Objective]) → Optimizer¶

Process the optimizer.

Parameters¶

optimizer (iwp.optimizers.Optimizer | None): The optimizer to process.
objectives (dict[str, Objective]): The objectives to process.

Returns¶

iwp.optimizers.Optimizer: The processed optimizer.

run(parameter_values) → Result¶

Run the optimization to fit the model to data.

Parameters¶

parameter_values (dict): Dictionary of parameter values to use for the fit.

Returns¶

iwp.Result: Results object containing fitted parameters and optimization results.

sampler_confidence_intervals(confidence_level=None, chi2_minimum=None)¶

Calculate confidence intervals for sampled parameters based on chi-square thresholds.

Parameters¶

confidence_level (float, optional): The confidence level for the intervals, between 0 and 1. Default is 0.95.
chi2_minimum (float, optional): The minimum chi-square value to use as reference. Default is the minimum value in results.

Returns¶

dict: Dictionary mapping parameter names to tuples of (lower_bound, upper_bound) confidence intervals.

Raises¶

ValueError: If the fit hasn’t been run, if using an optimizer instead of sampler, or if not using a chi-square cost function.
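One common way to turn samples into chi-square threshold intervals is to keep every sample whose chi-square lies within a fixed offset of the minimum and report the range of each parameter over the kept samples. The sketch below illustrates that idea only. The offset delta_chi2 is passed in directly (for example about 3.84 for one parameter at 95%), because how the library maps confidence_level to a threshold is not stated here; all names are hypothetical.

```python
def threshold_confidence_intervals(samples, chi2_values, delta_chi2, chi2_minimum=None):
    """Range of each parameter over samples with chi2 <= chi2_minimum + delta_chi2."""
    if chi2_minimum is None:
        chi2_minimum = min(chi2_values)
    threshold = chi2_minimum + delta_chi2
    kept = [s for s, chi2 in zip(samples, chi2_values) if chi2 <= threshold]
    return {
        name: (min(s[name] for s in kept), max(s[name] for s in kept))
        for name in kept[0]
    }


# Hypothetical sampler output: parameter values and their chi-square costs.
samples = [{"x": 0.8}, {"x": 1.0}, {"x": 1.1}, {"x": 1.6}]
chi2_values = [5.0, 2.0, 3.5, 9.0]
sampled_intervals = threshold_confidence_intervals(samples, chi2_values, delta_chi2=3.84)
print(sampled_intervals)
```

The last sample exceeds the threshold of 5.84 and is excluded, so the reported interval for "x" spans the remaining three samples.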

set_initial_guesses(initial_guesses: list[dict], initial_guess_sampler: iwp.stats.DistributionSampler | None = None)¶

Set up and validate initial parameter guesses for optimization.

This method:

1. Stores the provided initial guesses or sampler configuration
2. Validates that the initial guesses match the parameters being fit
3. Sets up sampling distributions if no explicit guesses are provided
4. Ensures uniqueness of guesses when using multistart optimization

Parameters¶

initial_guesses (dict or list[dict] or None): User-provided initial guesses. If None, guesses will be sampled using the initial_guess_sampler. Each guess should be a dictionary mapping parameter names to their initial values.
initial_guess_sampler (DistributionSampler or None): Sampler to generate initial guesses when not provided directly. If None and no initial_guesses provided, defaults to LatinHypercube.

Raises¶

ValueError: If initial guesses are invalid, non-unique for multistart optimization, or don’t match the parameters being fit.

timeseries_preprocessing()¶: Set up time stepping for the fit.

class ionworkspipeline.data_fits.ArrayDataFit(objectives, **kwargs)¶

A pipeline element that fits a model to data for multiple independent variable values. The data for each independent variable value is fitted separately. The independent variable values should be given as the keys of the objectives dictionary. The value of each key should be a ionworkspipeline.objectives.Objective object. This objective will be used to fit the data for the corresponding independent variable value.

The user-supplied objectives should assign the independent variable value to the custom_parameters attribute of the objective as appropriate. This class simply calls a separate ionworkspipeline.DataFit for each provided objective. It does not pass the independent variable value to the objective, so the user must ensure that the objective is set up to use the independent variable value correctly.

For example, this can be used to fit a model to data at multiple temperatures, or fit each pulse of a GITT experiment separately, with post-processing to extract functional relationships between parameters and the independent variable.
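The pattern of one independent fit per independent-variable value can be sketched without the library. In the toy example below, each temperature has its own data set and a one-parameter least-squares fit (a slope through the origin) is run for each key separately. The data, the closed-form fit, and the function names are all hypothetical stand-ins for real objectives and DataFit runs.

```python
def fit_slope(xs, ys):
    """Least-squares slope for y = k * x (closed form, through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_each_separately(datasets):
    """Run an independent fit for every key, as ArrayDataFit does per objective."""
    return {key: fit_slope(xs, ys) for key, (xs, ys) in datasets.items()}


# Keys are temperatures in kelvin; values are (x, y) measurements.
datasets = {
    298.15: ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    313.15: ([1.0, 2.0, 3.0], [3.0, 6.0, 9.0]),
}
slopes_by_temperature = fit_each_separately(datasets)
print(slopes_by_temperature)
```

The resulting dictionary of fitted values keyed by temperature is the kind of output that post-processing would then turn into a functional relationship.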

The rest of the parameters are the same as for ionworkspipeline.DataFit.

Extends: ionworkspipeline.data_fits.data_fit.DataFit

check_initial_guesses()¶: Check that initial guesses can be sampled for all data fits.

property data_fits¶

Get the dictionary of DataFit objects.

Returns¶

dict: Dictionary mapping independent variable values to DataFit objects

run(parameter_values)¶

Run the optimization for each independent variable value.

Parameters¶

parameter_values (dict): Dictionary of parameter values to use for the fit

Returns¶

iwp.Result: Results object containing fitted parameters and optimization results

class ionworkspipeline.data_fits.CostLogger(plot_every=None, print_every=None, checkpoint_func=None)¶

A class to log the cost and parameters during a fit, and plot the results.

Parameters¶

plot_every (float, optional): The number of seconds between each plot of the cost and parameters. If None, no plots are generated during the fit, but the final plot can be generated using the plot() method.
print_every (float, optional): The number of seconds between each print of the cost and parameters. If None, no prints are generated during the fit, but the final print can be generated using the print() method.
checkpoint_func (callable, optional): A function to call to checkpoint the fit. The function should take a single argument, which will be a dictionary containing the current state of the fit.
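To show how a time-based print interval and a checkpoint callback can work together, here is a small stand-alone logger. It is not the CostLogger implementation: the class name, the contents of the state dictionary, and the choice to call the checkpoint function on every log call are assumptions. The clock is injectable so the example runs deterministically.

```python
import time


class ThrottledLogger:
    """Illustrative logger; not the ionworkspipeline CostLogger."""

    def __init__(self, print_every=None, checkpoint_func=None, clock=time.monotonic):
        self.print_every = print_every
        self.checkpoint_func = checkpoint_func
        self.clock = clock
        self.history = []
        self.printed_iterations = []
        self._last_print = float("-inf")  # ensures the first call prints

    def log(self, cost, parameters):
        iteration = len(self.history)
        state = {"iteration": iteration, "cost": cost, "parameters": dict(parameters)}
        self.history.append(state)
        if self.checkpoint_func is not None:
            self.checkpoint_func(state)  # single dict argument
        now = self.clock()
        if self.print_every is not None and now - self._last_print >= self.print_every:
            print(f"iteration {iteration}: cost={cost:.4g}, parameters={parameters}")
            self.printed_iterations.append(iteration)
            self._last_print = now


checkpoints = []
fake_times = iter([0.0, 0.4, 1.2, 1.3])  # stand-in for wall-clock seconds
logger = ThrottledLogger(
    print_every=1.0,
    checkpoint_func=checkpoints.append,
    clock=lambda: next(fake_times),
)
for cost in [4.0, 3.0, 2.5, 2.4]:
    logger.log(cost, {"x": cost / 2})
```

With a one-second interval, only the calls at 0.0 s and 1.2 s print, while every call is recorded and checkpointed.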

argmin_costs()¶

Get the index of the job with minimum cost. If all costs are NaN, return 0.

Returns¶

int: Index of minimum cost
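The NaN handling described above can be sketched in a few lines of standard-library Python. The function below is an illustration of the stated behavior, not the method's actual source; breaking ties by the lowest index is an assumption.

```python
import math


def argmin_costs(costs):
    """Index of the smallest non-NaN cost, or 0 if every cost is NaN."""
    candidates = [(cost, index) for index, cost in enumerate(costs) if not math.isnan(cost)]
    if not candidates:
        return 0
    # Tuples compare by cost first, then by index, so ties pick the lowest index.
    return min(candidates)[1]


best_index = argmin_costs([float("nan"), 3.0, 1.0, float("nan")])
all_nan_index = argmin_costs([float("nan"), float("nan")])
print(best_index, all_nan_index)
```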

argsort_costs()¶

Sort the jobs in ascending order of cost.

Returns¶

numpy.ndarray: Indices that would sort the costs

property children¶: List of child CostLoggerJob objects.

clear_axes()¶: Clear the plot axes.

property cost¶: List of logged costs.

property fig_axes¶

Get or create figure and axes objects.

Returns¶

tuple: Figure and axes objects, and whether they existed

finish()¶: Finish logging.

property finished¶: Whether logging is finished.

get_log()¶

Get the logged data. If low_memory is enabled, only iterations that improve upon the previous iteration by 1% are logged. Otherwise, all iterations are logged.

Returns¶

dict: Dictionary of logged values

property is_parent¶: Whether this is a parent logger.

property iteration¶: List of iteration numbers.

log(logs=None, job_id=None)¶

Log cost and parameter values.

Parameters¶

logs (dict, optional): Dictionary of values to log
job_id (int, optional): ID of the job being logged

property low_memory¶: Whether low memory is being used.

property multiprocessing¶: Whether multiprocessing is enabled.

property num_jobs¶: Number of jobs being logged.

property parent¶: Parent CostLogger object.

plot()¶

Plot the cost and parameters.

Returns¶

tuple: Figure and axes objects

property plot_every¶: Seconds between plot updates.

property plot_flag¶: Plot update flag.

plot_refresh(force_plot=False)¶

Update the plot of cost and parameters.

Parameters¶

force_plot (bool, optional): If True, force a plot update regardless of timing

Returns¶

tuple: Figure and axes objects

property plot_variables¶: List of variables to plot.

property print_every¶: Seconds between print updates.

property probabilistic¶: Whether probabilistic sampling is enabled.

reset()¶: Reset all logging data and counters.

set_datafit_attributes(datafit)¶: Set the DataFit attributes.

set_low_memory(low_memory)¶: Set whether low memory is being used.

set_multiprocessing(multiprocessing)¶

Set whether multiprocessing is being used.

Parameters¶

multiprocessing (bool): Whether multiprocessing is enabled

set_parameters(parameters)¶

Set the parameters to be logged.

Parameters¶

parameters (list): List of Parameter objects

set_probabilistic(probabilistic)¶: Set whether probabilistic sampling is enabled.

property show_plot_iterative¶: Whether to show plots during optimization.

property show_print_iterative¶: Whether to print updates during optimization.

spawn_children(num_jobs)¶

Create child CostLoggerJob objects.

Parameters¶

num_jobs (int): Number of child loggers to create

Returns¶

list: List of CostLoggerJob objects

static staircase_running_argmin(iterations_array: ndarray, cost_array: ndarray, iteration_counter: int)¶

Get indices for plotting staircase of running minimum.

Parameters¶

iterations_array (array_like): Array of iteration numbers
cost_array (array_like): Array of costs
iteration_counter (int): The current iteration counter

Returns¶

tuple: x indices and minimum indices for plotting

start()¶: Start logging.

property timer¶: Timer object.
