Data Fits¶
- class ionworkspipeline.data_fits.DataFit(objectives: Objective | dict[str, Objective], source: str = '', parameters: dict | None = None, cost: Cost | None = None, initial_guesses: dict | Result | DataFrame | None = None, optimizer: Optimizer | Sampler | None = None, cost_logger: CostLogger | None = None, multistarts: int | None = None, num_workers: int | None = None, max_batch_size: int | None = None, initial_guess_sampler: DistributionSampler | None = None, priors: Prior | list[Prior] | None = None, options: dict | None = None)¶
A pipeline element that fits a model to data.
Parameters¶
- objectives : ionworkspipeline.objectives.Objective or dict[str, ionworkspipeline.objectives.Objective]
The objective(s) to use for the fit. This can be a single objective or a dictionary of objectives. Each objective can be any class that implements the method build, which creates a function Objective.run() that takes a dictionary of parameters and returns a scalar or vector cost. In general, we subclass ionworkspipeline.objectives.Objective to implement a particular objective.
- source : str
A string describing the source of the data.
- parameters : dict, optional
A dictionary of parameters to fit. The values can be:
  - an iwp.Parameter object, e.g. iwp.Parameter("x")
  - a pybamm expression, in which case the other parameters should also be explicitly provided as iwp.Parameter objects, e.g.
    {
        "param": 2 * pybamm.Parameter("half-param"),
        "half-param": iwp.Parameter("half-param")
    }
    works, but {"param": 2 * iwp.Parameter("half-param")} would not work.
  - a function containing other parameters, in which case the other parameters should again be explicitly provided as iwp.Parameter objects, e.g.
    {
        "main parameter": lambda x: pybamm.Parameter("other parameter") * x**2,
        "other parameter": iwp.Parameter("other parameter")
    }
The name of the input parameter does not need to match the name of the parameter. In all cases, the DataFit class will automatically process this input to fit for "x".
- cost : ionworkspipeline.costs.Cost, optional
The cost function to use when constructing the objective. If None, uses the optimizer’s default cost function.
- initial_guesses : dict or list of dicts, optional
Initial guesses for the parameters. If a single dictionary, then this is used as the initial guess for all optimization jobs in each batch. If a list of dictionaries, then each dictionary is used as the initial guess for a single job.
- optimizer : ionworkspipeline.optimizers.Optimizer or ionworkspipeline.samplers.Sampler, optional
The optimizer to use for the fit. Default is set by the DataFit subclasses.
- cost_logger : ionworkspipeline.data_fits.CostLogger, optional
A cost logger to use for logging the cost and parameters during the fit. Default is iwp.data_fits.CostLogger with default options.
- multistarts : int, optional
Number of times to run the optimization from different initial guesses. If None, only runs once from the provided initial guess.
- num_workers : int, optional
Number of worker processes to use for parallel batch processing. Not supported on Windows. If num_workers = 1, then multiprocessing is disabled. If num_workers = None, then the number of workers is set to the number of CPU cores.
- max_batch_size : int, optional
Maximum number of optimization jobs to include in a single batch. If None, defaults to the largest possible batch size, which is the ceiling of the total number of jobs divided by the number of workers.
- initial_guess_sampler : ionworkspipeline.data_fits.distribution_samplers.DistributionSampler, optional
Sampler to use for generating initial guesses of multistarted parameter estimations. Default is ionworkspipeline.data_fits.distribution_samplers.LatinHypercube.
- priors : ionworkspipeline.priors.Prior or list[ionworkspipeline.priors.Prior], optional
Priors to use for the fit.
- options : dict, optional
A dictionary of options to pass to the data fit. By default:
    options = {
        # Random seed for reproducibility. Defaults to a random seed
        # determined by the current time.
        "seed": iwutil.random.generate_seed(),
    }
Note: These options only have an effect if model.convert_to_format == 'casadi'
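The default for max_batch_size described above (the ceiling of the total number of jobs divided by the number of workers) can be illustrated with a small plain-Python sketch. It assumes one job per multistart and contiguous batches; the helper `split_into_batches` is hypothetical and is not part of ionworkspipeline.

```python
import math


def split_into_batches(num_jobs, num_workers, max_batch_size=None):
    """Group job IDs into batches (illustrative helper, not the library's code)."""
    if max_batch_size is None:
        # Documented default: ceiling of total jobs divided by number of workers.
        max_batch_size = math.ceil(num_jobs / num_workers)
    job_ids = list(range(num_jobs))
    return [
        job_ids[start:start + max_batch_size]
        for start in range(0, num_jobs, max_batch_size)
    ]


# 10 jobs on 4 workers: the default batch size is ceil(10 / 4) = 3.
batches = split_into_batches(num_jobs=10, num_workers=4)
print(batches)
```

With these numbers the last batch holds a single job, which is why a smaller explicit max_batch_size can sometimes balance the load better.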
Extends:
ionworkspipeline.data_fits.data_fit._DataFitBase
- approximate_model_distribution()¶
Get the multivariate normal approximation of the parameter space.
- property batch_ids: list[int]¶
Get the batch IDs.
- property batches: list[_DataFitBatch]¶
Get the list of optimization jobs.
- check_initial_guesses()¶
Check that the initial parameter guesses are different for multistarts.
This method verifies that initial guesses differ across jobs, but ignores parameters with zero variance distributions (like PointMass distributions) when checking for uniqueness.
Raises¶
- ValueError
If initial guesses are not unique or number of guesses does not match jobs.
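The check described here can be sketched in plain Python. This is only an illustration of the documented behaviour under stated assumptions: the function name, the `fixed_parameters` argument (standing in for parameters with zero-variance distributions such as PointMass), and the error messages are all hypothetical.

```python
def check_unique_guesses(initial_guesses, num_jobs, fixed_parameters=()):
    """Raise ValueError if guesses are duplicated or miscounted (illustrative only)."""
    if len(initial_guesses) != num_jobs:
        raise ValueError(
            f"Expected {num_jobs} initial guesses, got {len(initial_guesses)}"
        )
    seen = set()
    for guess in initial_guesses:
        # Ignore parameters whose distribution has zero variance before comparing.
        key = tuple(
            sorted((k, v) for k, v in guess.items() if k not in fixed_parameters)
        )
        # If every parameter is fixed there is nothing to compare (assumption).
        if key and key in seen:
            raise ValueError(f"Duplicate initial guess: {guess}")
        seen.add(key)


# "b" is identical in both guesses but is ignored, so this passes.
guesses = [{"a": 1.0, "b": 5.0}, {"a": 2.0, "b": 5.0}]
check_unique_guesses(guesses, num_jobs=2, fixed_parameters=("b",))
```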
- compute_hessian(cost=None)¶
Compute the Hessian matrix of the objective function.
- property data_fit_runner: _DataFitRunner | None¶
Get the data fit runner.
- estimate_variable_standard_deviations()¶
Estimate the standard deviations of the variables.
- property explicit_initial_guesses: bool¶
Get the initial guesses flag.
- get_batch(batch_id: int) _DataFitBatch ¶
Get the batch object for a given batch ID.
Each batch represents a group of optimization jobs that are processed together for improved efficiency. This method retrieves or creates a _DataFitBatch object that manages these jobs.
Parameters¶
- batch_id : int | None
Unique identifier for the batch to retrieve
Returns¶
- _DataFitBatch
The batch object for the given ID
- get_fit_results()¶
Get the results of the fit.
Returns¶
- dict
Dictionary containing fit results for each objective.
- property hessian¶
Get the Hessian matrix of the objective function.
- property initial_guess_distributions: dict[str, Distribution]¶
Get the initial guess distributions for the parameters being fit.
- property is_parent: bool¶
Whether the DataFit is a parent.
- property job_ids: list[ndarray]¶
Get the job IDs.
- linear_confidence_intervals(confidence_level=None, variable_standard_deviations=None, use_parameter_bounds=None)¶
Get the linear confidence intervals for the parameters.
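The reference does not state the formula used. A common linearised interval is the estimate plus or minus a normal quantile times the standard deviation, optionally clipped to the parameter bounds. The sketch below shows that generic construction; the function name, the 0.95 default, and the clipping behaviour are assumptions rather than the library's documented implementation.

```python
from statistics import NormalDist


def linear_interval(estimate, std, confidence_level=0.95, bounds=None):
    """Symmetric normal-approximation interval (illustrative sketch)."""
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    lower, upper = estimate - z * std, estimate + z * std
    if bounds is not None:
        # Optionally clip the interval to the parameter bounds.
        lower, upper = max(lower, bounds[0]), min(upper, bounds[1])
    return lower, upper


print(linear_interval(1.0, 0.1))                     # roughly (0.804, 1.196)
print(linear_interval(1.0, 0.1, bounds=(0.9, 2.0)))  # lower end clipped to 0.9
```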
- property max_batch_size: int¶
Get the maximum batch size.
- property multistarts: int | None¶
Get the number of multistarts.
- property num_batches: int¶
Get the number of batches.
- property num_workers: int¶
Get the number of worker processes.
- property objective_function: ObjectiveFunction¶
Get the objective function.
- plot_fit_results()¶
Plot the results of the fit by calling the plot_fit_results method of each internal callback in each objective. Any user-defined callbacks should be called manually instead of using this method.
Returns¶
- dict
Dictionary containing figure and axes objects for each objective.
- plot_sampler_results(confidence_level=None, chi2_minimum=None, burnin_iterations=None, show_bounds=None, bins=None)¶
Produces a pairwise plot of MCMC samples with histograms on the diagonal and scatter plots below.
Parameters¶
- confidence_level : float, optional
The confidence level for filtering samples, between 0 and 1. Default is 0.95.
- chi2_minimum : float, optional
The minimum chi-square value to use as reference. If None, uses the minimum value from the sampler.
- burnin_iterations : int, optional
Number of initial iterations to discard as burn-in. If None, uses the value set in the sampler.
- show_bounds : bool, optional
Whether to show parameter bounds on the plots. If None, defaults to True.
- bins : int, optional
Number of bins to use for histograms. If None, defaults to 20.
Returns¶
- tuple
A tuple (fig, axes) containing the matplotlib Figure and array of Axes objects for the pairwise plots. The diagonal shows histograms of each parameter’s marginal distribution, while off-diagonal plots show 2D scatter plots of parameter pairs.
Raises¶
- ValueError
If the fit hasn’t been run yet or if this method is called on an optimizer rather than a sampler.
- plot_trace()¶
Plot the cost and each parameter as a function of iteration number.
Returns¶
- tuple
Tuple containing figure and axes objects.
- process_initial_guess_distributions()¶
Process the initial guess distributions for the parameters being fit.
- run(parameter_values)¶
Run the optimization to fit the model to data.
Parameters¶
- parameter_values : dict
Dictionary of parameter values to use for the fit.
Returns¶
- iwp.Result
Results object containing fitted parameters and optimization results.
- sampler_confidence_intervals(confidence_level=None, chi2_minimum=None)¶
Calculate confidence intervals for sampled parameters based on chi-square thresholds.
Parameters¶
- confidence_level : float, optional
The confidence level for the intervals, between 0 and 1. Default is 0.95.
- chi2_minimum : float, optional
The minimum chi-square value to use as reference. Default is the minimum value in results.
Returns¶
- dict
Dictionary mapping parameter names to tuples of (lower_bound, upper_bound) confidence intervals.
Raises¶
- ValueError
If the fit hasn’t been run, if using an optimizer instead of sampler, or if not using a chi-square cost function.
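One generic way to turn chi-square thresholds into intervals is to keep every sample whose chi-square lies within a fixed offset of the minimum and report the range of each parameter over the kept samples. The sketch below illustrates that idea only; the function name, the explicit `delta_chi2` argument, and the synthetic samples are assumptions, and the library may derive its threshold differently.

```python
def chi2_intervals(samples, chi2_values, delta_chi2, chi2_minimum=None):
    """Range of each parameter over samples within delta_chi2 of the minimum."""
    if chi2_minimum is None:
        # Mirror the documented default: use the smallest value in the results.
        chi2_minimum = min(chi2_values)
    threshold = chi2_minimum + delta_chi2
    kept = [s for s, c in zip(samples, chi2_values) if c <= threshold]
    return {
        name: (min(s[name] for s in kept), max(s[name] for s in kept))
        for name in samples[0]
    }


# Synthetic samples and chi-square values, for illustration only.
samples = [{"k": 0.9}, {"k": 1.0}, {"k": 1.1}, {"k": 1.5}]
chi2_values = [2.0, 1.0, 3.5, 9.0]
# 3.84 is the 95% quantile of a chi-square distribution with one degree of freedom.
intervals = chi2_intervals(samples, chi2_values, delta_chi2=3.84)
print(intervals)
```

Here the sample with a chi-square of 9.0 falls outside the threshold of 4.84, so it does not widen the interval for "k".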
- set_initial_guesses(initial_guesses: list[dict], initial_guess_sampler: iwp.stats.DistributionSampler | None = None)¶
Set up and validate initial parameter guesses for optimization.
This method:
1. Stores the provided initial guesses or sampler configuration
2. Validates that the initial guesses match the parameters being fit
3. Sets up sampling distributions if no explicit guesses are provided
4. Ensures uniqueness of guesses when using multistart optimization
Parameters¶
- initial_guesses : dict or list[dict] or None
User-provided initial guesses. If None, guesses will be sampled using the initial_guess_sampler. Each guess should be a dictionary mapping parameter names to their initial values.
- initial_guess_sampler : DistributionSampler or None
Sampler to generate initial guesses when not provided directly. If None and no initial_guesses provided, defaults to LatinHypercube.
Raises¶
- ValueError
If initial guesses are invalid, non-unique for multistart optimization, or don’t match the parameters being fit.
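When no explicit guesses are given, the default sampler is LatinHypercube. The general Latin hypercube technique can be sketched with the standard library alone: each parameter range is split into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per parameter. The function and its arguments are hypothetical and do not reproduce the library's sampler.

```python
import random


def latin_hypercube(bounds, num_samples, seed=None):
    """Draw num_samples points with one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniformly placed point inside each of num_samples equal strata.
        points = [(i + rng.random()) / num_samples for i in range(num_samples)]
        rng.shuffle(points)  # decouple the strata across parameters
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in bounds} for i in range(num_samples)]


# Four multistart guesses for two parameters with made-up bounds.
guesses = latin_hypercube({"a": (0.0, 1.0), "b": (10.0, 20.0)}, num_samples=4, seed=0)
for guess in guesses:
    print(guess)
```

Because every stratum is used exactly once per parameter, the guesses are distinct by construction, which is the property multistart optimization relies on.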
- timeseries_preprocessing()¶
Set up time stepping for the fit.
- class ionworkspipeline.data_fits.ArrayDataFit(objectives, **kwargs)¶
A pipeline element that fits a model to data for multiple independent variable values. The data for each independent variable value is fitted separately. The independent variable values should be given as the keys of the objectives dictionary. The value of each key should be an ionworkspipeline.objectives.Objective object. This objective will be used to fit the data for the corresponding independent variable value.
The user-supplied objectives should assign the independent variable value to the custom_parameters attribute of the objective as appropriate. This class simply calls a separate ionworkspipeline.DataFit for each provided objective. It does not pass the independent variable value to the objective, so the user must ensure that the objective is set up to use the independent variable value correctly.
For example, this can be used to fit a model to data at multiple temperatures, or to fit each pulse of a GITT experiment separately, with post-processing to extract functional relationships between parameters and the independent variable.
The rest of the parameters are the same as for ionworkspipeline.DataFit.
Extends:
ionworkspipeline.data_fits.data_fit.DataFit
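The pattern of one independent fit per key can be illustrated without the library. In this plain-Python stand-in, each key is an independent variable value and each value is a small synthetic dataset; a closed-form least-squares slope plays the role of the separate DataFit. All names and numbers are made up for illustration.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y = k * x through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Keys are the independent variable values (temperatures in K, synthetic data).
datasets = {
    298.15: ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    318.15: ([1.0, 2.0, 3.0], [3.0, 6.0, 9.0]),
}

# One independent fit per key, mirroring one DataFit per objective.
fitted = {temperature: fit_slope(xs, ys) for temperature, (xs, ys) in datasets.items()}
print(fitted)
```

The resulting mapping from temperature to fitted value is the kind of output that post-processing would then turn into a functional relationship.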
- check_initial_guesses()¶
Check that initial guesses can be sampled for all data fits.
- class ionworkspipeline.data_fits.CostLogger(plot_every=None, print_every=None, checkpoint_func=None)¶
A class to log the cost and parameters during a fit, and plot the results.
Parameters¶
- plot_every : float, optional
The number of seconds between each plot of the cost and parameters. If None, no plots are generated during the fit, but the final plot can be generated using the plot() method.
- print_every : float, optional
The number of seconds between each print of the cost and parameters. If None, no prints are generated during the fit, but the final print can be generated using the print() method.
- checkpoint_func : callable, optional
A function to call to checkpoint the fit. The function should take a single argument, which will be a dictionary containing the current state of the fit.
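A checkpoint function only needs to accept one dictionary. The sketch below builds such a callable that writes the state to a JSON file. The factory name is hypothetical, and the keys used in the demonstration call are placeholders, since the contents of the real state dictionary are not specified here.

```python
import json
import os
import tempfile


def make_json_checkpoint(path):
    """Return a callable that writes the fit state dictionary to `path`."""
    def checkpoint(state):
        # default=str keeps the sketch robust to values JSON cannot encode natively.
        with open(path, "w") as f:
            json.dump(state, f, default=str)
    return checkpoint


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
checkpoint_func = make_json_checkpoint(path)
# Placeholder keys; the real state dictionary may look different.
checkpoint_func({"iteration": 3, "cost": 0.12})
```

Following the signature above, the callable would be passed as CostLogger(checkpoint_func=checkpoint_func).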
- argmin_costs()¶
Get the index of the job with minimum cost. If all costs are NaN, return 0.
Returns¶
- int
Index of minimum cost
- argsort_costs()¶
Sort the jobs in ascending order of cost.
Returns¶
- numpy.ndarray
Indices that would sort the costs
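The behaviour of argmin_costs and argsort_costs can be mimicked in a few lines of plain Python. The all-NaN fallback to 0 comes from the description above; ignoring NaN entries when a valid cost exists and placing NaN last when sorting are assumptions of this sketch, not documented behaviour.

```python
import math


def argmin_costs(costs):
    """Index of the smallest non-NaN cost, or 0 if every cost is NaN."""
    valid = [i for i, c in enumerate(costs) if not math.isnan(c)]
    return min(valid, key=lambda i: costs[i]) if valid else 0


def argsort_costs(costs):
    """Indices sorting costs in ascending order, with NaN placed last (assumption)."""
    return sorted(range(len(costs)), key=lambda i: (math.isnan(costs[i]), costs[i]))


costs = [3.0, float("nan"), 1.0, 2.0]
print(argmin_costs(costs))   # 2
print(argsort_costs(costs))  # [2, 3, 0, 1]
```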
- property children¶
List of child CostLoggerJob objects.
- clear_axes()¶
Clear the plot axes.
- property cost¶
List of logged costs.
- property fig_axes¶
Get or create figure and axes objects.
Returns¶
- tuple
Figure and axes objects, and whether they existed
- finish()¶
Finish logging.
- property finished¶
Whether logging is finished.
- property is_parent¶
Whether this is a parent logger.
- property iteration¶
List of iteration numbers.
- log(logs=None, job_id=None)¶
Log cost and parameter values.
Parameters¶
- logs : dict, optional
Dictionary of values to log
- job_id : int, optional
ID of the job being logged
- property multiprocessing¶
Whether multiprocessing is enabled.
- property num_jobs¶
Number of jobs being logged.
- property parent¶
Parent CostLogger object.
- property plot_every¶
Seconds between plot updates.
- property plot_flag¶
Plot update flag.
- plot_refresh(force_plot=False)¶
Update the plot of cost and parameters.
Parameters¶
- force_plot : bool, optional
If True, force a plot update regardless of timing
Returns¶
- tuple
Figure and axes objects
- property plot_variables¶
List of variables to plot.
- property print_every¶
Seconds between print updates.
- property probabilistic¶
Whether probabilistic sampling is enabled.
- reset()¶
Reset all logging data and counters.
- set_datafit_attributes(datafit)¶
Set the DataFit attributes.
- set_multiprocessing(multiprocessing)¶
Set whether multiprocessing is being used.
Parameters¶
- multiprocessing : bool
Whether multiprocessing is enabled
- set_parameters(parameters)¶
Set the parameters to be logged.
Parameters¶
- parameters : list
List of Parameter objects
- set_probabilistic(probabilistic)¶
Set whether probabilistic sampling is enabled.
- property show_plot_iterative¶
Whether to show plots during optimization.
- property show_print_iterative¶
Whether to print updates during optimization.
- spawn_children(num_jobs)¶
Create child CostLoggerJob objects.
Parameters¶
- num_jobs : int
Number of child loggers to create
Returns¶
- list
List of CostLoggerJob objects
- start()¶
Start logging.
- property timer¶
Timer object.