# Design Principles

The way this package is designed is a bit counter-intuitive at first sight. This document explains and attempts to justify some of the design decisions. These are not set in stone, so any suggestions for changes are very welcome.

## Use cases

The main goal of this package is to provide reusable building blocks for building a parameterization pipeline. Since there are many different ways to parameterize a battery model, designing the blocks so that they are robust to fitting parameters in different orders is non-trivial. For example, in some cases, electrode capacity might be calculated from theoretical knowledge of the material and then used to calculate stoichiometry limits. In other cases, we might want to estimate the electrode capacity and stoichiometry limits simultaneously from dV/dQ data.

## Design

The basic building block is the `PipelineElement`. Any `PipelineElement` should take in a set of parameter values (possibly empty) and return another set of parameter values. The full pipeline is then built by calling the pipeline elements in series, passing the output of each element to the next one, until we have the full set of parameters (a minimal sketch of this composition is given below, after the table).

More specifically, there are three types of element:

| Type | Description |
| --- | --- |
| `DirectEntry` | The simplest type of element; ignores the inputted parameters and simply returns a pre-defined set of parameters (e.g. from literature, or direct measurements) |
| `Calculation` | Calculates new parameters based on the ones that have been passed in (e.g. calculating maximum particle concentration from areal capacity, active material volume fraction, and thickness) |
| `DataFit` | Calculates new parameters by fitting a model to some data |

The first two are fairly straightforward and intuitive, but the `DataFit` is a bit more complicated.
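To make the element contract concrete, here is a minimal sketch of the first two element types. It uses plain functions and hypothetical parameter names rather than the package's actual `PipelineElement` classes: each element takes a dictionary of parameter values and returns an updated dictionary, and the pipeline simply chains them in series.

```python
# Illustrative sketch only (not the package's actual API): hypothetical pipeline
# elements that each take a dict of parameter values and return an updated dict.
from typing import Callable

ParameterValues = dict[str, float]


def direct_entry(values: ParameterValues) -> ParameterValues:
    # Ignores the incoming values and adds known parameters
    # (e.g. taken from the literature or from direct measurement).
    return {
        **values,
        "Electrode thickness [m]": 8.5e-5,
        "Active material volume fraction": 0.65,
    }


def calculate_max_concentration(values: ParameterValues) -> ParameterValues:
    # Calculates a new parameter from ones already in the dict: maximum particle
    # concentration from areal capacity, volume fraction and thickness.
    F = 96485  # Faraday constant [C/mol]
    c_max = values["Areal capacity [A.h.m-2]"] * 3600 / (
        F
        * values["Active material volume fraction"]
        * values["Electrode thickness [m]"]
    )
    return {**values, "Maximum concentration [mol.m-3]": c_max}


def run_pipeline(
    elements: list[Callable[[ParameterValues], ParameterValues]],
) -> ParameterValues:
    # Call each element in series, passing the accumulated values to the next one.
    values: ParameterValues = {"Areal capacity [A.h.m-2]": 30.0}
    for element in elements:
        values = element(values)
    return values


print(run_pipeline([direct_entry, calculate_max_concentration]))
```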
### DataFit

The `DataFit` class is a standard wrapper class that requires the following customizable inputs:

- an objective (see `Objective` below), which includes the model and data
- which parameters to fit, their initial guesses, and their bounds
- other parameters required to run the model

To separate concerns, we split the work done by the `DataFit` into a few different classes:

- An `Objective` class, which takes in the data (and metadata), along with any options and callbacks. This class implements a `build` method, which takes in some parameter values and defines a `run` function that can be called with a dictionary of inputs and returns a dictionary of results whose keys match the data being used in the fit.
- A `Cost` class, which describes how to compare the model and data. This class implements a `__call__` method that creates a cost (e.g. least squares) given a dictionary of data and a dictionary of model results.
- An `Optimizer` class, which defines the optimization routine to be used.
- The `DataFit` class, which takes in the `Objective`, a dictionary of parameters to be fit together with their form, initial guesses and bounds, a `Cost`, an `Optimizer`, and some options.

The "form" of the parameters can be:

- an `iwp.Parameter` object, e.g. `iwp.Parameter("x")`
- a pybamm expression, in which case the other parameters should also be explicitly provided as `iwp.Parameter` objects. For example,

  ```python
  {
      "param": 2 * pybamm.Parameter("half-param"),
      "half-param": iwp.Parameter("half-param"),
  }
  ```

  works, but

  ```python
  {"param": 2 * iwp.Parameter("half-param")}
  ```

  would not
- a function containing other parameters, in which case the other parameters should again be explicitly provided as `iwp.Parameter` objects, e.g.

  ```python
  {
      "main parameter": lambda x: pybamm.Parameter("other parameter") * x**2,
      "other parameter": iwp.Parameter("other parameter"),
  }
  ```

  The name of the input parameter does not need to match the name of the parameter.

In all cases, the `DataFit` class automatically processes the dictionary of input values into a vector of values to be passed to the optimizer.

Separating the `Objective`, `Cost`, and `Optimizer` from the `DataFit` has several benefits:

- the same `Objective` can be used with different costs (e.g. RMSE of the whole data vs. extracting features)
- objectives can be reused within a pipeline, for example first to get an approximate value for the parameters, and then again later to get a more accurate value
- combined objectives can be created by combining different objectives to simultaneously optimize over different data sets (for example, constant-current discharge at different C-rates or temperatures)

In summary, the `Objective` and `Cost` classes determine how the model and data are compared for generic parameter values and inputs, the `DataFit` class specifies which parameters are the inputs and performs the fit, and the `Optimizer` class performs the optimization. A minimal sketch of how these pieces fit together is given below.
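The following self-contained sketch illustrates this separation of concerns. The class names mirror the design (`Objective`, `Cost`, `Optimizer`, `DataFit`), but the constructor signatures, the toy exponential "model", and the parameter names (`rate`, `V0`) are hypothetical and chosen only for illustration; they are not the package's actual API.

```python
# Illustrative sketch of the Objective / Cost / Optimizer / DataFit split.
import numpy as np
from scipy.optimize import minimize


class Objective:
    """Holds the data and knows how to run the model for given inputs."""

    def __init__(self, data):
        self.data = data  # e.g. {"Time [s]": ..., "Voltage [V]": ...}

    def build(self, known_parameters):
        t = np.asarray(self.data["Time [s]"])

        def run(inputs):
            # Toy "model": exponential voltage relaxation with a fitted rate.
            v = known_parameters["V0"] * np.exp(-inputs["rate"] * t)
            return {"Voltage [V]": v}

        return run


class LeastSquaresCost:
    """Compares model results to data; here a simple sum of squares."""

    def __call__(self, data, results):
        return sum(
            float(np.sum((np.asarray(data[k]) - results[k]) ** 2))
            for k in results
        )


class ScipyOptimizer:
    """Wraps a SciPy minimizer behind a common interface."""

    def optimize(self, fun, x0, bounds):
        return minimize(fun, x0, bounds=bounds).x


class DataFit:
    """Maps the dict of fitted parameters to a vector and runs the optimization."""

    def __init__(self, objective, parameters, cost, optimizer):
        self.objective = objective
        self.parameters = parameters  # {name: (initial guess, bounds)}
        self.cost = cost
        self.optimizer = optimizer

    def run(self, known_parameters):
        run_model = self.objective.build(known_parameters)
        names = list(self.parameters)
        x0 = [self.parameters[n][0] for n in names]
        bounds = [self.parameters[n][1] for n in names]

        def fun(x):
            inputs = dict(zip(names, x))
            return self.cost(self.objective.data, run_model(inputs))

        best = self.optimizer.optimize(fun, x0, bounds)
        return dict(zip(names, best))


t = np.linspace(0, 10, 50)
data = {"Time [s]": t, "Voltage [V]": 4.2 * np.exp(-0.3 * t)}
fit = DataFit(
    Objective(data),
    {"rate": (0.1, (0.0, 1.0))},
    LeastSquaresCost(),
    ScipyOptimizer(),
)
print(fit.run({"V0": 4.2}))  # recovers rate ~ 0.3
```

Because the `Objective` only exposes the data and a `run` function, the same instance could be paired with a different cost, reused later in a pipeline with tighter bounds, or combined with other objectives, which is exactly the flexibility the benefits list above describes.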