{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Initial Guesses\n",
    "\n",
    "In this example, we demonstrate how to\n",
    "1. randomly generate initial guesses\n",
    "2. run a pre-screening step to filter the top multistarts\n",
    "3. optimize the best initial guess candidates\n",
    "\n",
    "We begin by generating some synthetic data from a battery model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pybamm\n",
    "import ionworkspipeline as iwp\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Load the model and parameter values\n",
    "model = pybamm.lithium_ion.SPM()\n",
    "parameter_values = iwp.ParameterValues(\"Chen2020\")\n",
    "parameter_values.update(\n",
    "    {\n",
    "        \"Negative electrode active material volume fraction\": 0.75,\n",
    "        \"Positive electrode active material volume fraction\": 0.665,\n",
    "    }\n",
    ")\n",
    "\n",
    "# Generate the data\n",
    "t = np.arange(0, 900, 3)\n",
    "sim = iwp.Simulation(\n",
    "    model, parameter_values=parameter_values, t_eval=[0, 900], t_interp=t\n",
    ")\n",
    "sol = sim.solve()\n",
    "data = pd.DataFrame(\n",
    "    {x: sim.solution[x](t) for x in [\"Time [s]\", \"Current [A]\", \"Voltage [V]\"]}\n",
    ")\n",
    "# add noise to the data\n",
    "sigma = 0.001\n",
    "data[\"Voltage [V]\"] += np.random.normal(0, sigma, len(data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we define the parameters to fit and their bounds. We apply a `Log10` transform to the parameters, which optimizes and randomly generates the parameters on a logarithmic basis. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "parameters = {\n",
    "    \"Negative electrode active material volume fraction\": iwp.transforms.Log10(\n",
    "        iwp.Parameter(\n",
    "            \"Negative electrode active material volume fraction\",\n",
    "            bounds=(0.55, 1),\n",
    "        )\n",
    "    ),\n",
    "    \"Positive electrode active material volume fraction\": iwp.transforms.Log10(\n",
    "        iwp.Parameter(\n",
    "            \"Positive electrode active material volume fraction\",\n",
    "            bounds=(0.45, 1),\n",
    "        )\n",
    "    ),\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We set up the objective function, which in this case is the current driven objective. This takes the time vs current data and runs the model, comparing the model's predictions for the voltage with the experimental data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "objective = iwp.objectives.CurrentDriven(data, options={\"model\": model})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we set up the `DataFit` object, which takes the objective function, the parameters to fit, and the optimizer as inputs. Here we use the `PointEstimate` optimizer, which evaluates the cost function for a single set of parameters. We sample `300` randomly chosen parameters using the `multistarts` keyword argument. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = iwp.optimizers.PointEstimate()\n",
    "data_fit = iwp.DataFit(\n",
    "    objective,\n",
    "    parameters=parameters,\n",
    "    optimizer=optimizer,\n",
    "    multistarts=200,\n",
    "    options={\"seed\": 0},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before running the `DataFit`, we can visualize all the initial guess candidates using a scatterplot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "names = list(parameters.keys())\n",
    "\n",
    "# Collect the initial guesses for each parameter\n",
    "initial_guesses = data_fit.initial_guesses\n",
    "initial_guess_arrays = {k: [ig[k] for ig in initial_guesses] for k in names}\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(8, 6))\n",
    "ax.scatter(\n",
    "    initial_guess_arrays[names[0]],\n",
    "    initial_guess_arrays[names[1]],\n",
    "    label=\"Initial guess\",\n",
    ")\n",
    "\n",
    "# Add marker for the true parameter values\n",
    "ax.scatter(\n",
    "    parameter_values[names[0]],\n",
    "    parameter_values[names[1]],\n",
    "    c=\"green\",\n",
    "    s=120,\n",
    "    marker=\"X\",\n",
    "    edgecolors=\"black\",\n",
    "    linewidth=2,\n",
    "    label=\"True value\",\n",
    ")\n",
    "\n",
    "ax.legend(loc=\"upper right\")\n",
    "ax.set_xlabel(names[0])\n",
    "ax.set_ylabel(names[1])\n",
    "ax.set_title(\"Initial guesses\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we run the data fit, passing the parameters that are not being fit as a dictionary. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params_for_pipeline = {k: v for k, v in parameter_values.items() if k not in parameters}\n",
    "results = data_fit.run(params_for_pipeline)\n",
    "for k, v in results.items():\n",
    "    print(f\"{k}: {parameter_values[k]:.3e} (true) {v:.3e} (fit)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we plot the results and the trace of the cost function and parameter values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out = data_fit.plot_fit_results()\n",
    "\n",
    "out[\"CurrentDriven\"][0][0].suptitle(f\"Initial guesses, min cost: {results.costs:.5e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above plot, the initial guess and the fit lines fall on top of one another since no optimization is performed. The best-performing initial guess does a decent job of fitting the data, but there is still room for improvement.\n",
    "\n",
    "Let's revisit the initial guess scatterplot where markers are colored by their cost function (lighter colors are better). The top 10 initial guesses are highlighted with a star."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Number of best results to highlight\n",
    "keep_best = 10\n",
    "\n",
    "# Get the best results based on lowest cost\n",
    "best_results = results.best_results(keep_best)\n",
    "\n",
    "# Get parameter names\n",
    "names = list(parameters.keys())\n",
    "\n",
    "# Extract all initial guesses and their costs\n",
    "initial_guesses = results.children\n",
    "initial_guess_arrays = {k: [ig[k] for ig in initial_guesses] for k in names}\n",
    "costs = [r.costs.min() for r in results.children]\n",
    "\n",
    "# Convert costs to numpy array for easier manipulation\n",
    "costs_array = np.array(costs)\n",
    "\n",
    "# Sort results by cost (ascending order)\n",
    "sort_indices = np.argsort(costs_array)\n",
    "x_sorted = np.array(initial_guess_arrays[names[0]])[sort_indices]\n",
    "y_sorted = np.array(initial_guess_arrays[names[1]])[sort_indices]\n",
    "costs_sorted = costs_array[sort_indices]\n",
    "\n",
    "# Create figure showing all results colored by cost\n",
    "fig, ax = plt.subplots(figsize=(8, 6))\n",
    "\n",
    "# Plot all points with color indicating cost value\n",
    "scatter = ax.scatter(\n",
    "    x_sorted,\n",
    "    y_sorted,\n",
    "    c=costs_sorted,\n",
    "    cmap=\"viridis_r\",  # Reversed colormap so lower costs are darker\n",
    "    alpha=0.7,\n",
    "    label=\"Initial guess\",\n",
    ")\n",
    "\n",
    "# Highlight the best results with star markers\n",
    "ax.scatter(\n",
    "    x_sorted[:keep_best],\n",
    "    y_sorted[:keep_best],\n",
    "    c=\"red\",\n",
    "    s=100,\n",
    "    marker=\"*\",\n",
    "    edgecolors=\"black\",\n",
    "    label=f\"Best {keep_best} results\",\n",
    ")\n",
    "\n",
    "# Add marker for the true parameter values\n",
    "ax.scatter(\n",
    "    parameter_values[names[0]],\n",
    "    parameter_values[names[1]],\n",
    "    c=\"green\",\n",
    "    s=120,\n",
    "    marker=\"X\",\n",
    "    edgecolors=\"black\",\n",
    "    linewidth=2,\n",
    "    label=\"True value\",\n",
    ")\n",
    "\n",
    "# Add colorbar and labels\n",
    "plt.colorbar(scatter, label=\"Cost value (lower is better)\")\n",
    "ax.set_xlabel(names[0])\n",
    "ax.set_ylabel(names[1])\n",
    "ax.legend(loc=\"upper right\")\n",
    "ax.set_title(\"Initial sampling of the parameters\")\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the pre-screening step, we create another `DataFit` which uses the top 10 best-performing initial guesses in a full optimization run. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = iwp.optimizers.ScipyMinimize(method=\"Nelder-Mead\")\n",
    "\n",
    "data_fit_refined = iwp.DataFit(\n",
    "    objective,\n",
    "    parameters=parameters,\n",
    "    initial_guesses=best_results,\n",
    ")\n",
    "\n",
    "results_refined = data_fit_refined.run(params_for_pipeline)\n",
    "for k, v in results_refined.items():\n",
    "    print(f\"{k}: {parameter_values[k]:.3e} (true) {v:.3e} (fit)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we plot the refined results and the trace of the cost function and parameter values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out_refined = data_fit_refined.plot_fit_results()\n",
    "\n",
    "out_refined[\"CurrentDriven\"][0][0].suptitle(\n",
    "    f\"Refined guesses, min cost: {results_refined.costs.min():.5e}\"\n",
    ")\n",
    "\n",
    "_ = data_fit_refined.plot_trace()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}