Loading and saving
This notebook shows how to load data for use in the pipeline and how to save the results of a fit. The pipeline comes with the iwutil package, which contains utilities for loading and saving data.
import iwutil
import numpy as np
import pandas as pd
import shutil
Generate data
Let’s save some dummy data to CSV and JSON files using the iwutil.save module, which contains functions for saving data in a variety of formats. If the filename contains a “/”, the data is saved in that subdirectory, which is created if it does not already exist.
# Generate some dummy data
n_samples = 100
x = np.linspace(0, 10, n_samples)
y = 2 * x + np.random.normal(0, 1, n_samples)
z = x**2 + np.random.normal(0, 2, n_samples)
# Create a DataFrame and save to CSV in the tmp directory
df = pd.DataFrame({"x": x, "y": y, "z": z})
iwutil.save.csv(df, "tmp/data.csv")
# Create a dictionary and save to JSON in the tmp directory
metadata = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
iwutil.save.json(metadata, "tmp/metadata.json")
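The directory-creation behaviour described above can be approximated with the standard library alone. The following is a minimal sketch under that assumption, not iwutil's actual implementation; the helper name save_json_with_parents is hypothetical and exists only for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_json_with_parents(data, filename):
    """Write data as JSON, creating any missing parent directories first.

    Hypothetical helper for illustration; not part of iwutil.
    """
    path = Path(filename)
    # parents=True creates intermediate directories; exist_ok=True makes
    # the call a no-op when the directory is already there.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(data, f)
    return path


# Work inside a throwaway directory so the example cleans up after itself
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "tmp" / "metadata.json"
    saved = save_json_with_parents({"A": [1, 2, 3]}, target)
    round_trip = json.loads(saved.read_text())
```

Creating the parent directory inside the save helper keeps calling code short, at the cost of silently creating directories when a path is mistyped.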
Load the data
We can load the data into a pandas DataFrame using the iwutil.read_df function. This function automatically detects the file format from the filename extension (e.g. .csv, .json, .parquet).
df = iwutil.read_df("tmp/data.csv")
print(df.head())
         x         y         z
0  0.00000 -0.293762 -0.315006
1  0.10101  1.422873  0.149952
2  0.20202 -0.418231 -1.033983
3  0.30303  0.578576 -0.617789
4  0.40404  1.444162 -0.459662
metadata = iwutil.read_df("tmp/metadata.json")
print(metadata)
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
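Extension-based format detection of the kind described above can be sketched with a small lookup table of pandas readers. This is an illustrative assumption about how such dispatch might work, not the code behind iwutil.read_df; the names READERS and read_by_extension are hypothetical.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Map each supported suffix to the pandas reader that handles it
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,  # needs pyarrow or fastparquet when called
}


def read_by_extension(filename):
    """Load a file into a DataFrame, choosing the reader from its suffix.

    Hypothetical helper for illustration; not part of iwutil.
    """
    suffix = Path(filename).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
    return reader(filename)


with tempfile.TemporaryDirectory() as root:
    # Round-trip a small CSV file
    csv_path = Path(root) / "data.csv"
    pd.DataFrame({"x": [0.0, 1.0], "y": [2.0, 3.0]}).to_csv(csv_path, index=False)
    loaded = read_by_extension(csv_path)

    # A JSON object of equal-length lists is read back as one column per key
    json_path = Path(root) / "metadata.json"
    json_path.write_text(json.dumps({"A": [1, 2, 3], "B": [4, 5, 6]}))
    meta = read_by_extension(json_path)
```

With this sketch the JSON dictionary comes back as a DataFrame rather than a dict, which matches the tabular output shown for metadata above.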
Finally, we can delete the temporary directory and all of its contents.
shutil.rmtree("tmp")
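As an alternative to removing the directory by hand, the standard library's tempfile.TemporaryDirectory deletes the directory automatically when its with block exits. The sketch below uses only the standard library and is a possible variation, not something the pipeline or iwutil requires.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data.csv"
    # Write a tiny CSV file into the temporary directory
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerow([0.0, 1.5])
    existed_inside = csv_path.exists()

# The directory and everything in it are gone once the block has exited
removed_after = not Path(tmp_dir).exists()
```

This approach also cleans up when an exception is raised inside the block, whereas a trailing shutil.rmtree call is skipped if an earlier cell fails.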