Utils (sphero_vem.utils)#

Configuration base classes, zarr helpers, metadata utilities, device detection, and GPU dispatch.

Config#

Functions and utilities for config classes

sphero_vem.utils.config.to_serializable(input_dict)[source]#

Convert all dictionary values to JSON-serializable types.

Handles non-standard types such as numpy scalars, numpy arrays, and Path objects by round-tripping through json.dumps/json.loads with CustomJSONEncoder.

Parameters:

input_dict (dict) – Dictionary whose values may contain non-serializable types.

Returns:

A new dictionary where all values are JSON-native types (str, int, float, list, dict, bool, or None).

Return type:

dict
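The round-trip behavior described above can be sketched as follows. Note that `SketchJSONEncoder` and `to_serializable_sketch` are simplified stand-ins written for illustration, not the actual `CustomJSONEncoder` implementation:

```python
import json
from pathlib import Path

import numpy as np


class SketchJSONEncoder(json.JSONEncoder):
    """Simplified stand-in for CustomJSONEncoder: handles numpy and Path."""

    def default(self, obj):
        if isinstance(obj, np.generic):   # numpy scalars -> Python scalars
            return obj.item()
        if isinstance(obj, np.ndarray):   # numpy arrays -> nested lists
            return obj.tolist()
        if isinstance(obj, Path):         # Path objects -> strings
            return str(obj)
        return super().default(obj)


def to_serializable_sketch(input_dict: dict) -> dict:
    """Round-trip through json to force all values to JSON-native types."""
    return json.loads(json.dumps(input_dict, cls=SketchJSONEncoder))


config = {"spacing": np.float32(50.0), "shape": np.array([4, 4]), "out": Path("/tmp/run")}
clean = to_serializable_sketch(config)
# clean == {"spacing": 50.0, "shape": [4, 4], "out": "/tmp/run"}
```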

class sphero_vem.utils.config.BaseConfig[source]#

Bases: object

Base class for pipeline configuration dataclasses.

Provides JSON serialization, automatic type coercion for supported fields (see Notes), and a two-tier parameter view (full config vs. scientifically relevant metadata). Subclasses should be @dataclass and may override the two class variables below to control which fields are exposed in each tier.

Class Variables#

EXCLUDED_JSON_FIELDS: ClassVar[set[str]]

Field names omitted from to_json / full_config. Use this for fields that cannot be JSON-serialized at all (e.g. live zarr.Array handles or torch.device objects).

EXCLUDED_PROCESSING_FIELDS: ClassVar[set[str]]

Field names omitted from processing_metadata in addition to those in EXCLUDED_JSON_FIELDS. Use this for fields that are serializable but irrelevant to scientific reproducibility — such as file paths, verbosity flags, worker counts, or derived runtime values.

Notes

Deserialization uses dacite with DACITE_CONFIG, which applies Path, tuple, float, and int type coercions so that configs survive a JSON round-trip without losing type information.
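The two-tier parameter view can be sketched with a plain dataclass (the `SegmentationConfig` subclass and its field names below are hypothetical, and the dacite-based coercion is omitted for brevity):

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class SegmentationConfig:  # hypothetical subclass, for illustration only
    sigma: float = 1.5
    threshold: float = 0.3
    output_dir: str = "results/"
    num_workers: int = 8

    # Fields that cannot be JSON-serialized at all (none here).
    EXCLUDED_JSON_FIELDS: ClassVar[set] = set()
    # Serializable but irrelevant to reproducibility: paths, worker counts.
    EXCLUDED_PROCESSING_FIELDS: ClassVar[set] = {"output_dir", "num_workers"}

    def full_config(self) -> dict:
        return {f.name: getattr(self, f.name) for f in fields(self)
                if f.name not in self.EXCLUDED_JSON_FIELDS}

    def processing_metadata(self) -> dict:
        excluded = self.EXCLUDED_JSON_FIELDS | self.EXCLUDED_PROCESSING_FIELDS
        return {f.name: getattr(self, f.name) for f in fields(self)
                if f.name not in excluded}


cfg = SegmentationConfig()
# full_config() exposes all four fields; processing_metadata() keeps only
# the scientifically relevant sigma and threshold.
```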

to_json(filepath)[source]#

Serialize the config to a JSON file.

Parameters:

filepath (str | Path) – Destination file path. Created or overwritten.

Return type:

None

classmethod from_json(filepath)[source]#

Load a config instance from a JSON file.

Parameters:

filepath (str | Path) – Path to a JSON file previously written by to_json.

Returns:

A new instance of the calling class with fields populated from the JSON file, with type coercion applied via dacite.

Return type:

Self

classmethod from_dict(config_dict)[source]#

Instantiate a config from a plain dictionary.

Type coercion (e.g. list → tuple, str → Path) is applied via dacite using DACITE_CONFIG.

Parameters:

config_dict (dict[str, Any]) – Dictionary mapping field names to values.

Returns:

A new instance of the calling class.

Return type:

Self

full_config()[source]#

Return a fully serializable representation of the config.

Excludes fields listed in EXCLUDED_JSON_FIELDS.

Returns:

JSON-serializable dictionary of all config fields except those excluded by EXCLUDED_JSON_FIELDS.

Return type:

dict

processing_metadata()[source]#

Return the subset of config parameters relevant for scientific reproducibility.

Excludes fields listed in both EXCLUDED_JSON_FIELDS and EXCLUDED_PROCESSING_FIELDS.

Returns:

JSON-serializable dictionary of scientifically relevant parameters.

Return type:

dict

class sphero_vem.utils.config.ProcessingStep(step_name, timestamp, parameters, version=None)[source]#

Bases: object

Represents a single processing step in the pipeline.

Can be created from a config object or constructed manually for manual steps.

Parameters:
  • step_name (str)

  • timestamp (str)

  • parameters (dict)

  • version (str | None)

classmethod from_config(step_name, config, version=None)[source]#

Create a processing step from a config object.

Parameters:
  • step_name (str) – Name of the processing step

  • config (BaseConfig) – Configuration dataclass instance; it should be a subclass of BaseConfig that inherits its BaseConfig.processing_metadata() method.

  • version (str | None) – Optional software version string. Default is None.

Return type:

Self

classmethod manual(step_name, parameters, version=None)[source]#

Create a manual processing step (no config).

Parameters:
  • step_name (str) – Name of the processing step

  • parameters (dict) – Dictionary of parameters for this step. Take care that non-serializable objects are not passed as a parameter value.

  • version (str | None) – Optional software version string. Default is None.

Return type:

Self

classmethod from_dict(data)[source]#

Load a processing step from a dictionary (e.g., from zarr attrs).

Return type:

Self

Parameters:

data (dict)

to_dict()[source]#

Convert to a serializable dictionary for storage (e.g., in zarr attrs).

Return type:

dict
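The dictionary round trip for processing steps can be sketched with a simplified dataclass (`StepSketch` is a stand-in for ProcessingStep, written for illustration; the real class may store additional fields):

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StepSketch:  # simplified stand-in for ProcessingStep
    step_name: str
    timestamp: str
    parameters: dict
    version: Optional[str] = None

    @classmethod
    def manual(cls, step_name, parameters, version=None):
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        return cls(step_name, stamp, parameters, version)

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict):
        return cls(**data)


step = StepSketch.manual("manual_crop", {"bbox": [0, 0, 100, 100]}, version="0.1.0")
restored = StepSketch.from_dict(step.to_dict())  # survives a zarr-attrs round trip
```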

Misc#

Utility functions

sphero_vem.utils.misc.read_manifest(data_dir)[source]#

Read the manifest file from a directory.

Return type:

dict

Parameters:

data_dir (Path)

sphero_vem.utils.misc.vprint(text, verbose)[source]#

Helper function for cleanly handling print statements with a verbose option: prints text only when verbose is True.

Return type:

None

Parameters:
  • text (str) – Text to print.

  • verbose (bool) – If False, nothing is printed.

sphero_vem.utils.misc.timestamp()[source]#

Return an ISO-formatted timestamp for the current time, with seconds precision, using only widely filesystem-compatible characters.

Return type:

str
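A minimal sketch of such a timestamp helper is shown below. The exact separator substitution used by sphero_vem is an assumption; what matters is that characters invalid on common filesystems (such as ':') are avoided:

```python
from datetime import datetime


def timestamp_sketch() -> str:
    # ISO-like, seconds precision; ':' replaced so the string is safe in
    # file names on Windows as well (the '-' replacement is an assumption).
    return datetime.now().isoformat(timespec="seconds").replace(":", "-")


stamp = timestamp_sketch()  # e.g. "2024-05-01T14-03-27"
```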

sphero_vem.utils.misc.detect_torch_device()[source]#

Detect and return the available torch compute device.

Return type:

device

class sphero_vem.utils.misc.CustomJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]#

Bases: JSONEncoder

A custom JSONEncoder that handles non-native data types such as numpy scalars, numpy arrays, and Path objects.

default(obj)[source]#

Implement this method in a subclass such that it returns a serializable object for obj, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, obj):
    try:
        iterable = iter(obj)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(obj)

sphero_vem.utils.misc.create_ome_multiscales(group)[source]#

Create multiscales specifications compliant with OME-NGFF format v0.5.

Automatically infers multichannel and spatial dimensions from existing arrays.

Parameters:

group (zarr.Group | Path) – Zarr group that contains the multiscale arrays, or path to it.

Return type:

None

Notes

  • Spatial dimensions inferred from ‘spacing’ attribute length

  • Channel dimension assumed if array.ndim > len(spacing)

  • Axis order is always C(Z)YX

  • Does nothing if no scale arrays found

sphero_vem.utils.misc.dirname_from_spacing(spacing)[source]#

Convenience function to create a directory name from spacing in the format ‘{spacing_z}-{spacing_y}-{spacing_x}’

Return type:

str

Parameters:

spacing (tuple[int, int, int])
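A sketch of the documented format string (the function name below is hypothetical; only the ‘{spacing_z}-{spacing_y}-{spacing_x}’ layout is taken from the description above):

```python
def dirname_from_spacing_sketch(spacing):
    # '{spacing_z}-{spacing_y}-{spacing_x}', matching the documented format
    z, y, x = spacing
    return f"{z}-{y}-{x}"


dirname_from_spacing_sketch((50, 50, 50))  # -> "50-50-50"
```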

sphero_vem.utils.misc.get_multiscales(group)[source]#

Get array scales as a list of dicts.

The function looks for “spacing” in the array attributes as a source of ground truth. If not found, the array is ignored.

Parameters:

group (zarr.Group) – Zarr group containing the multiscale arrays.

Returns:

A list of dictionaries containing the multiscale information, sorted by ascending pixel area/voxel volume. Example:

[
    {"path": "0", "scale": [50, 50, 50]},
    {"path": "1", "scale": [100, 100, 100]}
]

Return type:

list[dict]

sphero_vem.utils.misc.temporary_zarr(shape, chunks, dtype=<class 'numpy.float32'>, prefix='intermediate_', dir=None)[source]#

Context manager for temporary zarr array.

Parameters:
  • shape (tuple[int, ...]) – Shape of the array.

  • chunks (tuple[int, ...]) – Chunk size for the array.

  • dtype (np.dtype) – Data type of the array. Default is np.float32.

  • prefix (str) – Prefix for the temporary directory name. Default is "intermediate_".

  • dir (Path | str | None) – Parent directory for the temporary zarr. If None, uses system temp.

Yields:

zarr.Array – Temporary zarr array, deleted on context exit.
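The lifecycle of such a context manager can be sketched with stdlib tools; the version below uses a memory-mapped NumPy file as an analogue of the zarr array (the helper name and the memmap backing are assumptions, not the actual implementation):

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path

import numpy as np


@contextmanager
def temporary_array(shape, dtype=np.float32, prefix="intermediate_", dir=None):
    """Stdlib analogue of temporary_zarr: yields a disk-backed array that
    is deleted together with its temporary directory on context exit."""
    with tempfile.TemporaryDirectory(prefix=prefix, dir=dir) as tmp:
        arr = np.memmap(Path(tmp) / "scratch.dat", dtype=dtype,
                        mode="w+", shape=shape)
        try:
            yield arr
        finally:
            del arr  # release the mapping before the directory is removed


with temporary_array((64, 64)) as scratch:
    scratch[:] = 1.0
    total = float(scratch.sum())  # 4096.0
# the backing file is gone once the block exits
```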

sphero_vem.utils.misc.bbox_expand(bbox, margin, im_shape)[source]#

Expand bounding box by margin without indexing out of image bounds.

Parameters:
  • bbox (tuple[int]) – Bounding box coordinates in the form (x0_min, x1_min, …, x0_max, x1_max, …). The order of the coordinates x_i should match the numpy axis order.

  • margin (int) – Constant margin for bounding box expansion. The bounding box will be expanded by this value in all directions.

  • im_shape (tuple[int]) – Shape of the image array in the same axis order as bbox. Used to clip the expanded bounding box so it does not exceed array bounds.

Returns:

bbox_exp – Expanded bounding box, in the form (x0_min, x1_min, …, x0_max, x1_max, …).

Return type:

tuple[int]

sphero_vem.utils.misc.slice_from_bbox(bbox)[source]#

Get slice from a bounding box for easy image cropping.

Parameters:

bbox (tuple[int]) – Bounding box coordinates in the form (x0_min, x1_min, …, x0_max, x1_max, …). The order of the coordinates x_i should match the numpy axis order.

Returns:

Tuple of slices for indexing.

Return type:

tuple[slice]
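The two bounding-box helpers above can be sketched together as follows. This assumes the max coordinates are exclusive (so clipping at im_shape needs no further offset), which is an assumption about the actual implementation:

```python
import numpy as np


def bbox_expand_sketch(bbox, margin, im_shape):
    """Expand (mins..., maxes...) by `margin`, clipped to image bounds."""
    ndim = len(bbox) // 2
    mins = [max(0, bbox[i] - margin) for i in range(ndim)]
    maxs = [min(im_shape[i], bbox[ndim + i] + margin) for i in range(ndim)]
    return tuple(mins + maxs)


def slice_from_bbox_sketch(bbox):
    """Turn (mins..., maxes...) into a tuple of slices for cropping."""
    ndim = len(bbox) // 2
    return tuple(slice(bbox[i], bbox[ndim + i]) for i in range(ndim))


im = np.zeros((100, 100))
bbox = bbox_expand_sketch((10, 10, 20, 20), margin=5, im_shape=im.shape)
crop = im[slice_from_bbox_sketch(bbox)]  # bbox == (5, 5, 25, 25), crop is 20x20
```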

sphero_vem.utils.misc.check_isotropic(spacing, raise_error=False)[source]#

Check if spacing is isotropic, and optionally raise an error if it’s not.

Parameters:
  • spacing (Sequence[float]) – A sequence containing the voxel spacing to check.

  • raise_error (bool) – Flag that controls whether to raise an error if the check fails. Default is False.

Returns:

True if the spacing is isotropic.

Return type:

bool

Raises:

ValueError – If the spacing is not isotropic and raise_error is True.
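A minimal sketch of this check (the exact tolerance used for the comparison is an assumption):

```python
def check_isotropic_sketch(spacing, raise_error=False):
    # Isotropic means all spacing values are (numerically) equal.
    isotropic = all(abs(s - spacing[0]) < 1e-9 for s in spacing)
    if not isotropic and raise_error:
        raise ValueError(f"Spacing {spacing} is not isotropic")
    return isotropic


check_isotropic_sketch((50, 50, 50))  # True
check_isotropic_sketch((50, 25, 25))  # False
```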

sphero_vem.utils.misc.weighted_std(values, weights)[source]#

Calculate the weighted standard deviation of the data.

Parameters:
  • values (np.ndarray) – Array containing the data.

  • weights (np.ndarray) – Array containing the weights. It must have the same shape as values.

Returns:

The weighted standard deviation.

Return type:

float
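The standard weighted formula can be sketched directly with numpy (this assumes the population form, i.e. no bias correction, which is an assumption about the actual implementation):

```python
import numpy as np


def weighted_std_sketch(values, weights):
    # sqrt of the weighted average of squared deviations from the weighted mean
    mean = np.average(values, weights=weights)
    variance = np.average((values - mean) ** 2, weights=weights)
    return float(np.sqrt(variance))


weighted_std_sketch(np.array([1.0, 3.0]), np.array([1.0, 1.0]))  # -> 1.0
```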

sphero_vem.utils.misc.flatten_for_save(df, sep='__')[source]#

Unpack tuple/list columns into indexed scalar columns for storage.

Tuple columns are expanded into separate columns with names {original_name}{sep}0, {original_name}{sep}1, etc. The original tuple column is dropped.

Parameters:
  • df (pd.DataFrame) – DataFrame with possible tuple or list valued columns.

  • sep (str, optional) – Separator between column name and index. Must be passed identically to reconstruct_tuples for round-tripping. Default is "__".

Returns:

DataFrame with all tuple columns replaced by scalar columns.

Return type:

pd.DataFrame

Raises:

ValueError – If any column name already contains sep, which would create ambiguity on reconstruction.

See also

reconstruct_tuples

Inverse operation.

sphero_vem.utils.misc.reconstruct_tuples(df, sep='__')[source]#

Pack indexed scalar columns back into tuple columns.

Columns matching the pattern {name}{sep}0, {name}{sep}1, … are merged into a single tuple column {name}. The indexed columns are dropped.

Parameters:
  • df (pd.DataFrame) – DataFrame as loaded from parquet, with flattened tuple columns.

  • sep (str, optional) – Separator used by flatten_for_save. Default is "__".

Returns:

DataFrame with indexed columns replaced by tuple columns.

Return type:

pd.DataFrame

Raises:

ValueError – If indexed columns for a group are not contiguous starting from 0 (e.g., bbox__0, bbox__2 without bbox__1).

See also

flatten_for_save

Inverse operation.
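The flatten/reconstruct round trip can be sketched as follows. Both functions below are simplified stand-ins for illustration; they skip the validation described above (the sep-in-name check and the contiguity check):

```python
import pandas as pd


def flatten_for_save_sketch(df, sep="__"):
    """Expand tuple/list columns into {name}{sep}{i} scalar columns."""
    out = df.copy()
    for col in df.columns:
        if df[col].map(lambda v: isinstance(v, (tuple, list))).all():
            n = len(df[col].iloc[0])
            for i in range(n):
                out[f"{col}{sep}{i}"] = df[col].map(lambda v, i=i: v[i])
            out = out.drop(columns=[col])
    return out


def reconstruct_tuples_sketch(df, sep="__"):
    """Merge {name}{sep}{i} columns back into a single tuple column."""
    out = df.copy()
    groups = {}
    for col in df.columns:
        if sep in col:
            name, idx = col.rsplit(sep, 1)
            groups.setdefault(name, []).append((int(idx), col))
    for name, cols in groups.items():
        cols.sort()
        out[name] = list(zip(*(df[c] for _, c in cols)))
        out = out.drop(columns=[c for _, c in cols])
    return out


df = pd.DataFrame({"label": [1, 2], "bbox": [(0, 0, 5, 5), (3, 3, 9, 9)]})
roundtrip = reconstruct_tuples_sketch(flatten_for_save_sketch(df))
# roundtrip["bbox"] holds tuples again: (0, 0, 5, 5) and (3, 3, 9, 9)
```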

sphero_vem.utils.misc.repair_multiscales(root, start_path='')[source]#

Recursively repair multiscales metadata for all groups in hierarchy.

Parameters:
  • root (Path) – Path to the Zarr store containing the hierarchy

  • start_path (str, default="") – Path to start repair from (empty string for root).

Return type:

None

Accelerator#

numpy/cupy switch with a single xp and a safe ArrayLike type.

  • If cupy is importable, xp is cupy; otherwise it is numpy.

  • ArrayLike works with Pylance/mypy without requiring cupy at runtime.

sphero_vem.utils.accelerator.to_host(arr)[source]#

Move an array to host (CPU) memory.

Parameters:

arr (ArrayLike) – Input array on any device.

Returns:

Array on host memory. If arr is already a NumPy array it is returned via np.asarray without copying. If CuPy is available and arr is a CuPy array, it is copied to host memory.

Return type:

numpy.ndarray

sphero_vem.utils.accelerator.to_device(arr)[source]#

Move an array to the active compute device.

Parameters:

arr (ArrayLike) – Input array, either a NumPy or CuPy ndarray.

Returns:

CuPy array if GPU is available, NumPy array otherwise. If arr is already on the target device it is returned unchanged.

Return type:

ArrayLike
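The host/device transfer pair can be sketched as below. The function names are stand-ins; only the documented behavior (NumPy pass-through on host, CuPy copy when a GPU is present) is taken from the descriptions above:

```python
import numpy as np

try:
    import cupy as xp  # GPU path: xp is cupy
    _GPU = True
except ImportError:
    xp = np            # CPU fallback: xp is numpy
    _GPU = False


def to_host_sketch(arr):
    """Copy to host memory if it is a CuPy array, else pass through."""
    if _GPU and isinstance(arr, xp.ndarray):
        return xp.asnumpy(arr)
    return np.asarray(arr)


def to_device_sketch(arr):
    """Move to the active device; a no-op when no GPU is present."""
    return xp.asarray(arr)


a = to_device_sketch(np.ones(3))
b = to_host_sketch(a)  # always a numpy.ndarray
```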

sphero_vem.utils.accelerator.gpu_dispatch(*, return_to_host=False, host_kwarg='_to_host')[source]#

Decorator that dispatches function inputs to GPU, if available, before the call.

Kernels should use the global xp and ndi imported by this module instead of numpy or scipy.ndimage to ensure that calculations work as expected.

Parameters:
  • return_to_host (bool, optional) – Default behavior for the wrapped function. If False (the default), results are returned on the active device (CuPy if GPU, NumPy otherwise); if True, results are always converted back to NumPy when the GPU is used.

  • host_kwarg (str, optional) – Name of a special keyword argument that can override return_to_host at call time. The kwarg is popped from kwargs and never forwarded to the wrapped function.
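The dispatch pattern can be sketched as below. This is a simplified stand-in for gpu_dispatch, under the assumption that only positional ndarray arguments are moved to the device; the real decorator may handle keyword arguments and other input types as well:

```python
import functools

import numpy as np

try:
    import cupy as xp  # GPU path
    _GPU = True
except ImportError:
    xp = np            # CPU fallback
    _GPU = False


def gpu_dispatch_sketch(*, return_to_host=False, host_kwarg="_to_host"):
    """Move array args to the active device before the call; optionally
    copy the result back to host memory afterwards."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pop the override kwarg so it never reaches the wrapped function.
            to_host = kwargs.pop(host_kwarg, return_to_host)
            args = [xp.asarray(a) if isinstance(a, np.ndarray) else a
                    for a in args]
            result = func(*args, **kwargs)
            if to_host and _GPU and isinstance(result, xp.ndarray):
                result = xp.asnumpy(result)
            return result
        return wrapper
    return decorator


@gpu_dispatch_sketch(return_to_host=True)
def normalize(a):
    return (a - xp.mean(a)) / xp.std(a)  # kernel uses the global xp


out = normalize(np.array([1.0, 2.0, 3.0]))  # numpy array on either backend
```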

sphero_vem.utils.accelerator.da_to_device(x)[source]#

Return a Dask array whose blocks are on the active compute device.

Parameters:

x (dask.array.Array) – Input Dask array with NumPy-backed blocks.

Returns:

Dask array with CuPy-backed blocks if GPU is available, otherwise NumPy-backed blocks.

Return type:

dask.array.Array

sphero_vem.utils.accelerator.da_to_host(x)[source]#

Return a Dask array whose blocks are NumPy ndarrays on host memory.

Parameters:

x (dask.array.Array) – Input Dask array, potentially with CuPy-backed blocks.

Returns:

Dask array with NumPy-backed blocks. If no GPU is available and the array is already NumPy-backed, it is returned unchanged.

Return type:

dask.array.Array