Utils (sphero_vem.utils)#
Configuration base classes, zarr helpers, metadata utilities, device detection, and GPU dispatch.
Config#
Functions and utilities for config classes
- sphero_vem.utils.config.to_serializable(input_dict)[source]#
Convert all dictionary values to JSON-serializable types.
Handles non-standard types such as numpy scalars, numpy arrays, and Path objects by round-tripping through json.dumps/json.loads with CustomJSONEncoder.
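The round-trip mechanism can be sketched in isolation. The encoder below is a minimal stand-in for the library's CustomJSONEncoder, not its actual source; it covers only the three documented cases (numpy scalars, numpy arrays, Path objects):

```python
import json
from pathlib import Path

import numpy as np


class CustomJSONEncoder(json.JSONEncoder):
    # Minimal stand-in (an assumption, not the library's code) covering
    # the documented non-standard types.
    def default(self, o):
        if isinstance(o, np.generic):
            return o.item()       # numpy scalar -> Python scalar
        if isinstance(o, np.ndarray):
            return o.tolist()     # numpy array -> nested lists
        if isinstance(o, Path):
            return str(o)         # Path -> plain string
        return super().default(o)


def to_serializable(input_dict):
    # Round-trip through json.dumps/json.loads so every value becomes a
    # plain JSON-compatible Python type.
    return json.loads(json.dumps(input_dict, cls=CustomJSONEncoder))


cfg = {"spacing": np.float32(50.0), "shape": np.array([3, 4]), "path": Path("/tmp/run")}
print(to_serializable(cfg))  # {'spacing': 50.0, 'shape': [3, 4], 'path': '/tmp/run'}
```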
- class sphero_vem.utils.config.BaseConfig[source]#
Bases: object
Base class for pipeline configuration dataclasses.
Provides JSON serialization, automatic type coercion for supported fields (see Notes), and a two-tier parameter view (full config vs. scientifically relevant metadata). Subclasses should be @dataclass and may override the two class variables below to control which fields are exposed in each tier.
Class Variables#
- EXCLUDED_JSON_FIELDS: ClassVar[set[str]]
Field names omitted from to_json/full_config. Use this for fields that cannot be JSON-serialized at all (e.g. live zarr.Array handles or torch.device objects).
- EXCLUDED_PROCESSING_FIELDS: ClassVar[set[str]]
Field names omitted from processing_metadata in addition to those in EXCLUDED_JSON_FIELDS. Use this for fields that are serializable but irrelevant to scientific reproducibility, such as file paths, verbosity flags, worker counts, or derived runtime values.
Notes
Deserialization uses dacite with DACITE_CONFIG, which applies Path, tuple, float, and int type coercions so that configs survive a JSON round-trip without losing type information.
- classmethod from_json(filepath)[source]#
Load a config instance from a JSON file.
- Parameters:
filepath (str | Path) – Path to a JSON file previously written by to_json.
- Returns:
A new instance of the calling class with fields populated from the JSON file, with type coercion applied via dacite.
- Return type:
Self
- classmethod from_dict(config_dict)[source]#
Instantiate a config from a plain dictionary.
Type coercion (e.g. list → tuple, str → Path) is applied via dacite using DACITE_CONFIG.
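The two-tier view (full config vs. scientifically relevant metadata) can be illustrated with a self-contained sketch. SegmentationConfig and its fields are hypothetical, and full_config/processing_metadata are re-implemented here in simplified form rather than inherited from the real BaseConfig:

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class SegmentationConfig:
    # Hypothetical config illustrating the two-tier pattern.
    sigma: float = 2.0
    threshold: float = 0.5
    output_path: str = "out.zarr"  # serializable, but not scientifically relevant
    n_workers: int = 4             # ditto

    # ClassVar attributes are not dataclass fields, so they never appear
    # in the serialized output themselves.
    EXCLUDED_PROCESSING_FIELDS: ClassVar[set[str]] = {"output_path", "n_workers"}

    def full_config(self):
        # Tier 1: every JSON-serializable field.
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def processing_metadata(self):
        # Tier 2: only fields relevant to scientific reproducibility.
        return {
            k: v
            for k, v in self.full_config().items()
            if k not in self.EXCLUDED_PROCESSING_FIELDS
        }


cfg = SegmentationConfig()
print(cfg.processing_metadata())  # {'sigma': 2.0, 'threshold': 0.5}
```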
- class sphero_vem.utils.config.ProcessingStep(step_name, timestamp, parameters, version=None)[source]#
Bases: object
Represents a single processing step in the pipeline.
Can be created from a config via from_config or built by hand via manual.
- classmethod from_config(step_name, config, version=None)[source]#
Create a processing step from a config object.
- Parameters:
step_name (str) – Name of the processing step
config (BaseConfig) – Configuration dataclass instance. It should be a subclass of BaseConfig and inherit its BaseConfig.processing_metadata() method.
version (str | None) – Optional software version string. Default is None.
- Return type:
Self
- classmethod manual(step_name, parameters, version=None)[source]#
Create a manual processing step (no config).
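For orientation, a stand-alone mimic of a manual step (an assumption based on the documented fields step_name, timestamp, parameters, version, not the library's source) looks like this:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ProcessingStep:
    # Fields mirror the documented constructor signature.
    step_name: str
    timestamp: str
    parameters: dict
    version: Optional[str] = None

    @classmethod
    def manual(cls, step_name, parameters, version=None):
        # A manual step records its parameters directly, without a config.
        ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
        return cls(step_name=step_name, timestamp=ts,
                   parameters=parameters, version=version)


step = ProcessingStep.manual("crop", {"bbox": [0, 0, 100, 100]})
print(step.step_name, step.parameters)
```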
Misc#
Utility functions
- sphero_vem.utils.misc.vprint(text, verbose)[source]#
Helper function for cleanly handling print statements with a verbose option
- sphero_vem.utils.misc.timestamp()[source]#
Returns a timestamp for the current time up to seconds, ISO-formatted and widely filesystem-compatible.
- Return type:
str
- class sphero_vem.utils.misc.CustomJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]#
Bases: JSONEncoder
A custom JSONEncoder to handle non-standard data types.
- default(obj)[source]#
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
- sphero_vem.utils.misc.create_ome_multiscales(group)[source]#
Create multiscales specifications compliant with OME-NGFF format v0.5.
Automatically infers multichannel and spatial dimensions from existing arrays.
- Parameters:
group (zarr.Group | Path) – Zarr group that contains the multiscale arrays, or path to it.
- Return type:
None
Notes
- Spatial dimensions are inferred from the length of the ‘spacing’ attribute.
- A channel dimension is assumed if array.ndim > len(spacing).
- Axis order is always C(Z)YX.
- Does nothing if no scale arrays are found.
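For orientation, the multiscales metadata for a two-level C(Z)YX pyramid has roughly the following shape. This is a sketch of the general NGFF multiscales layout, not the exact output of create_ome_multiscales; consult the OME-NGFF v0.5 specification for the exact required fields:

```python
# Sketch (assumed shape, not verified v0.5 output) of a multiscales entry
# for a 4D C-Z-Y-X pyramid with two resolution levels.
multiscales = [
    {
        "axes": [
            {"name": "c", "type": "channel"},
            {"name": "z", "type": "space"},
            {"name": "y", "type": "space"},
            {"name": "x", "type": "space"},
        ],
        "datasets": [
            {
                "path": "0",
                "coordinateTransformations": [
                    {"type": "scale", "scale": [1, 50, 50, 50]},
                ],
            },
            {
                "path": "1",
                "coordinateTransformations": [
                    {"type": "scale", "scale": [1, 100, 100, 100]},
                ],
            },
        ],
    }
]
```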
- sphero_vem.utils.misc.dirname_from_spacing(spacing)[source]#
Convenience function to create a directory name from spacing in the format ‘{spacing_z}-{spacing_y}-{spacing_x}’
- sphero_vem.utils.misc.get_multiscales(group)[source]#
Get array scales as a list of dicts.
The function looks for “spacing” in the array attributes as a source of ground truth. If not found, the array is ignored.
- Parameters:
group (zarr.Group) – Zarr group containing the multiscale arrays.
- Returns:
A list of dictionaries with the multiscale information, sorted in ascending order of pixel area/voxel volume. Example:
[
    {"path": "0", "scale": [50, 50, 50]},
    {"path": "1", "scale": [100, 100, 100]},
]
- Return type:
list[dict]
- sphero_vem.utils.misc.temporary_zarr(shape, chunks, dtype=<class 'numpy.float32'>, prefix='intermediate_', dir=None)[source]#
Context manager for temporary zarr array.
- Parameters:
shape (tuple[int]) – Shape of the temporary array.
chunks (tuple[int]) – Chunk shape for the array.
dtype – Data type of the array. Default is numpy.float32.
prefix (str, optional) – Prefix for the temporary store name. Default is "intermediate_".
dir – Directory in which to create the temporary store. Default is None.
- Yields:
zarr.Array – Temporary zarr array, deleted on context exit.
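The contextmanager pattern behind a helper like this can be sketched without zarr itself. The example below uses a numpy memmap as a stand-in for the chunked zarr.Array the real helper yields (temporary_array is a hypothetical name, not the library's function):

```python
import shutil
import tempfile
from contextlib import contextmanager

import numpy as np


@contextmanager
def temporary_array(shape, dtype=np.float32, prefix="intermediate_", dir=None):
    # Same shape as temporary_zarr: create on-disk scratch space, yield an
    # array view onto it, and delete everything on context exit.
    tmpdir = tempfile.mkdtemp(prefix=prefix, dir=dir)
    try:
        arr = np.memmap(f"{tmpdir}/scratch.dat", dtype=dtype, mode="w+", shape=shape)
        yield arr
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)  # cleanup on exit


with temporary_array((4, 4)) as a:
    a[:] = 1.0
    total = float(a.sum())
print(total)  # 16.0
```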
- sphero_vem.utils.misc.bbox_expand(bbox, margin, im_shape)[source]#
Expand bounding box by margin without indexing out of image bounds.
- Parameters:
bbox (tuple[int]) – Bounding box coordinates in the form (x0_min, x1_min, …, x0_max, x1_max, …). The coordinates x_i follow the same order as the numpy axes.
margin (int) – Constant margin for bounding box expansion. The bounding box will be expanded by this value in all directions.
im_shape (tuple[int]) – Shape of the image array in the same axis order as bbox. Used to clip the expanded bounding box so it does not exceed array bounds.
- Returns:
bbox_exp – Expanded bounding box, in the form (x0_min, x1_min, …, x0_max, x1_max, …).
- Return type:
tuple[int]
- sphero_vem.utils.misc.slice_from_bbox(bbox)[source]#
Get slice from a bounding box for easy image cropping.
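A minimal sketch of both helpers, assuming the documented (mins…, maxes…) layout and exclusive max coordinates (the actual implementations may differ in detail):

```python
import numpy as np


def bbox_expand(bbox, margin, im_shape):
    # Expand (min_0, ..., min_n, max_0, ..., max_n) by a constant margin,
    # clipping to the image bounds.
    ndim = len(bbox) // 2
    mins = [max(bbox[i] - margin, 0) for i in range(ndim)]
    maxs = [min(bbox[ndim + i] + margin, im_shape[i]) for i in range(ndim)]
    return tuple(mins + maxs)


def slice_from_bbox(bbox):
    # Turn a bbox into a tuple of slices for direct numpy indexing.
    ndim = len(bbox) // 2
    return tuple(slice(bbox[i], bbox[ndim + i]) for i in range(ndim))


im = np.zeros((100, 100))
bbox = bbox_expand((10, 20, 30, 40), margin=5, im_shape=im.shape)
print(bbox)                              # (5, 15, 35, 45)
print(im[slice_from_bbox(bbox)].shape)   # (30, 30)
```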
- sphero_vem.utils.misc.check_isotropic(spacing, raise_error=False)[source]#
Check if spacing is isotropic, and optionally raise an error if it’s not.
- Parameters:
spacing – Pixel/voxel spacing to check for isotropy.
raise_error (bool, optional) – Whether to raise an error if the spacing is not isotropic. Default is False.
- Returns:
True if the spacing is isotropic.
- Return type:
bool
- Raises:
ValueError – If the spacing is not isotropic and raise_error is True.
- sphero_vem.utils.misc.weighted_std(values, weights)[source]#
Calculate the weighted standard deviation of the data.
- Parameters:
values (np.ndarray) – Array containing the data.
weights (np.ndarray) – Array containing the weights. It must have the same shape as values.
- Returns:
The weighted standard deviation.
- Return type:
float
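The usual definition, the square root of the weighted variance around the weighted mean, can be sketched directly with numpy (a plausible implementation, not necessarily the library's exact one, which may apply a bias correction):

```python
import numpy as np


def weighted_std(values, weights):
    # Weighted mean, then weighted variance around it.
    mean = np.average(values, weights=weights)
    variance = np.average((values - mean) ** 2, weights=weights)
    return float(np.sqrt(variance))


v = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 2.0])
print(weighted_std(v, w))
```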
- sphero_vem.utils.misc.flatten_for_save(df, sep='__')[source]#
Unpack tuple/list columns into indexed scalar columns for storage.
Tuple columns are expanded into separate columns with names {original_name}{sep}0, {original_name}{sep}1, etc. The original tuple column is dropped.
- Parameters:
df (pd.DataFrame) – DataFrame with possible tuple or list valued columns.
sep (str, optional) – Separator between column name and index. Must be passed identically to reconstruct_tuples for round-tripping. Default is "__".
- Returns:
DataFrame with all tuple columns replaced by scalar columns.
- Return type:
pd.DataFrame
- Raises:
ValueError – If any column name already contains sep, which would create ambiguity on reconstruction.
See also
reconstruct_tuples – Inverse operation.
- sphero_vem.utils.misc.reconstruct_tuples(df, sep='__')[source]#
Pack indexed scalar columns back into tuple columns.
Columns matching the pattern {name}{sep}0, {name}{sep}1, … are merged into a single tuple column {name}. The indexed columns are dropped.
- Parameters:
df (pd.DataFrame) – DataFrame as loaded from parquet, with flattened tuple columns.
sep (str, optional) – Separator used by flatten_for_save. Default is "__".
- Returns:
DataFrame with indexed columns replaced by tuple columns.
- Return type:
pd.DataFrame
- Raises:
ValueError – If indexed columns for a group are not contiguous starting from 0 (e.g., bbox__0, bbox__2 without bbox__1).
See also
flatten_for_save – Inverse operation.
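The flattening half of the round-trip can be sketched in a few lines of pandas. This is a simplified illustration of the naming scheme, without the separator-ambiguity and contiguity checks the real pair performs:

```python
import pandas as pd


def flatten_for_save(df, sep="__"):
    # Expand every column whose values are all tuples/lists into
    # {name}{sep}0, {name}{sep}1, ... and drop the original column.
    out = df.copy()
    for col in df.columns:
        if df[col].map(lambda v: isinstance(v, (tuple, list))).all():
            n = len(df[col].iloc[0])
            for i in range(n):
                out[f"{col}{sep}{i}"] = df[col].map(lambda v, i=i: v[i])
            out = out.drop(columns=[col])
    return out


df = pd.DataFrame({"label": [1, 2], "bbox": [(0, 0, 5, 5), (1, 1, 6, 6)]})
flat = flatten_for_save(df)
print(list(flat.columns))  # ['label', 'bbox__0', 'bbox__1', 'bbox__2', 'bbox__3']
```

The flattened frame contains only scalar columns, which is what makes it safe to write to parquet; reconstruct_tuples reverses the naming scheme on load.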
Accelerator#
numpy/cupy switch with a single xp and a safe ArrayLike type.
- If cupy is importable, xp is cupy; otherwise it’s numpy.
- ArrayLike works with Pylance/mypy without requiring cupy at runtime.
- sphero_vem.utils.accelerator.to_host(arr)[source]#
Move an array to host (CPU) memory.
- Parameters:
arr (ArrayLike) – Input array on any device.
- Returns:
Array on host memory. If arr is already a NumPy array it is returned via np.asarray without copying. If CuPy is available and arr is a CuPy array, it is copied to host memory.
- Return type:
np.ndarray
- sphero_vem.utils.accelerator.to_device(arr)[source]#
Move an array to the active compute device.
- Parameters:
arr (ArrayLike) – Input array, either a NumPy or CuPy ndarray.
- Returns:
CuPy array if GPU is available, NumPy array otherwise. If arr is already on the target device it is returned unchanged.
- Return type:
ArrayLike
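The import-time switch and the host transfer can be sketched as follows (an illustration of the described pattern, not the module's exact source):

```python
import numpy as np

# Bind xp to cupy when it imports, otherwise fall back to numpy, so
# downstream code can write xp.zeros(...), xp.sum(...), etc. uniformly.
try:
    import cupy as xp
    GPU_AVAILABLE = True
except ImportError:
    xp = np
    GPU_AVAILABLE = False


def to_host(arr):
    # CuPy arrays expose .get() to copy device memory back to host;
    # NumPy arrays pass through np.asarray without a copy.
    return arr.get() if hasattr(arr, "get") else np.asarray(arr)


a = xp.arange(4)
print(type(to_host(a)).__name__)  # ndarray
```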
- sphero_vem.utils.accelerator.gpu_dispatch(*, return_to_host=False, host_kwarg='_to_host')[source]#
Decorator that dispatches function inputs to GPU, if available, before the call.
Kernels should use the global xp and ndi imported by this module instead of numpy or scipy.ndimage to ensure that calculations work as expected.
- Parameters:
return_to_host (bool, optional) – Default behavior for the wrapped function:
- False (default): return arrays on the active device (CuPy if GPU, NumPy otherwise)
- True: always convert the result back to NumPy if GPU is used
host_kwarg (str, optional) – Name of a special keyword argument that can override return_to_host at call time. The kwarg is popped from kwargs and never forwarded to the wrapped function.
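A simplified version of such a decorator might look like this (a sketch of the documented behavior; the real implementation may also handle keyword arguments and nested containers):

```python
import functools

import numpy as np

try:
    import cupy
    GPU_AVAILABLE = True
except ImportError:
    cupy = None
    GPU_AVAILABLE = False


def gpu_dispatch(*, return_to_host=False, host_kwarg="_to_host"):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Pop the override kwarg so it is never forwarded to func.
            to_host = kwargs.pop(host_kwarg, return_to_host)
            if GPU_AVAILABLE:
                # Move NumPy positional arguments to the GPU.
                args = tuple(
                    cupy.asarray(a) if isinstance(a, np.ndarray) else a
                    for a in args
                )
            result = func(*args, **kwargs)
            if to_host and GPU_AVAILABLE:
                result = cupy.asnumpy(result)  # copy back to host
            return result
        return wrapper
    return decorator


@gpu_dispatch(return_to_host=True)
def double(arr):
    # Kernel written against the dispatched array type.
    return arr * 2


print(double(np.array([1, 2, 3])))  # [2 4 6]
```

The per-call override works because the wrapper pops `_to_host` before forwarding: `double(x, _to_host=False)` keeps the result on the active device even though the decorator default is True.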
- sphero_vem.utils.accelerator.da_to_device(x)[source]#
Return a Dask array whose blocks are on the active compute device.
- Parameters:
x (dask.array.Array) – Input Dask array with NumPy-backed blocks.
- Returns:
Dask array with CuPy-backed blocks if GPU is available, otherwise NumPy-backed blocks.
- Return type:
dask.array.Array
- sphero_vem.utils.accelerator.da_to_host(x)[source]#
Return a Dask array whose blocks are NumPy ndarrays on host memory.
- Parameters:
x (dask.array.Array) – Input Dask array, potentially with CuPy-backed blocks.
- Returns:
Dask array with NumPy-backed blocks. If no GPU is available and the array is already NumPy-backed, it is returned unchanged.
- Return type:
dask.array.Array