Preprocessing (sphero_vem.preprocessing)

Zarr-native resampling, tensor downscaling, and crop utilities.

Functions for preprocessing images.

sphero_vem.preprocessing.create_pyramid(image, num_levels, factor)

Build a multi-resolution image pyramid.

Parameters:
  • image (torch.Tensor) – Input image tensor at full resolution.

  • num_levels (int) – Total number of pyramid levels, including the full-resolution image.

  • factor (int) – Downsampling factor between consecutive levels.

Returns:

List of image tensors ordered from coarsest to finest resolution.

Return type:

list[torch.Tensor]
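The level count and ordering described above can be illustrated with a small shape calculation. This is a minimal sketch, not the library's implementation: `pyramid_shapes` is a hypothetical helper, and it assumes each level is produced by integer (floor) division of the spatial size, as documented for `downscale_tensor`.

```python
def pyramid_shapes(height, width, num_levels, factor):
    """Hypothetical helper: expected (H, W) of each level, coarsest first."""
    if num_levels < 1 or factor < 1:
        raise ValueError("num_levels and factor must be positive integers")
    # Level k is the full-resolution image downscaled k times (k = 0 is full
    # resolution). Assumption: sizes shrink by floor division at every step.
    return [
        (height // factor**k, width // factor**k)
        for k in reversed(range(num_levels))
    ]


shapes = pyramid_shapes(1024, 768, num_levels=3, factor=2)
print(shapes)  # [(256, 192), (512, 384), (1024, 768)]
```

With `num_levels=3` the list has three entries, and the last one is the unchanged full-resolution size.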

sphero_vem.preprocessing.downscale_tensor(image, factor, mode='bilinear')

Downscale a tensor or batch of tensors by an integer factor.

Parameters:
  • image (torch.Tensor) – Input tensor of shape (..., H, W). Unsqueezed to 4-D internally if necessary before interpolation.

  • factor (int) – Integer downsampling factor. Output spatial dimensions are H // factor × W // factor.

  • mode (str, optional) – Interpolation mode passed to torch.nn.functional.interpolate. Default is "bilinear". Use "nearest" for label maps.

Returns:

Downscaled tensor with the same number of dimensions as the input.

Return type:

torch.Tensor
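The documented behaviour (output size `H // factor × W // factor`, same number of dimensions as the input) can be approximated with plain PyTorch. The following is a sketch under stated assumptions, not the function's actual source: `downscale_sketch` is a hypothetical name, and it flattens leading dimensions with `reshape` rather than unsqueezing, which may differ from what the library does internally.

```python
import torch
import torch.nn.functional as F


def downscale_sketch(image: torch.Tensor, factor: int, mode: str = "bilinear") -> torch.Tensor:
    """Hypothetical stand-in illustrating the documented shapes."""
    *lead, h, w = image.shape
    # F.interpolate expects (N, C, H, W) for 2-D modes, so collapse all
    # leading dimensions into a batch axis and add a singleton channel axis.
    batched = image.reshape(-1, 1, h, w)
    # Passing an explicit size avoids rounding surprises from a float scale.
    out = F.interpolate(batched, size=(h // factor, w // factor), mode=mode)
    # Restore the original leading dimensions.
    return out.reshape(*lead, h // factor, w // factor)


image = torch.rand(3, 64, 48)          # e.g. three 64 x 48 slices
small = downscale_sketch(image, factor=4)
print(small.shape)                     # torch.Size([3, 16, 12])
```

For label maps, passing `mode="nearest"` avoids blending neighbouring label values, which is the reason the parameter description recommends it.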

sphero_vem.preprocessing.resample_array(zarr_path, array_path, target_spacing, order=1, zarr_chunks=(1, 1024, 1024), n_workers=4)

Resample an array in a Zarr archive to the target voxel spacing.

Applies a lazy Gaussian pre-blur followed by an affine transform via dask_image, so memory usage stays bounded by the chunk size throughout. Anti-aliasing is applied only along downsampled axes, mirroring skimage.transform.resize. Integer label data (integer dtype and order=0) is resampled without anti-aliasing. float16 arrays are promoted to float32 for processing and cast back on output, because scipy.ndimage does not support float16.

Parameters:
  • zarr_path (Path) – Path to the Zarr archive.

  • array_path (str) – Path to the source array within the archive.

  • target_spacing (tuple[int, int, int]) – Target voxel spacing (Z, Y, X) in nanometers.

  • order (int) – Spline interpolation order. 0 = nearest neighbour (labels), 1 = linear (images). Default 1.

  • zarr_chunks (tuple[int, int, int]) – Chunk shape for the output Zarr array.

  • n_workers (int) – Number of threads for dask’s threaded scheduler. Default 4.

Return type:

None
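The per-axis planning implied by the description above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the function's code: `plan_resample` is a hypothetical helper, the source spacing is passed in explicitly (how the library obtains it is not stated here), the output shape is assumed to be rounded to the nearest integer, and the blur width uses the default rule of `skimage.transform.resize`, sigma = max(0, (factor - 1) / 2).

```python
def plan_resample(shape, source_spacing, target_spacing, is_label=False):
    """Hypothetical helper: per-axis factors, output shape, and blur sigma."""
    # A factor above 1 means the axis is downsampled.
    factors = tuple(t / s for s, t in zip(source_spacing, target_spacing))
    # Assumption: the output extent is rounded to the nearest voxel count.
    out_shape = tuple(max(1, round(n / f)) for n, f in zip(shape, factors))
    if is_label:
        # Integer labels with order=0 are resampled without anti-aliasing.
        sigma = tuple(0.0 for _ in factors)
    else:
        # Zero along axes that are not downsampled, so only downsampled
        # axes are blurred.
        sigma = tuple(max(0.0, (f - 1) / 2) for f in factors)
    return factors, out_shape, sigma


factors, out_shape, sigma = plan_resample(
    shape=(100, 2000, 2000),
    source_spacing=(50, 10, 10),   # nm, (Z, Y, X)
    target_spacing=(50, 20, 20),   # nm, (Z, Y, X)
)
print(factors)    # (1.0, 2.0, 2.0)
print(out_shape)  # (100, 1000, 1000)
print(sigma)      # (0.0, 0.5, 0.5)
```

In this example Z keeps its spacing, so it receives no blur, while Y and X are halved in resolution and are smoothed before resampling.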

sphero_vem.preprocessing.rechunk_array(root, src_array_path, dst_array_path, dst_chunks=(1, 1024, 1024), copy_attributes=True, delete_src=False, verbose=True)

Copy a Zarr array to a new path with a different chunk layout.

Parameters:
  • root (zarr.Group) – Root Zarr group containing the source array.

  • src_array_path (str) – Path to the source array within root.

  • dst_array_path (str) – Path for the destination array within root. Created or overwritten.

  • dst_chunks (tuple[int, int, int], optional) – Chunk shape for the output array. Default is (1, 1024, 1024).

  • copy_attributes (bool, optional) – If True, copy all Zarr attributes from source to destination. Default is True.

  • delete_src (bool, optional) – If True, delete the source array after copying. Default is False.

  • verbose (bool, optional) – If True, show a tqdm progress bar. Default is True.

Returns:

The newly created destination array.

Return type:

zarr.Array

Raises:

FileNotFoundError – If src_array_path does not exist within root.
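One way to picture a chunk-by-chunk copy is to enumerate the destination chunk grid and copy one region at a time. The sketch below only builds the slice tuples and uses the standard library. `iter_chunk_slices` is a hypothetical helper, and whether `rechunk_array` iterates in this way is an assumption, not something the reference states.

```python
from itertools import product


def iter_chunk_slices(shape, chunks):
    """Hypothetical helper: yield one tuple of slices per destination chunk."""
    # Start offsets of every chunk along each axis.
    starts_per_axis = [range(0, dim, c) for dim, c in zip(shape, chunks)]
    for starts in product(*starts_per_axis):
        # Clip the last chunk on each axis to the array boundary.
        yield tuple(
            slice(start, min(start + c, dim))
            for start, c, dim in zip(starts, chunks, shape)
        )


regions = list(iter_chunk_slices(shape=(2, 5, 4), chunks=(1, 4, 4)))
print(len(regions))  # 4
print(regions[1])    # (slice(0, 1, None), slice(4, 5, None), slice(0, 4, None))
```

Each region could then be copied with an assignment such as `dst[region] = src[region]`, so that only one destination chunk is held in memory at a time.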

sphero_vem.preprocessing.crop_to_valid(data, mode='nonzero')

Crop a 3D array to the bounding box of valid data.

Parameters:
  • data (np.ndarray) – The 3D input array.

  • mode (Literal["nonzero", "notnan"], optional) – The validity criterion: "nonzero" (default) or "notnan".

Returns:

The cropped array.

Return type:

np.ndarray

Raises:

ValueError – If mode is not a valid value.
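A bounding-box crop of this kind can be written with NumPy alone. The following is a minimal sketch rather than the library's implementation: `crop_to_valid_sketch` is a hypothetical name, and its handling of an array with no valid voxels (returning an empty array) is an assumption, since that case is not documented above.

```python
import numpy as np


def crop_to_valid_sketch(data: np.ndarray, mode: str = "nonzero") -> np.ndarray:
    """Hypothetical stand-in: crop to the bounding box of valid voxels."""
    if mode == "nonzero":
        valid = data != 0
    elif mode == "notnan":
        valid = ~np.isnan(data)
    else:
        raise ValueError(f"mode must be 'nonzero' or 'notnan', got {mode!r}")

    if not valid.any():
        # Assumption: nothing valid, so return an empty array.
        return data[tuple(slice(0, 0) for _ in range(data.ndim))]

    bounds = []
    for axis in range(data.ndim):
        # Collapse every other axis to find where this axis holds valid data.
        other_axes = tuple(a for a in range(data.ndim) if a != axis)
        idx = np.flatnonzero(valid.any(axis=other_axes))
        bounds.append(slice(idx[0], idx[-1] + 1))
    return data[tuple(bounds)]


volume = np.zeros((4, 5, 6))
volume[1:3, 2:4, 0:5] = 1.0
print(crop_to_valid_sketch(volume).shape)  # (2, 2, 5)
```

The result of basic slicing is a view of the input; call `.copy()` on it if an independent array is needed.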