Utilities API

Documentation for various utilities from all modules.

Logging Utils

class pathml.PathMLLogger

Convenience methods for turning logging for PathML on or off and for configuring it. Note that the same can be achieved by interfacing with loguru directly.

Example:

from pathml import PathMLLogger as pml

# turn on logging for PathML
pml.enable()

# turn off logging for PathML
pml.disable()

# turn on logging and output logs to a file named 'logs.txt', with colorization enabled
pml.enable(sink="logs.txt", colorize=True)

static disable()

Turn off logging for PathML

static enable(sink=sys.stderr, level='DEBUG', fmt='PathML:{level}:{time:HH:mm:ss} | {module}:{function}:{line} | {message}', **kwargs)

Turn on and configure logging for PathML

Parameters
  • sink (str or io._io.TextIOWrapper, optional) – Destination sink for log messages. Defaults to sys.stderr.

  • level (str) – Level of logs to capture. Defaults to 'DEBUG'.

  • fmt (str) – Formatting for the log message. Defaults to 'PathML:{level}:{time:HH:mm:ss} | {module}:{function}:{line} | {message}'.

  • **kwargs (dict, optional) – Additional options passed to configure the logger. See the loguru documentation.

Core Utils

pathml.core.utils.readtupleh5(h5, key)

Read tuple from h5.

Parameters
  • h5 (h5py.Dataset or h5py.Group) – h5 object that will be read from

  • key (str) – key where data to read is stored

pathml.core.utils.writedataframeh5(h5, name, df)

Write dataframe as h5 dataset.

Parameters
  • h5 (h5py.Dataset) – root of h5 object that df will be written into

  • name (str) – name of dataset to be created

  • df (pd.DataFrame) – dataframe to be written

pathml.core.utils.writedicth5(h5, name, dic)

Write dict as attributes of h5py.Group.

Parameters
  • h5 (h5py.Dataset) – root of h5 object that dic will be written into

  • name (str) – name of dataset to be created

  • dic (dict) – dict to be written

pathml.core.utils.writestringh5(h5, name, st)

Write string as h5 attribute.

Parameters
  • h5 (h5py.Dataset) – root of h5 object that st will be written into

  • name (str) – name of dataset to be created

  • st (str) – string to be written

pathml.core.utils.writetupleh5(h5, name, tup)

Write tuple as h5 attribute.

Parameters
  • h5 (h5py.Dataset) – root of h5 object that tup will be written into

  • name (str) – name of dataset to be created

  • tup (tuple) – tuple to be written

pathml.core.utils.readcounts(h5)

Read counts using anndata h5py.

Parameters

h5 (h5py.Dataset) – h5 object that will be read

pathml.core.utils.writecounts(h5, counts)

Write counts using anndata h5py.

Parameters
  • h5 (h5py.Dataset) – root of h5 object that counts will be written into

  • counts (anndata.AnnData) – anndata object to be written

Datasets Utils

pathml.datasets.utils.pannuke_multiclass_mask_to_nucleus_mask(multiclass_mask)

Convert multiclass mask from PanNuke to a single channel nucleus mask. Assumes each pixel is assigned to one and only one class. Sums across channels, except the last mask channel which indicates background pixels in PanNuke. Operates on a single mask.

Parameters

multiclass_mask (torch.Tensor) – Mask from PanNuke, in the classification setting (i.e. nucleus_type_labels=True). Tensor of shape (6, 256, 256).

Returns

Tensor of shape (256, 256).
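The channel-summing step described above can be sketched as follows. This is an illustration only, not PathML's implementation: NumPy stands in for torch.Tensor, the helper name nucleus_mask_from_multiclass is hypothetical, and the background channel is assumed to be the last one.

```python
import numpy as np

def nucleus_mask_from_multiclass(multiclass_mask):
    """Hypothetical helper: collapse a (6, H, W) class mask to an (H, W) nucleus mask."""
    # Drop the last channel (background) and sum the remaining class channels.
    return multiclass_mask[:-1].sum(axis=0)

# Toy one-hot mask: every pixel starts as background (channel 5).
mask = np.zeros((6, 256, 256), dtype=np.uint8)
mask[5] = 1
# Reassign a 10x10 patch from background to class 2.
mask[5, :10, :10] = 0
mask[2, :10, :10] = 1

nucleus = nucleus_mask_from_multiclass(mask)
print(nucleus.shape, int(nucleus.sum()))
```

Because each pixel belongs to exactly one class, the summed mask is 1 at nucleus pixels and 0 at background pixels.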

ML Utils

pathml.ml.utils.center_crop_im_batch(batch, dims, batch_order='BCHW')

Center crop images in a batch.

Parameters
  • batch – The batch of images to be cropped

  • dims – Amount to be cropped (tuple for H, W)
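A minimal sketch of center cropping a batch, written with NumPy for illustration. It is not PathML's implementation: the helper name center_crop_sketch is hypothetical, and it assumes that dims gives the total number of pixels removed along H and W (split as evenly as possible between the two sides) and that the batch is in 'BCHW' order.

```python
import numpy as np

def center_crop_sketch(batch, dims):
    """Hypothetical helper: remove dims = (dh, dw) pixels in total from H and W."""
    dh, dw = dims
    top, left = dh // 2, dw // 2
    bottom, right = dh - top, dw - left
    h, w = batch.shape[2], batch.shape[3]
    # Explicit end indices keep the slice valid when a crop amount is 0.
    return batch[:, :, top:h - bottom, left:w - right]

batch = np.zeros((4, 3, 256, 256))
cropped = center_crop_sketch(batch, (16, 8))
print(cropped.shape)
```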

pathml.ml.utils.dice_loss(true, logits, eps=0.001)

Computes the Sørensen–Dice loss. Note that PyTorch optimizers minimize a loss. In this case, we would like to maximize the Dice coefficient, so we return 1 - Dice coefficient. From: https://github.com/kevinzakka/pytorch-goodies/blob/c039691f349be9f21527bb38b907a940bfc5e8f3/losses.py#L54

Parameters
  • true – a tensor of shape [B, 1, H, W].

  • logits – a tensor of shape [B, C, H, W]. Corresponds to the raw output or logits of the model.

  • eps – added to the denominator for numerical stability.

Returns

the Sørensen–Dice loss.

Return type

dice_loss

pathml.ml.utils.dice_score(pred, truth, eps=0.001)

Calculate dice score for two tensors of the same shape. If tensors are not already binary, they are converted to bool by zero/non-zero.

Parameters
  • pred (np.ndarray) – Predictions

  • truth (np.ndarray) – Ground truth

  • eps (float, optional) – Constant used for numerical stability to avoid divide-by-zero errors. Defaults to 1e-3.

Returns

Dice score

Return type

float
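The Dice score computation can be sketched as below. This is an illustrative re-implementation rather than PathML's code: the name dice_score_sketch is hypothetical, and the placement of eps (added to both numerator and denominator, so that two empty masks score 1) is an assumption.

```python
import numpy as np

def dice_score_sketch(pred, truth, eps=1e-3):
    """Hypothetical re-implementation of the Dice score for same-shaped arrays."""
    # Binarize by zero/non-zero, as described above.
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    # Assumption: eps is added to numerator and denominator for stability.
    return float((2 * intersection + eps) / (pred.sum() + truth.sum() + eps))

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
score = dice_score_sketch(pred, truth)
print(round(score, 3))
```

Without eps, this pair of masks would score 2 * 1 / (2 + 1) = 2/3.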

pathml.ml.utils.get_sobel_kernels(size, dt=torch.float32)

Create horizontal and vertical Sobel kernels for approximating gradients. Returned kernels will be of shape (size, size).

pathml.ml.utils.wrap_transform_multichannel(transform)

Wrapper to make albumentations transform compatible with a multichannel mask. Channel should be in first dimension, i.e. (n_mask_channels, H, W)

Parameters

transform – Albumentations transform. Must have the 'additional_targets' parameter specified with a total of n_channels key-value pairs. All values must be 'mask', but the keys don't matter. E.g. for a mask with 3 channels, you could use: additional_targets = {'mask1': 'mask', 'mask2': 'mask', 'pathml': 'mask'}

Returns

function that can be called with a multichannel mask argument

Miscellaneous Utils

pathml.utils.upsample_array(arr, factor)

Upsample array by a factor. Each element in input array will become a CxC block in the upsampled array, where C is the constant upsampling factor. From https://stackoverflow.com/a/32848377

Parameters
  • arr (np.ndarray) – input array to be upsampled

  • factor (int) – Upsampling factor

Returns

np.ndarray
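One common NumPy idiom for this kind of block upsampling is a Kronecker product with a block of ones. The sketch below illustrates the behavior described above on a small array; it is not necessarily how PathML implements the function.

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])
factor = 2

# The Kronecker product with a block of ones repeats every element
# into a (factor x factor) block.
upsampled = np.kron(arr, np.ones((factor, factor), dtype=arr.dtype))
print(upsampled)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```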

pathml.utils.pil_to_rgb(image_array_pil)

Convert PIL RGBA Image to numpy RGB array.

pathml.utils.segmentation_lines(mask_in)

Generate coords of points bordering segmentations from a given mask. Useful for plotting results of tissue detection or other segmentation.

pathml.utils.plot_mask(im, mask_in, ax=None, color='red', downsample_factor=None)

Plot results of segmentation, overlaid on the original image.

Parameters
  • im (np.ndarray) – Original RGB image

  • mask_in (np.ndarray) – Boolean array of segmentation mask, with True values for masked pixels. Must be same shape as im.

  • ax – Matplotlib axes object to plot on. If None, creates a new plot. Defaults to None.

  • color – Color to plot outlines of mask. Defaults to “red”. Must be recognized by matplotlib.

  • downsample_factor – Downsample factor for image and mask to speed up plotting for big images. Defaults to None.

pathml.utils.contour_centroid(contour)

Return the centroid of a contour, calculated using moments. Based on the OpenCV implementation.

Parameters

contour (np.array) – Contour array as returned by cv2.findContours

Returns

(x, y) coordinates of centroid.

Return type

tuple
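The centroid follows from the moments as (m10 / m00, m01 / m00). The sketch below computes these polygon moments with NumPy as a stand-in for cv2.moments, so that it runs without OpenCV. The helper name polygon_centroid_sketch is hypothetical, and the contour is assumed to be a closed, non-degenerate polygon (non-zero area).

```python
import numpy as np

def polygon_centroid_sketch(contour):
    """Hypothetical stand-in: centroid of a closed polygon from its area moments."""
    # Accept both (N, 1, 2) contours and plain (N, 2) point arrays.
    pts = np.asarray(contour, dtype=float).reshape(-1, 2)
    x, y = pts[:, 0], pts[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    cross = x * y_next - x_next * y
    m00 = cross.sum() / 2.0                    # signed area
    m10 = ((x + x_next) * cross).sum() / 6.0   # first moment in x
    m01 = ((y + y_next) * cross).sum() / 6.0   # first moment in y
    return (float(m10 / m00), float(m01 / m00))

# Rectangle in the (N, 1, 2) layout used by cv2.findContours.
contour = np.array([[[0, 0]], [[4, 0]], [[4, 2]], [[0, 2]]])
cx, cy = polygon_centroid_sketch(contour)
print(cx, cy)
```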

pathml.utils.sort_points_clockwise(points)

Sort a list of points into clockwise order around centroid, ordering by angle with centroid and x-axis. After sorting, we can pass the points to cv2 as a contour. Centroid is defined as center of bounding box around points.

Parameters

points (np.ndarray) – Array of points (N x 2)

Returns

Array of points, sorted in order by angle with centroid (N x 2)

Return type

np.ndarray
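The sorting procedure described above can be sketched as follows. The helper name sort_points_by_angle_sketch is hypothetical and this is not PathML's code. It assumes image coordinates (y pointing down), in which ascending angle corresponds to clockwise order on screen.

```python
import numpy as np

def sort_points_by_angle_sketch(points):
    """Hypothetical helper: order (N, 2) points by angle around the bounding-box center."""
    points = np.asarray(points, dtype=float)
    # Centroid is the center of the axis-aligned bounding box.
    center = (points.min(axis=0) + points.max(axis=0)) / 2.0
    centered = points - center
    # Angle between each centered point and the positive x-axis.
    angles = np.arctan2(centered[:, 1], centered[:, 0])
    # Assumption: ascending angle is clockwise when y points down.
    return points[np.argsort(angles)]

corners = np.array([[0, 0], [2, 2], [2, 0], [0, 2]])
ordered = sort_points_by_angle_sketch(corners)
print(ordered)
```

For the four corners above, the result visits (0, 0), (2, 0), (2, 2), (0, 2) in turn.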

pathml.utils.pad_or_crop(array, target_shape)

Make dimensions of input array match target shape by either zero-padding or cropping each axis.

Parameters
  • array (np.ndarray) – Input array

  • target_shape (tuple) – Target shape of output

Returns

Input array cropped/padded to match target_shape

Return type

np.ndarray
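A minimal sketch of padding or cropping each axis to a target shape. The helper name pad_or_crop_sketch is hypothetical, and where the padding or cropping is applied (here, at the end of each axis) is an assumption, not a statement about PathML's behavior.

```python
import numpy as np

def pad_or_crop_sketch(array, target_shape):
    """Hypothetical helper: zero-pad or crop the end of each axis to reach target_shape."""
    if array.ndim != len(target_shape):
        raise ValueError("target_shape must have one entry per array axis")
    # Crop axes that are too long by keeping only the leading entries.
    cropped = array[tuple(slice(0, t) for t in target_shape)]
    # Zero-pad axes that are too short; padding is appended after the data.
    pad_width = [(0, t - s) for s, t in zip(cropped.shape, target_shape)]
    return np.pad(cropped, pad_width, mode="constant", constant_values=0)

a = np.ones((3, 5))
out = pad_or_crop_sketch(a, (4, 2))  # pad axis 0 by one row, crop axis 1 to two columns
print(out.shape)
```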

pathml.utils.RGB_to_HSI(imarr)

Convert imarr from RGB to HSI colorspace.

Parameters

imarr (np.ndarray) – numpy array of RGB image (m, n, 3)

Returns

numpy array of HSI image (m, n, 3)

Return type

np.ndarray

References

http://eng.usf.edu/~hady/courses/cap5400/rgb-to-hsi.pdf

pathml.utils.RGB_to_OD(imarr)

Convert input image from RGB space to optical density (OD) space. OD = -log(I), where I is the input image in RGB space.

Parameters

imarr (numpy.ndarray) – Image array, RGB format

Returns

Image array, OD format

Return type

numpy.ndarray
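The relation OD = -log(I) can be sketched as below. The helper name rgb_to_od_sketch is hypothetical, and the handling of scaling and zeros is an assumption: the input is taken to be 8-bit RGB, intensities are divided by 255, and zero values are raised to 1 beforehand to avoid log(0).

```python
import numpy as np

def rgb_to_od_sketch(imarr):
    """Hypothetical helper: OD = -log(I), with I scaled to the range (0, 1]."""
    # Assumption: 8-bit input; zeros are raised to 1 to avoid log(0).
    intensity = np.maximum(imarr.astype(float), 1.0) / 255.0
    return -np.log(intensity)

im = np.full((2, 2, 3), 255, dtype=np.uint8)  # pure white image
im[0, 0] = (0, 0, 0)                          # one black pixel
od = rgb_to_od_sketch(im)
print(od.shape)
```

White pixels map to an optical density of 0, and darker pixels map to larger values.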

pathml.utils.RGB_to_HSV(imarr)

Convert image from RGB to HSV color space.

pathml.utils.RGB_to_LAB(imarr)

Convert image from RGB to LAB color space.

pathml.utils.RGB_to_GREY(imarr)

Convert image from RGB to grayscale.

pathml.utils.normalize_matrix_rows(A)

Normalize the rows of an array.

Parameters

A (np.ndarray) – Input array.

Returns

Array with rows normalized.

Return type

np.ndarray
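A short sketch of row normalization with NumPy. The documentation above does not state which norm is used; dividing each row by its Euclidean (L2) norm is an assumption made here for illustration.

```python
import numpy as np

A = np.array([[3.0, 4.0],
              [0.0, 2.0]])

# Assumption: "normalize" means dividing each row by its L2 norm.
row_norms = np.linalg.norm(A, axis=1, keepdims=True)
A_rows = A / row_norms

# The column-wise variant divides by norms taken along axis 0 instead.
A_cols = A / np.linalg.norm(A, axis=0, keepdims=True)
print(A_rows)
```

Under this assumption, every row of A_rows has unit length, and likewise every column of A_cols.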

pathml.utils.normalize_matrix_cols(A)

Normalize the columns of an array.

Parameters

A (np.ndarray) – An array

Returns

Array with columns normalized

Return type

np.ndarray

pathml.utils.plot_segmentation(ax, masks, palette=None, markersize=5)

Plot segmentation contours. Supports multi-class masks.

Parameters
  • ax – Matplotlib axes object to plot on

  • masks (np.ndarray) – Mask array of shape (n_masks, H, W). Zeroes are background pixels.

  • palette – Color palette to use. If None, defaults to matplotlib.colors.TABLEAU_COLORS.

  • markersize (int) – Size of markers used on plot. Defaults to 5