Utilities API

Documentation for various utilities from all modules.

Logging Utils

class pathml.PathMLLogger

Convenience methods for turning logging on or off and configuring it for PathML. Note that this can also be achieved by interfacing with loguru directly.

Example:

from pathml import PathMLLogger as pml

# turn on logging for PathML
pml.enable()

# turn off logging for PathML
pml.disable()

# turn on logging and output logs to a file named 'logs.txt', with colorization enabled
pml.enable(sink="logs.txt", colorize=True)
static disable()

Turn off logging for PathML

static enable(sink=sys.stderr, level='DEBUG', fmt='PathML:{level}:{time:HH:mm:ss} | {module}:{function}:{line} | {message}', **kwargs)

Turn on and configure logging for PathML

Parameters:
  • sink (str or io._io.TextIOWrapper, optional) – Destination sink for log messages. Defaults to sys.stderr.

  • level (str) – level of logs to capture. Defaults to ‘DEBUG’.

  • fmt (str) – Formatting for the log message. Defaults to: ‘PathML:{level}:{time:HH:mm:ss} | {module}:{function}:{line} | {message}’

  • **kwargs (dict, optional) – additional options passed to configure logger. See: loguru documentation

Core Utils

pathml.core.utils.readtupleh5(h5, key)

Read tuple from h5.

Parameters:
  • h5 (h5py.Dataset or h5py.Group) – h5 object that will be read from

  • key (str) – key where data to read is stored

pathml.core.utils.writedataframeh5(h5, name, df)

Write dataframe as h5 dataset.

Parameters:
  • h5 (h5py.Dataset) – root of h5 object that df will be written into

  • name (str) – name of dataset to be created

  • df (pd.DataFrame) – dataframe to be written

pathml.core.utils.writedicth5(h5, name, dic)

Write dict as attributes of h5py.Group.

Parameters:
  • h5 (h5py.Dataset) – root of h5 object that dic will be written into

  • name (str) – name of dataset to be created

  • dic (dict) – dict to be written

pathml.core.utils.writestringh5(h5, name, st)

Write string as h5 attribute.

Parameters:
  • h5 (h5py.Dataset) – root of h5 object that st will be written into

  • name (str) – name of dataset to be created

  • st (str) – string to be written

pathml.core.utils.writetupleh5(h5, name, tup)

Write tuple as h5 attribute.

Parameters:
  • h5 (h5py.Dataset) – root of h5 object that tup will be written into

  • name (str) – name of dataset to be created

  • tup (tuple) – tuple to be written

pathml.core.utils.readcounts(h5)

Read counts using anndata h5py.

Parameters:

h5 (h5py.Dataset) – h5 object that will be read

pathml.core.utils.writecounts(h5, counts)

Write counts using anndata h5py.

Parameters:
  • h5 (h5py.Dataset) – root of h5 object that counts will be written into

  • counts (anndata.AnnData) – anndata object to be written

Graph Utils

pathml.graph.utils.Graph(node_centroids, edge_index, node_features=None, node_labels=None, edge_features=None, target=None)

Constructs pytorch-geometric data object for saving and loading

Parameters:
  • node_centroids (torch.tensor) – Coordinates of the centers of each entity (cell or tissue) in the graph

  • node_features (torch.tensor) – Computed features of each entity (cell or tissue) in the graph

  • edge_index (torch.tensor) – Edge index in sparse format between nodes in the graph

  • node_labels (torch.tensor) – Node labels of each entity (cell or tissue) in the graph. Defaults to None.

  • edge_features (torch.tensor) – Features of each edge in the graph. Defaults to None.

  • target (torch.tensor) – Target label if used in a supervised setting. Defaults to None.

pathml.graph.utils.HACTPairData(x_cell, edge_index_cell, x_tissue, edge_index_tissue, assignment, target)

Constructs pytorch-geometric data object for handling both cell and tissue data.

Parameters:
  • x_cell (torch.tensor) – Computed features of each cell in the graph

  • edge_index_cell (torch.tensor) – Edge index in sparse format between nodes in the cell graph

  • x_tissue (torch.tensor) – Computed features of each tissue in the graph

  • edge_index_tissue (torch.tensor) – Edge index in sparse format between nodes in the tissue graph

  • assignment (torch.tensor) – Assignment matrix that contains the mapping between cells and tissues.

  • target (torch.tensor) – Target label if used in a supervised setting.

References

Jaume, G., Pati, P., Anklin, V., Foncubierta, A. and Gabrani, M., 2021, September. Histocartography: A toolkit for graph analytics in digital pathology. In MICCAI Workshop on Computational Pathology (pp. 117-128). PMLR.

pathml.graph.utils.get_full_instance_map(wsi, patch_size, mask_name='cell')

Generates and returns the normalized image, cell instance map, and cell centroids from a PathML SlideData object.

Parameters:
  • wsi (pathml.core.SlideData) – Normalized WSI object with detected cells in the ‘masks’ slot

  • patch_size (int) – Patch size used for cell detection

  • mask_name (str) – Name of the mask slot storing the detected cells. Defaults to ‘cell’.

Returns:

The image in np.uint8 format, the instance map for the entity, and the instance centroids for each entity in the instance map, all as numpy arrays.

pathml.graph.utils.build_assignment_matrix(low_level_centroids, high_level_map, matrix=False)

Builds an assignment matrix/mapping between low-level centroid locations and a high-level segmentation map

Parameters:
  • low_level_centroids (numpy.array) – The low-level centroid coordinates in x-y plane

  • high_level_map – The high-level map returned from regionprops

  • matrix (bool) – Whether to return in a matrix format. If True, returns a N*L matrix where N is the number of low-level instances and L is the number of high-level instances. If False, returns this mapping in sparse format. Defaults to False.

Returns:

The assignment matrix as a numpy array.

References

[1] https://github.com/BiomedSciAI/histocartography/tree/main

[2] Jaume, G., Pati, P., Anklin, V., Foncubierta, A. and Gabrani, M., 2021, September. Histocartography: A toolkit for graph analytics in digital pathology. In MICCAI Workshop on Computational Pathology (pp. 117-128). PMLR.
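
The mapping between low-level centroids and a high-level map can be illustrated with a small NumPy sketch. This is not PathML's implementation: the function name `assign_centroids` is hypothetical, and the sketch assumes that centroids are (x, y) pixel coordinates, that the high-level map is an integer label image with 0 as background and labels 1 to L, and that only the dense matrix form is needed.

```python
import numpy as np

def assign_centroids(low_level_centroids, high_level_map):
    """Return an (N, L) 0/1 matrix linking each centroid to the region it falls in."""
    centroids = np.asarray(low_level_centroids)
    n_low = centroids.shape[0]
    n_high = int(high_level_map.max())  # assumes labels run from 1 to L
    # Round to pixel indices; x indexes columns and y indexes rows.
    cols = np.clip(np.rint(centroids[:, 0]).astype(int), 0, high_level_map.shape[1] - 1)
    rows = np.clip(np.rint(centroids[:, 1]).astype(int), 0, high_level_map.shape[0] - 1)
    labels = high_level_map[rows, cols]
    assignment = np.zeros((n_low, n_high), dtype=np.uint8)
    inside = labels > 0  # centroids on background keep an all-zero row
    assignment[np.nonzero(inside)[0], labels[inside] - 1] = 1
    return assignment

# Two regions (labels 1 and 2) on a 4x4 map, and three cells given as (x, y).
region_map = np.array([[1, 1, 2, 2],
                       [1, 1, 2, 2],
                       [0, 0, 2, 2],
                       [0, 0, 2, 2]])
cells = np.array([[0, 0], [3, 1], [0, 3]])
A = assign_centroids(cells, region_map)
```

Here the first cell lands in region 1, the second in region 2, and the third on background, so its row stays empty.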

pathml.graph.utils.two_hop(edge_index, num_nodes)

Calculates the two-hop graph.

Parameters:
  • edge_index (torch.tensor) – The edge index in sparse form of the graph.

  • num_nodes (int) – Maximum number of nodes.

Returns:

Output edge index tensor.

Return type:

torch.tensor

References

[1] https://github.com/BiomedSciAI/histocartography/tree/main

[2] Jaume, G., Pati, P., Anklin, V., Foncubierta, A. and Gabrani, M., 2021, September. Histocartography: A toolkit for graph analytics in digital pathology. In MICCAI Workshop on Computational Pathology (pp. 117-128). PMLR.

pathml.graph.utils.two_hop_no_sparse(edge_index, num_nodes)

Calculates the two-hop graph without using sparse tensors, in case of M1/M2 chips.

Parameters:
  • edge_index (torch.tensor) – The edge index in sparse form of the graph, of shape (2, E).

  • num_nodes (int) – Maximum number of nodes.

Returns:

Output edge index tensor.

Return type:

torch.tensor
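
As an illustration of what a dense two-hop computation can look like, the following NumPy sketch squares the adjacency matrix. It is a stand-in and not the PathML code: the name `two_hop_dense` is hypothetical, and keeping the original edges while dropping self-loops is an assumption about the intended output.

```python
import numpy as np

def two_hop_dense(edge_index, num_nodes):
    """Add edges between nodes that are two steps apart, using a dense adjacency matrix."""
    src, dst = np.asarray(edge_index)
    adj = np.zeros((num_nodes, num_nodes), dtype=np.int64)
    adj[src, dst] = 1
    # (adj @ adj)[i, j] counts the walks of length two from i to j.
    reach = ((adj @ adj) > 0) | (adj > 0)
    np.fill_diagonal(reach, False)  # drop self-loops such as i -> j -> i
    rows, cols = np.nonzero(reach)
    return np.stack([rows, cols])  # shape (2, E'), ordered by source node

# Undirected path 0 - 1 - 2, stored as directed edges in both directions.
edges = np.array([[0, 1, 1, 2],
                  [1, 0, 2, 1]])
result = two_hop_dense(edges, num_nodes=3)
```

For this path graph the result additionally connects nodes 0 and 2, which are two hops apart.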

Datasets Utils

class pathml.datasets.utils.DeepPatchFeatureExtractor(patch_size, batch_size, architecture, device='cpu', entity='cell', fill_value=255, threshold=0.2, resize_size=224, with_instance_masking=False, extraction_layer=None)

Patch feature extractor for a given architecture. The model is placed on the GPU if available, and patches are generated using pathml.datasets.InstanceMapPatchDataset.

Parameters:
  • patch_size (int) – Desired size of patch.

  • batch_size (int) – Desired size of batch.

  • architecture (str or nn.Module) – String of architecture. According to torchvision.models syntax, or nn.Module class directly.

  • entity (str) – Entity to be processed. Must be one of ‘cell’ or ‘tissue’. Defaults to ‘cell’.

  • device (torch.device) – Torch device used for inference. Defaults to ‘cpu’.

  • fill_value (int) – Value to fill outside the instance maps. Defaults to 255.

  • threshold (float) – Threshold for processing a patch or not. Defaults to 0.2.

  • resize_size (int) – Desired resized size to input the network. If None, no resizing is done and the patches of size patch_size are provided to the network. Defaults to 224.

  • with_instance_masking (bool) – If pixels outside instance should be masked. Defaults to False.

  • extraction_layer (str) – Name of the network module from where the features are extracted. Defaults to None.

Returns:

Tensor of features computed for each entity.

process(input_image, instance_map)

Main processing function that takes in an input image and an instance map and returns features for all entities in the instance map

pathml.datasets.utils.pannuke_multiclass_mask_to_nucleus_mask(multiclass_mask)

Convert multiclass mask from PanNuke to a single channel nucleus mask. Assumes each pixel is assigned to one and only one class. Sums across channels, except the last mask channel which indicates background pixels in PanNuke. Operates on a single mask.

Parameters:

multiclass_mask (torch.Tensor) – Mask from PanNuke, in classification setting. (i.e. nucleus_type_labels=True). Tensor of shape (6, 256, 256).

Returns:

Tensor of shape (256, 256).
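
The channel reduction described above can be sketched in a few lines. The sketch uses NumPy arrays in place of torch tensors so that it stays dependency-light; the function name `multiclass_to_nucleus_mask` and the toy 0/1 mask values are assumptions for illustration.

```python
import numpy as np

def multiclass_to_nucleus_mask(multiclass_mask):
    """Collapse a (C, H, W) mask into a single (H, W) nucleus mask."""
    # Sum every channel except the last one, which marks background pixels.
    return multiclass_mask[:-1].sum(axis=0)

mask = np.zeros((6, 256, 256), dtype=np.uint8)
mask[0, 10:20, 10:20] = 1                 # nuclei of the first class
mask[3, 50:60, 50:60] = 1                 # nuclei of the fourth class
mask[5] = mask[:5].sum(axis=0) == 0       # background channel
nucleus = multiclass_to_nucleus_mask(mask)
```

Because each pixel belongs to exactly one class, the sum never double-counts a pixel.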

pathml.datasets.utils._remove_modules(model, last_layer)

Remove all modules in the model that come after a given layer.

Parameters:
  • model (nn.Module) – A PyTorch model.

  • last_layer (str) – Last layer to keep in the model.

Returns:

Model (nn.Module) without pruned modules.

ML Utils

pathml.ml.utils.center_crop_im_batch(batch, dims, batch_order='BCHW')

Center crop images in a batch.

Parameters:
  • batch – The batch of images to be cropped

  • dims – Amount to be cropped (tuple for H, W)
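
A center crop on a BCHW batch might be sketched as follows. The sketch uses NumPy, and it assumes that dims gives the total number of rows and columns to remove, split as evenly as possible between the two sides; the helper name `center_crop_batch` is hypothetical.

```python
import numpy as np

def center_crop_batch(batch, dims):
    """Remove dims[0] rows and dims[1] columns in total from a (B, C, H, W) batch."""
    crop_h, crop_w = dims
    top, left = crop_h // 2, crop_w // 2
    bottom, right = crop_h - top, crop_w - left
    height, width = batch.shape[2], batch.shape[3]
    return batch[:, :, top:height - bottom, left:width - right]

batch = np.arange(2 * 3 * 8 * 8).reshape(2, 3, 8, 8)
cropped = center_crop_batch(batch, dims=(4, 2))  # shape (2, 3, 4, 6)
```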

pathml.ml.utils.dice_loss(true, logits, eps=0.001)

Computes the Sørensen–Dice loss. Note that PyTorch optimizers minimize a loss. In this case, we would like to maximize the Dice coefficient, so we return 1 - Dice coefficient. From: https://github.com/kevinzakka/pytorch-goodies/blob/c039691f349be9f21527bb38b907a940bfc5e8f3/losses.py#L54

Parameters:
  • true – a tensor of shape [B, 1, H, W].

  • logits – a tensor of shape [B, C, H, W]. Corresponds to the raw output or logits of the model.

  • eps – added to the denominator for numerical stability.

Returns:

the Sørensen–Dice loss.

Return type:

dice_loss

pathml.ml.utils.dice_score(pred, truth, eps=0.001)

Calculate dice score for two tensors of the same shape. If tensors are not already binary, they are converted to bool by zero/non-zero.

Parameters:
  • pred (np.ndarray) – Predictions

  • truth (np.ndarray) – ground truth

  • eps (float, optional) – Constant used for numerical stability to avoid divide-by-zero errors. Defaults to 1e-3.

Returns:

Dice score

Return type:

float
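
For reference, the Dice score of two binary masks is twice the size of their intersection divided by the sum of their sizes. A minimal NumPy sketch follows; the name `dice_score_np` is hypothetical, and adding eps to both numerator and denominator is an assumption about where the stabilizing constant goes.

```python
import numpy as np

def dice_score_np(pred, truth, eps=1e-3):
    """Dice score of two arrays after converting them to boolean masks."""
    pred = np.asarray(pred).astype(bool)    # non-zero values become True
    truth = np.asarray(truth).astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2 * intersection + eps) / (pred.sum() + truth.sum() + eps))

pred = np.array([[1, 1, 0, 0]])
truth = np.array([[1, 0, 0, 0]])
score = dice_score_np(pred, truth)      # one shared pixel out of 2 + 1
perfect = dice_score_np(truth, truth)   # identical masks
```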

pathml.ml.utils.get_sobel_kernels(size, dt=torch.float32)

Create horizontal and vertical Sobel kernels for approximating gradients. Returned kernels will be of shape (size, size).

pathml.ml.utils.wrap_transform_multichannel(transform)

Wrapper to make albumentations transform compatible with a multichannel mask. Channel should be in first dimension, i.e. (n_mask_channels, H, W)

Parameters:

transform – Albumentations transform. Must have the ‘additional_targets’ parameter specified with a total of n_channels key-value pairs. All values must be ‘mask’ but the keys don’t matter. e.g. for a mask with 3 channels, you could use: additional_targets = {‘mask1’ : ‘mask’, ‘mask2’ : ‘mask’, ‘pathml’ : ‘mask’}

Returns:

function that can be called with a multichannel mask argument

pathml.ml.utils.scatter_sum(src, index, dim, out=None, dim_size=None)

Reduces all values from the src tensor into out at the indices specified in the index tensor along a given axis dim.

For each value in src, its output index is specified by its index in src for dimensions outside of dim and by the corresponding value in index for dimension dim. The applied reduction is a sum.

Parameters:
  • src – The source tensor.

  • index – The indices of elements to scatter.

  • dim – The axis along which to index.

  • out – The destination tensor.

  • dim_size – If out is not given, automatically create output with size dim_size at dimension dim.

Reference:

https://pytorch-scatter.readthedocs.io/en/latest/_modules/torch_scatter/scatter.html#scatter
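
The scatter-sum semantics can be reproduced with NumPy for the common case of a one-dimensional index applied along one axis. This is a hedged sketch rather than the PathML or torch_scatter code: `scatter_sum_np` is a hypothetical name, and the broadcasting of index to the full shape of src is not covered.

```python
import numpy as np

def scatter_sum_np(src, index, dim=0, dim_size=None):
    """Sum slices of src along `dim` into the output positions given by a 1-D index."""
    src = np.asarray(src)
    index = np.asarray(index)
    if dim_size is None:
        dim_size = int(index.max()) + 1
    # Move the scatter axis to the front so it can be indexed directly.
    moved = np.moveaxis(src, dim, 0)
    out = np.zeros((dim_size,) + moved.shape[1:], dtype=src.dtype)
    np.add.at(out, index, moved)  # unbuffered add, so repeated indices accumulate
    return np.moveaxis(out, 0, dim)

flat = scatter_sum_np([1, 2, 3, 4], [0, 1, 0, 1])
by_column = scatter_sum_np([[1, 2, 3], [4, 5, 6]], [0, 0, 1], dim=1)
```

In the second call, columns 0 and 1 of each row are summed into output column 0, and column 2 is written to output column 1.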

pathml.ml.utils.broadcast(src, other, dim)

Broadcast tensors to match output tensor dimension.

pathml.ml.utils.get_degree_histogram(loader, edge_index_str, x_str)

Returns the degree histogram to be used as input for the deg argument in PNAConv.

pathml.ml.utils.get_class_weights(loader)

Returns the per-class weights to be used in weighted loss functions.

Miscellaneous Utils

pathml.utils.upsample_array(arr, factor)

Upsample array by a factor. Each element in input array will become a CxC block in the upsampled array, where C is the constant upsampling factor. From https://stackoverflow.com/a/32848377

Parameters:
  • arr (np.ndarray) – input array to be upsampled

  • factor (int) – Upsampling factor

Returns:

np.ndarray
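
Block upsampling of this kind can be written with a Kronecker product. The sketch below assumes a 2-D input array and uses the hypothetical name `upsample_blocks`; it illustrates the technique and is not necessarily the code used in PathML.

```python
import numpy as np

def upsample_blocks(arr, factor):
    """Turn every element of a 2-D array into a factor x factor block."""
    arr = np.asarray(arr)
    return np.kron(arr, np.ones((factor, factor), dtype=arr.dtype))

small = np.array([[1, 2],
                  [3, 4]])
big = upsample_blocks(small, 2)
# big is:
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```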

pathml.utils.pil_to_rgb(image_array_pil)

Convert PIL RGBA Image to numpy RGB array

pathml.utils.segmentation_lines(mask_in)

Generate coords of points bordering segmentations from a given mask. Useful for plotting results of tissue detection or other segmentation.

pathml.utils.plot_mask(im, mask_in, ax=None, color='red', downsample_factor=None)

Plot results of segmentation, overlaying on the original image.

Parameters:
  • im (np.ndarray) – Original RGB image

  • mask_in (np.ndarray) – Boolean array of segmentation mask, with True values for masked pixels. Must be same shape as im.

  • ax – Matplotlib axes object to plot on. If None, creates a new plot. Defaults to None.

  • color – Color to plot outlines of mask. Defaults to “red”. Must be recognized by matplotlib.

  • downsample_factor – Downsample factor for image and mask to speed up plotting for big images

pathml.utils.contour_centroid(contour)

Return the centroid of a contour, calculated using moments. From OpenCV implementation

Parameters:

contour (np.array) – Contour array as returned by cv2.findContours

Returns:

(x, y) coordinates of centroid.

Return type:

tuple
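
For a closed polygonal contour, the moment-based centroid is (m10 / m00, m01 / m00), and these moments can be computed from the vertices alone. The following NumPy sketch uses the standard polygon centroid formula as a stand-in for the OpenCV call; the name `polygon_centroid` is hypothetical, and a contour with zero area would cause a division by zero.

```python
import numpy as np

def polygon_centroid(contour):
    """Centroid of a closed polygon given as an (N, 1, 2) or (N, 2) array of (x, y) points."""
    pts = np.asarray(contour, dtype=float).reshape(-1, 2)
    x, y = pts[:, 0], pts[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    cross = x * y_next - x_next * y
    area = cross.sum() / 2.0                          # signed area, i.e. m00
    cx = ((x + x_next) * cross).sum() / (6.0 * area)  # m10 / m00
    cy = ((y + y_next) * cross).sum() / (6.0 * area)  # m01 / m00
    return cx, cy

# A 4 x 2 rectangle in the (N, 1, 2) layout produced by cv2.findContours.
rectangle = np.array([[[0, 0]], [[4, 0]], [[4, 2]], [[0, 2]]])
cx, cy = polygon_centroid(rectangle)
```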

pathml.utils.sort_points_clockwise(points)

Sort a list of points into clockwise order around centroid, ordering by angle with centroid and x-axis. After sorting, we can pass the points to cv2 as a contour. Centroid is defined as center of bounding box around points.

Parameters:

points (np.ndarray) – Array of points (N x 2)

Returns:

Array of points, sorted in order by angle with centroid (N x 2)

Return type:

np.ndarray

pathml.utils.pad_or_crop(array, target_shape)

Make dimensions of input array match target shape by either zero-padding or cropping each axis.

Parameters:
  • array (np.ndarray) – Input array

  • target_shape (tuple) – Target shape of output

Returns:

Input array cropped/padded to match target_shape

Return type:

np.ndarray
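
One way to implement this behavior is to pad first and crop second. In the sketch below, zeros are appended at the end of each short axis and the leading entries are kept on each long axis; that placement is an assumption, as is the helper name `pad_or_crop_np`.

```python
import numpy as np

def pad_or_crop_np(array, target_shape):
    """Zero-pad short axes and crop long axes so the result has target_shape."""
    array = np.asarray(array)
    # Pad axes that are too short; axes that are long enough get zero padding width.
    pad = [(0, max(t - s, 0)) for s, t in zip(array.shape, target_shape)]
    padded = np.pad(array, pad, mode="constant", constant_values=0)
    # Crop axes that are too long by keeping the leading entries.
    return padded[tuple(slice(0, t) for t in target_shape)]

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
out = pad_or_crop_np(arr, (3, 2))  # pad rows from 2 to 3, crop columns from 3 to 2
```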

pathml.utils.RGB_to_HSI(imarr)

Convert imarr from RGB to HSI colorspace.

Parameters:

imarr (np.ndarray) – numpy array of RGB image (m, n, 3)

Returns:

numpy array of HSI image (m, n, 3)

Return type:

np.ndarray

References

http://eng.usf.edu/~hady/courses/cap5400/rgb-to-hsi.pdf

pathml.utils.RGB_to_OD(imarr)

Convert input image from RGB space to optical density (OD) space. OD = -log(I), where I is the input image in RGB space.

Parameters:

imarr (numpy.ndarray) – Image array, RGB format

Returns:

Image array, OD format

Return type:

numpy.ndarray
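
The OD formula can be sketched directly in NumPy. The sketch assumes 8-bit input, so intensities are scaled by 255 before taking the logarithm, and it clamps zero-valued pixels to 1 to avoid log(0); both choices and the name `rgb_to_od` are assumptions for illustration.

```python
import numpy as np

def rgb_to_od(imarr):
    """Convert an 8-bit RGB array to optical density, OD = -log(I / 255)."""
    intensity = np.maximum(np.asarray(imarr, dtype=np.float64), 1.0)  # avoid log(0)
    return -np.log(intensity / 255.0)

pixels = np.array([[[255, 255, 255], [0, 128, 255]]], dtype=np.uint8)
od = rgb_to_od(pixels)
```

White pixels map to an optical density of zero, while darker pixels map to larger values.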

pathml.utils.RGB_to_HSV(imarr)

Convert image from RGB to HSV color space.

pathml.utils.RGB_to_LAB(imarr)

Convert image from RGB to LAB color space.

pathml.utils.RGB_to_GREY(imarr)

Convert image from RGB to grayscale.

pathml.utils.normalize_matrix_rows(A)

Normalize the rows of an array.

Parameters:

A (np.ndarray) – Input array.

Returns:

Array with rows normalized.

Return type:

np.ndarray

pathml.utils.normalize_matrix_cols(A)

Normalize the columns of an array.

Parameters:

A (np.ndarray) – An array

Returns:

Array with columns normalized

Return type:

np.ndarray
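
Row and column normalization can be sketched as below. The documentation does not state which norm is used, so the Euclidean (L2) norm here is an assumption, and the names `normalize_rows` and `normalize_cols` are hypothetical. Rows or columns that are entirely zero would produce a division by zero.

```python
import numpy as np

def normalize_rows(A):
    """Scale each row to unit Euclidean length."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def normalize_cols(A):
    """Scale each column to unit Euclidean length."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)

A = np.array([[3.0, 4.0],
              [0.0, 2.0]])
rows = normalize_rows(A)   # first row becomes [0.6, 0.8]
cols = normalize_cols(A)
```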

pathml.utils.plot_segmentation(ax, masks, palette=None, markersize=5)

Plot segmentation contours. Supports multi-class masks.

Parameters:
  • ax – matplotlib axis

  • masks (np.ndarray) – Mask array of shape (n_masks, H, W). Zeroes are background pixels.

  • palette – Color palette to use. If None, defaults to matplotlib.colors.TABLEAU_COLORS

  • markersize (int) – Size of markers used on plot. Defaults to 5