ML API

h5path Dataset

class pathml.ml.TileDataset(file_path)

PyTorch Dataset class for h5path files

Each item is a tuple of (tile_image, tile_masks, tile_labels, slide_labels) where:

  • tile_image is a torch.Tensor of shape (C, H, W) or (T, Z, C, H, W)

  • tile_masks is a torch.Tensor of shape (n_masks, tile_height, tile_width)

  • tile_labels is a dict

  • slide_labels is a dict

This is designed to be wrapped in a PyTorch DataLoader for feeding tiles into ML models.

Note that label dictionaries are not standardized, as users are free to store whatever labels they want. For that reason, PyTorch cannot automatically stack labels into batches. When creating a DataLoader from a TileDataset, it may therefore be necessary to create a custom collate_fn to specify how to create batches of labels. See: https://discuss.pytorch.org/t/how-to-use-collate-fn/27181
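As an illustration of the note above, here is a minimal sketch of a custom collate_fn. FakeTileDataset is a hypothetical stand-in that only mimics the 4-tuple item layout described above; it is not part of PathML, and its label keys (tile_index, slide_name) are made up for the example. The sketch stacks images and masks into batch tensors and leaves the label dictionaries as plain lists, which is one possible choice among several.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FakeTileDataset(Dataset):
    """Hypothetical stand-in that mimics the 4-tuple items of TileDataset."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        tile_image = torch.zeros(3, 8, 8)         # (C, H, W)
        tile_masks = torch.zeros(1, 8, 8)         # (n_masks, H, W)
        tile_labels = {"tile_index": idx}         # made-up label key
        slide_labels = {"slide_name": "example"}  # made-up label key
        return tile_image, tile_masks, tile_labels, slide_labels


def collate_tiles(batch):
    """Stack tensors into batches; keep label dicts as lists, one dict per tile."""
    images, masks, tile_labels, slide_labels = zip(*batch)
    return (
        torch.stack(images),
        torch.stack(masks),
        list(tile_labels),
        list(slide_labels),
    )


loader = DataLoader(FakeTileDataset(), batch_size=2, collate_fn=collate_tiles)
images, masks, tile_labels, slide_labels = next(iter(loader))
```

With a real dataset, the same collate_tiles function could be passed to a DataLoader built from TileDataset(file_path), provided all tiles share the same shape.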

Parameters

file_path (str) – Path to .h5path file on disk

HoVer-Net

class pathml.ml.HoVerNet(n_classes=None)

Model for simultaneous segmentation and classification based on HoVer-Net. Can also be used for segmentation only, if class labels are not supplied. Each branch returns logits.

Parameters

n_classes (int) – Number of classes for classification task. If None then the classification branch is not used.

References

Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T. and Rajpoot, N., 2019. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58, p.101563.

forward(self, inputs)

Helper functions

pathml.ml.hovernet.compute_hv_map(mask)

Preprocessing step for the HoVer-Net architecture. Computes the center of mass for each nucleus, then computes the distance of each nuclear pixel to its corresponding center of mass. Nuclear pixel distances are normalized to (-1, 1). Background pixels are left as 0. Operates on a single mask. Can be used in a Dataset object to make a DataLoader compatible with HoVer-Net.

Based on https://github.com/vqdang/hover_net/blob/195ed9b6cc67b12f908285492796fb5c6c15a000/src/loader/augs.py#L192

Parameters

mask (np.ndarray) – Mask indicating individual nuclei. Array of shape (H, W), where each pixel is in {0, …, n} with 0 indicating background pixels and {1, …, n} indicating n unique nuclei.

Returns

Array of hv maps of shape (2, H, W). The first channel holds the horizontal distances and the second holds the vertical distances.

Return type

np.ndarray
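The computation described above can be sketched in plain NumPy. This is a simplified illustration and not the PathML implementation: the function name hv_map_sketch is hypothetical, and the normalization (negative offsets scaled by the most negative value, positive offsets by the largest value, separately per nucleus) is an assumption based on the description above.

```python
import numpy as np


def hv_map_sketch(mask):
    """Simplified horizontal/vertical maps for a labelled mask of shape (H, W)."""
    out = np.zeros((2, *mask.shape), dtype=np.float32)
    for nucleus_id in np.unique(mask):
        if nucleus_id == 0:
            continue  # background pixels stay at 0
        rows, cols = np.nonzero(mask == nucleus_id)
        # Offsets of every nuclear pixel from the nucleus center of mass.
        dx = cols - cols.mean()  # horizontal
        dy = rows - rows.mean()  # vertical
        for channel, offsets in enumerate((dx, dy)):
            offsets = offsets.astype(np.float32)
            neg = offsets < 0
            pos = offsets > 0
            if neg.any():
                offsets[neg] /= -offsets[neg].min()  # scale so the minimum is -1
            if pos.any():
                offsets[pos] /= offsets[pos].max()   # scale so the maximum is 1
            out[channel, rows, cols] = offsets
    return out


# One 3x3 nucleus in a 5x5 tile.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1
hv = hv_map_sketch(mask)
```

In this example the left column of the nucleus gets a horizontal value of -1, the right column gets 1, and the central column gets 0; the vertical channel behaves the same way along rows.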

pathml.ml.hovernet.loss_hovernet(outputs, ground_truth, n_classes=None)

Computes the loss for HoVer-Net, following Equation (1) in Graham et al.

Parameters
  • outputs

    Output of HoVer-Net. Should be a list of [np, hv] if n_classes is None, or a list of [np, hv, nc] if n_classes is not None. Shapes of each should be:

    • np: (B, 2, H, W)

    • hv: (B, 2, H, W)

    • nc: (B, n_classes, H, W)

  • ground_truth – True labels. Should be a list of [mask, hv], where mask is a Tensor of shape (B, 1, H, W) if n_classes is None or (B, n_classes, H, W) if n_classes is not None. hv is a tensor of precomputed horizontal and vertical distances of nuclear pixels to their corresponding centers of mass, and is of shape (B, 2, H, W).

  • n_classes (int) – Number of classes for classification task. If None then the classification branch is not used.

References

Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T. and Rajpoot, N., 2019. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58, p.101563.

pathml.ml.hovernet.remove_small_objs(array_in, min_size)

Removes small foreground regions from binary array, leaving only the contiguous regions which are above the size threshold. Pixels in regions below the size threshold are zeroed out.

Parameters
  • array_in (np.ndarray) – Input array. Must be binary array with dtype=np.uint8.

  • min_size (int) – Minimum size of each region.

Returns

Array of labels for regions above the threshold. Each separate contiguous region is labelled with a different integer from 1 to n, where n is the total number of distinct contiguous regions.

Return type

np.ndarray
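The behaviour described above can be sketched with a breadth-first search over connected pixels. This is an illustration only, not the PathML implementation: the function name is hypothetical, 8-connectivity is an assumption, and treating regions of exactly min_size pixels as kept is also an assumption, since the text above does not say whether the threshold is inclusive.

```python
from collections import deque

import numpy as np


def remove_small_regions_sketch(array_in, min_size):
    """Label contiguous foreground regions and drop those smaller than min_size."""
    height, width = array_in.shape
    visited = np.zeros((height, width), dtype=bool)
    labels = np.zeros((height, width), dtype=np.int32)
    next_label = 1
    for r in range(height):
        for c in range(width):
            if array_in[r, c] == 0 or visited[r, c]:
                continue
            # Collect one contiguous region with a breadth-first search.
            region = []
            queue = deque([(r, c)])
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and array_in[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
            if len(region) >= min_size:  # assumed inclusive threshold
                coords = np.array(region)
                labels[coords[:, 0], coords[:, 1]] = next_label
                next_label += 1
    return labels


binary = np.zeros((6, 6), dtype=np.uint8)
binary[0:3, 0:3] = 1  # 9-pixel region, kept
binary[5, 5] = 1      # 1-pixel region, removed
labelled = remove_small_regions_sketch(binary, min_size=4)
```

A pure-Python loop like this is slow on large tiles; a production version would normally rely on a connected-components routine from an image-processing library.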

pathml.ml.hovernet.post_process_batch_hovernet(outputs, n_classes, small_obj_size_thresh=10, kernel_size=21, h=0.5, k=0.5)

Post-processes HoVer-Net outputs to get a final predicted mask. See: Section B of the HoVer-Net article and https://github.com/vqdang/hover_net/blob/14c5996fa61ede4691e87905775e8f4243da6a62/models/hovernet/post_proc.py#L27

Parameters
  • outputs (list) –

    Outputs of HoVer-Net model. List of [np_out, hv_out], or [np_out, hv_out, nc_out] depending on whether model is predicting classification or not.

    • np_out is a Tensor of shape (B, 2, H, W) of logit predictions for binary classification

    • hv_out is a Tensor of shape (B, 2, H, W) of predictions for horizontal/vertical maps

    • nc_out is a Tensor of shape (B, n_classes, H, W) of logits for classification

  • n_classes (int) – Number of classes for classification task. If None then only segmentation is performed.

  • small_obj_size_thresh (int) – Minimum number of pixels in regions. Defaults to 10.

  • kernel_size (int) – Width of Sobel kernel used to compute horizontal and vertical gradients. Defaults to 21.

  • h (float) – Hyperparameter for thresholding nucleus probabilities. Defaults to 0.5.

  • k (float) – Hyperparameter for thresholding energy landscape to create markers for watershed segmentation. Defaults to 0.5.

Returns

If n_classes is None, returns det_out. In classification setting, returns (det_out, class_out).

  • det_out is np.ndarray of shape (B, H, W)

  • class_out is np.ndarray of shape (B, n_classes, H, W)

Each pixel is labelled from 0 to n, where n is the number of individual nuclei detected. 0 pixels indicate background. Pixel values i indicate that the pixel belongs to the ith nucleus.

Return type

np.ndarray
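To make the role of the h parameter concrete, the sketch below shows only the first thresholding step: turning two-channel logits into a nucleus probability and comparing it with h. It is not the PathML implementation. The function name is hypothetical, it works on NumPy arrays rather than Tensors, and it assumes that channel 1 of np_out holds the nucleus logit, which the documentation above does not state. The later steps (Sobel gradients, the energy landscape thresholded by k, and watershed) are omitted.

```python
import numpy as np


def nucleus_mask_sketch(np_out, h=0.5):
    """Threshold logits of shape (B, 2, H, W) into a boolean mask of shape (B, H, W)."""
    background, nucleus = np_out[:, 0], np_out[:, 1]  # assumed channel order
    # Softmax over two channels reduces to a sigmoid of the logit difference.
    prob = 1.0 / (1.0 + np.exp(background - nucleus))
    return prob > h


np_out = np.zeros((1, 2, 2, 2), dtype=np.float32)
np_out[0, 1, 0, 0] = 2.0  # pixel (0, 0): nucleus logit dominates
np_out[0, 0, 1, 1] = 2.0  # pixel (1, 1): background logit dominates
nucleus_mask = nucleus_mask_sketch(np_out)
```

Raising h makes the mask more conservative, since a pixel then needs a higher nucleus probability to be counted as foreground.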