ML API
h5path Dataset
- class pathml.ml.TileDataset(file_path)
PyTorch Dataset class for h5path files
Each item is a tuple of (tile_image, tile_masks, tile_labels, slide_labels) where:
tile_image is a torch.Tensor of shape (C, H, W) or (T, Z, C, H, W)
tile_masks is a torch.Tensor of shape (n_masks, tile_height, tile_width)
tile_labels is a dict
slide_labels is a dict
This is designed to be wrapped in a PyTorch DataLoader for feeding tiles into ML models.
Note that label dictionaries are not standardized, as users are free to store whatever labels they want. For that reason, PyTorch cannot automatically stack labels into batches. When creating a DataLoader from a TileDataset, it may therefore be necessary to create a custom collate_fn to specify how to create batches of labels. See: https://discuss.pytorch.org/t/how-to-use-collate-fn/27181
- Parameters
file_path (str) – Path to .h5path file on disk
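As an illustration of the custom collate_fn mentioned above, the following is a minimal sketch, not part of pathml. The names merge_label_dicts and collate_tiles are hypothetical, and the choice to merge label dicts into a dict of lists (filling missing keys with None) is an assumption about what a user might want. Images and masks are kept as plain lists so the sketch runs without PyTorch; in practice they would typically be combined with torch.stack.

```python
def merge_label_dicts(label_dicts):
    """Merge a sequence of per-item label dicts into one dict of lists.

    Keys missing from an item are filled with None, so every list has
    exactly one entry per item in the batch.
    """
    keys = []
    for d in label_dicts:
        for k in d:
            if k not in keys:  # preserve first-seen key order
                keys.append(k)
    return {k: [d.get(k) for d in label_dicts] for k in keys}


def collate_tiles(batch):
    """Hypothetical collate_fn for items shaped like TileDataset items.

    Each item is (tile_image, tile_masks, tile_labels, slide_labels).
    Images and masks are returned as lists here (assumption: a real
    pipeline would stack them into tensors instead).
    """
    images, masks, tile_labels, slide_labels = zip(*batch)
    return (
        list(images),
        list(masks),
        merge_label_dicts(tile_labels),
        merge_label_dicts(slide_labels),
    )
```

Such a function would be passed to the loader through its collate_fn argument, for example DataLoader(dataset, batch_size=8, collate_fn=collate_tiles).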
HoVer-Net
- class pathml.ml.HoVerNet(n_classes=None)
Model for simultaneous segmentation and classification based on HoVer-Net. Can also be used for segmentation only, if class labels are not supplied. Each branch returns logits.
- Parameters
n_classes (int) – Number of classes for classification task. If None, the classification branch is not used.
References
Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T. and Rajpoot, N., 2019. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58, p.101563.
- forward(self, inputs)
Helper functions
- pathml.ml.hovernet.compute_hv_map(mask)
Preprocessing step for the HoVer-Net architecture. Computes the center of mass of each nucleus, then the distance of each nuclear pixel to its corresponding center of mass. Nuclear pixel distances are normalized to (-1, 1). Background pixels are left as 0. Operates on a single mask. Can be used in a Dataset object to make a DataLoader compatible with HoVer-Net.
- Parameters
mask (np.ndarray) – Mask indicating individual nuclei. Array of shape (H, W), where each pixel is in {0, …, n} with 0 indicating background pixels and {1, …, n} indicating n unique nuclei.
- Returns
Array of hv maps of shape (2, H, W). The first channel is the horizontal map and the second is the vertical map.
- Return type
np.ndarray
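The computation described for compute_hv_map can be sketched with NumPy as follows. This is an illustrative approximation, not pathml's implementation: the function name hv_map_sketch is hypothetical, and dividing each nucleus's offsets by the largest absolute offset along that axis is an assumed normalization scheme.

```python
import numpy as np


def hv_map_sketch(mask):
    """Illustrative horizontal/vertical map for a labelled mask of shape (H, W)."""
    h, w = mask.shape
    out = np.zeros((2, h, w), dtype=np.float32)
    for nucleus_id in np.unique(mask):
        if nucleus_id == 0:  # background pixels are left as 0
            continue
        rows, cols = np.nonzero(mask == nucleus_id)
        # Signed offsets of each pixel from the nucleus center of mass.
        dx = cols - cols.mean()  # horizontal
        dy = rows - rows.mean()  # vertical
        for channel, offsets in enumerate((dx, dy)):
            # Assumed normalization: scale by the largest absolute offset.
            scale = np.abs(offsets).max()
            if scale > 0:
                offsets = offsets / scale
            out[channel, rows, cols] = offsets
    return out
```

Channel 0 holds the horizontal offsets and channel 1 the vertical offsets, matching the return description above.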
- pathml.ml.hovernet.loss_hovernet(outputs, ground_truth, n_classes=None)
Compute loss for HoVer-Net. Equation (1) in Graham et al.
- Parameters
outputs –
Output of HoVer-Net. Should be a list of [np, hv] if n_classes is None, or a list of [np, hv, nc] if n_classes is not None. Shapes of each should be:
np: (B, 2, H, W)
hv: (B, 2, H, W)
nc: (B, n_classes, H, W)
ground_truth – True labels. Should be a list of [mask, hv], where mask is a Tensor of shape (B, 1, H, W) if n_classes is None or (B, n_classes, H, W) if n_classes is not None. hv is a tensor of precomputed horizontal and vertical distances of nuclear pixels to their corresponding centers of mass, and is of shape (B, 2, H, W).
n_classes (int) – Number of classes for classification task. If None, the classification branch is not used.
References
Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T. and Rajpoot, N., 2019. Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58, p.101563.
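For orientation, Equation (1) of Graham et al. sums two weighted terms per branch. The form below restates the paper's equation as best understood; the weight values and any deviations in the pathml implementation are not specified here and should be checked against the source.

```latex
\mathcal{L} =
  \underbrace{\lambda_a \mathcal{L}_a + \lambda_b \mathcal{L}_b}_{\text{HoVer branch}}
+ \underbrace{\lambda_c \mathcal{L}_c + \lambda_d \mathcal{L}_d}_{\text{NP branch}}
+ \underbrace{\lambda_e \mathcal{L}_e + \lambda_f \mathcal{L}_f}_{\text{NC branch}}
```

In the paper, the HoVer terms are mean squared errors on the hv maps and on their gradients, while the NP and NC branches each combine a cross-entropy term with a Dice term. The NC terms apply only when the classification branch is in use.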
- pathml.ml.hovernet.remove_small_objs(array_in, min_size)
Removes small foreground regions from binary array, leaving only the contiguous regions which are above the size threshold. Pixels in regions below the size threshold are zeroed out.
- Parameters
array_in (np.ndarray) – Input array. Must be binary array with dtype=np.uint8.
min_size (int) – Minimum size of each region.
- Returns
Array of labels for regions above the threshold. Each separate contiguous region is labelled with a different integer from 1 to n, where n is the total number of distinct contiguous regions.
- Return type
np.ndarray
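The behaviour described for remove_small_objs can be approximated by the sketch below. It is not pathml's implementation: the function name remove_small_regions_sketch is hypothetical, 4-connectivity is an assumption, and so is treating a region of exactly min_size pixels as kept.

```python
from collections import deque

import numpy as np


def remove_small_regions_sketch(array_in, min_size):
    """Label 4-connected foreground regions, dropping those below min_size."""
    h, w = array_in.shape
    labels = np.zeros((h, w), dtype=np.int32)
    visited = np.zeros((h, w), dtype=bool)
    next_label = 1
    for r in range(h):
        for c in range(w):
            if array_in[r, c] == 0 or visited[r, c]:
                continue
            # Breadth-first flood fill collects one contiguous region.
            region = []
            queue = deque([(r, c)])
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and array_in[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Assumption: regions with at least min_size pixels are kept.
            if len(region) >= min_size:
                ys, xs = np.array(region).T
                labels[ys, xs] = next_label
                next_label += 1
    return labels
```

Regions that are dropped stay 0 in the output, and surviving regions are numbered consecutively in scan order.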
- pathml.ml.hovernet.post_process_batch_hovernet(outputs, n_classes, small_obj_size_thresh=10, kernel_size=21, h=0.5, k=0.5)
Post-process HoVer-Net outputs to get a final predicted mask. See: Section B of HoVer-Net article and https://github.com/vqdang/hover_net/blob/14c5996fa61ede4691e87905775e8f4243da6a62/models/hovernet/post_proc.py#L27
- Parameters
outputs (list) –
Outputs of HoVer-Net model. List of [np_out, hv_out], or [np_out, hv_out, nc_out] depending on whether model is predicting classification or not.
np_out is a Tensor of shape (B, 2, H, W) of logit predictions for binary classification
hv_out is a Tensor of shape (B, 2, H, W) of predictions for horizontal/vertical maps
nc_out is a Tensor of shape (B, n_classes, H, W) of logits for classification
n_classes (int) – Number of classes for classification task. If None, only segmentation is performed.
small_obj_size_thresh (int) – Minimum number of pixels in regions. Defaults to 10.
kernel_size (int) – Width of Sobel kernel used to compute horizontal and vertical gradients. Defaults to 21.
h (float) – hyperparameter for thresholding nucleus probabilities. Defaults to 0.5.
k (float) – hyperparameter for thresholding energy landscape to create markers for watershed segmentation. Defaults to 0.5.
- Returns
If n_classes is None, returns det_out. In classification setting, returns (det_out, class_out).
det_out is np.ndarray of shape (B, H, W)
class_out is np.ndarray of shape (B, n_classes, H, W)
Each pixel is labelled from 0 to n, where n is the number of individual nuclei detected. 0 pixels indicate background. Pixel values i indicate that the pixel belongs to the ith nucleus.
- Return type
np.ndarray
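To make the role of h more concrete, the sketch below shows one way the nucleus-probability thresholding could look. It covers only that first step, not the gradient or watershed stages, and it is not pathml's code. The function name nucleus_mask_from_logits is hypothetical, the input is taken as a NumPy array rather than a Tensor, and the choice of channel 1 as the nucleus class and of >= for the comparison are both assumptions.

```python
import numpy as np


def nucleus_mask_from_logits(np_out, h=0.5):
    """Softmax over the channel axis of (B, 2, H, W) logits, then threshold."""
    # Subtract the per-pixel max before exponentiating for numerical stability.
    shifted = np_out - np_out.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    # Assumption: channel 1 holds the nucleus class probability.
    return (probs[:, 1] >= h).astype(np.uint8)
```

The result has shape (B, H, W), with 1 marking pixels whose nucleus probability reaches the threshold.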