Utilities API
Documentation for various utilities from all modules.
Logging Utils
- class pathml.PathMLLogger
Convenience methods for turning logging on or off and configuring it for PathML. Note that this can also be achieved by interfacing with loguru directly.
Example:
from pathml import PathMLLogger as pml
# turn on logging for PathML
pml.enable()
# turn off logging for PathML
pml.disable()
# turn on logging and output logs to a file named 'logs.txt', with colorization enabled
pml.enable(sink="logs.txt", colorize=True)
- static disable()
Turn off logging for PathML
- static enable(sink=sys.stderr, level='DEBUG', fmt='PathML:{level}:{time:HH:mm:ss} | {module}:{function}:{line} | {message}', **kwargs)
Turn on and configure logging for PathML
- Parameters:
sink (str or io._io.TextIOWrapper, optional) – Destination sink for log messages. Defaults to sys.stderr.
level (str) – Level of logs to capture. Defaults to ‘DEBUG’.
fmt (str) – Formatting for the log message. Defaults to: ‘PathML:{level}:{time:HH:mm:ss} | {module}:{function}:{line} | {message}’
**kwargs (dict, optional) – additional options passed to configure logger. See: loguru documentation
Core Utils
- pathml.core.utils.readtupleh5(h5, key)
Read tuple from h5.
- Parameters:
h5 (h5py.Dataset or h5py.Group) – h5 object that will be read from
key (str) – key where data to read is stored
- pathml.core.utils.writedataframeh5(h5, name, df)
Write dataframe as h5 dataset.
- Parameters:
h5 (h5py.Dataset) – root of h5 object that df will be written into
name (str) – name of dataset to be created
df (pd.DataFrame) – dataframe to be written
- pathml.core.utils.writedicth5(h5, name, dic)
Write dict as attributes of h5py.Group.
- Parameters:
h5 (h5py.Dataset) – root of h5 object that dic will be written into
name (str) – name of dataset to be created
dic (dict) – dict to be written
- pathml.core.utils.writestringh5(h5, name, st)
Write string as h5 attribute.
- Parameters:
h5 (h5py.Dataset) – root of h5 object that st will be written into
name (str) – name of dataset to be created
st (str) – string to be written
- pathml.core.utils.writetupleh5(h5, name, tup)
Write tuple as h5 attribute.
- Parameters:
h5 (h5py.Dataset) – root of h5 object that tup will be written into
name (str) – name of dataset to be created
tup (tuple) – tuple to be written
- pathml.core.utils.readcounts(h5)
Read counts using anndata h5py.
- Parameters:
h5 (h5py.Dataset) – h5 object that will be read
- pathml.core.utils.writecounts(h5, counts)
Write counts using anndata h5py.
- Parameters:
h5 (h5py.Dataset) – root of h5 object that counts will be written into
counts (anndata.AnnData) – anndata object to be written
Graph Utils
- pathml.graph.utils.Graph(node_centroids, edge_index, node_features=None, node_labels=None, edge_features=None, target=None)
Constructs pytorch-geometric data object for saving and loading
- Parameters:
node_centroids (torch.tensor) – Coordinates of the centers of each entity (cell or tissue) in the graph
node_features (torch.tensor) – Computed features of each entity (cell or tissue) in the graph
edge_index (torch.tensor) – Edge index in sparse format between nodes in the graph
node_labels (torch.tensor) – Node labels of each entity (cell or tissue) in the graph. Defaults to None.
target (torch.tensor) – Target label if used in a supervised setting. Defaults to None.
- pathml.graph.utils.HACTPairData(x_cell, edge_index_cell, x_tissue, edge_index_tissue, assignment, target)
Constructs pytorch-geometric data object for handling both cell and tissue data.
- Parameters:
x_cell (torch.tensor) – Computed features of each cell in the graph
edge_index_cell (torch.tensor) – Edge index in sparse format between nodes in the cell graph
x_tissue (torch.tensor) – Computed features of each tissue in the graph
edge_index_tissue (torch.tensor) – Edge index in sparse format between nodes in the tissue graph
assignment (torch.tensor) – Assignment matrix that contains the mapping between cells and tissues.
target (torch.tensor) – Target label if used in a supervised setting.
References
Jaume, G., Pati, P., Anklin, V., Foncubierta, A. and Gabrani, M., 2021, September. Histocartography: A toolkit for graph analytics in digital pathology. In MICCAI Workshop on Computational Pathology (pp. 117-128). PMLR.
- pathml.graph.utils.get_full_instance_map(wsi, patch_size, mask_name='cell')
Generates and returns the normalized image, cell instance map and cell centroids from pathml SlideData object
- Parameters:
wsi (pathml.core.SlideData) – Normalized WSI object with detected cells in the ‘masks’ slot
patch_size (int) – Patch size used for cell detection
mask_name (str) – Name of the mask slot storing the detected cells. Defaults to ‘cell’.
- Returns:
The image in np.uint8 format, the instance map for the entity, and the instance centroids for each entity in the instance map, all as numpy arrays.
- pathml.graph.utils.build_assignment_matrix(low_level_centroids, high_level_map, matrix=False)
Builds an assignment matrix/mapping between low-level centroid locations and a high-level segmentation map
- Parameters:
low_level_centroids (numpy.array) – The low-level centroid coordinates in x-y plane
high_level_map – The high-level map returned from regionprops
matrix (bool) – Whether to return in a matrix format. If True, returns a N*L matrix where N is the number of low-level instances and L is the number of high-level instances. If False, returns this mapping in sparse format. Defaults to False.
- Returns:
The assignment matrix as a numpy array.
References
[1] https://github.com/BiomedSciAI/histocartography/tree/main
[2] Jaume, G., Pati, P., Anklin, V., Foncubierta, A. and Gabrani, M., 2021, September. Histocartography: A toolkit for graph analytics in digital pathology. In MICCAI Workshop on Computational Pathology (pp. 117-128). PMLR.
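To make the N*L matrix layout concrete, here is a minimal NumPy sketch of the idea. It is not PathML's implementation: the helper name `assignment_sketch` is hypothetical, and it assumes that centroids are (x, y) pixel coordinates, that high-level labels run from 1 to L, and that label 0 means background.

```python
import numpy as np

def assignment_sketch(low_level_centroids, high_level_map):
    """Illustrative only: build an N x L one-hot assignment matrix."""
    centroids = np.asarray(low_level_centroids, dtype=int)
    n_low = centroids.shape[0]
    n_high = int(high_level_map.max())
    # Look up the high-level label under each centroid (row = y, column = x).
    labels = high_level_map[centroids[:, 1], centroids[:, 0]]
    matrix = np.zeros((n_low, n_high), dtype=np.float32)
    inside = labels > 0  # assumption: label 0 is background, so no assignment
    matrix[np.nonzero(inside)[0], labels[inside] - 1] = 1.0
    return matrix

tissue_map = np.array([[1, 1, 2, 2],
                       [1, 1, 2, 2],
                       [0, 0, 2, 2]])
cell_centroids = np.array([[0, 0], [3, 1], [1, 2]])  # (x, y) pairs
assignment = assignment_sketch(cell_centroids, tissue_map)
```

Each row holds at most one non-zero entry; the third centroid sits on background and therefore gets an all-zero row in this sketch.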
- pathml.graph.utils.two_hop(edge_index, num_nodes)
Calculates the two-hop graph.
- Parameters:
edge_index (torch.tensor) – The edge index in sparse form of the graph.
num_nodes (int) – Maximum number of nodes.
- Returns:
Output edge index tensor.
- Return type:
torch.tensor
References
[1] https://github.com/BiomedSciAI/histocartography/tree/main
[2] Jaume, G., Pati, P., Anklin, V., Foncubierta, A. and Gabrani, M., 2021, September. Histocartography: A toolkit for graph analytics in digital pathology. In MICCAI Workshop on Computational Pathology (pp. 117-128). PMLR.
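The two-hop idea can be illustrated with a dense adjacency matrix in NumPy. This is a sketch under stated assumptions, not the library's code: the name `two_hop_sketch` is hypothetical, self loops are dropped, and the original one-hop edges are kept in the output, which the description above does not confirm.

```python
import numpy as np

def two_hop_sketch(edge_index, num_nodes):
    """Illustrative only: dense two-hop expansion of a (2, E) edge index."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.int64)
    adj[edge_index[0], edge_index[1]] = 1
    # (adj @ adj)[i, j] counts the length-2 paths from node i to node j.
    reach = (adj + adj @ adj) > 0   # assumption: keep one-hop edges as well
    np.fill_diagonal(reach, False)  # drop self loops
    return np.stack(np.nonzero(reach))

# Undirected path graph 0 - 1 - 2, stored as directed pairs.
path_edges = np.array([[0, 1, 1, 2],
                       [1, 0, 2, 1]])
expanded = two_hop_sketch(path_edges, num_nodes=3)
```

In this small example the expansion adds the pairs (0, 2) and (2, 0), since node 1 links the two end nodes.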
- pathml.graph.utils.two_hop_no_sparse(edge_index, num_nodes)
Calculates the two-hop graph without using sparse tensors, for use on M1/M2 chips.
- Parameters:
edge_index (torch.tensor) – The edge index in sparse form of the graph, of shape (2, E).
num_nodes (int) – Maximum number of nodes.
- Returns:
Output edge index tensor.
- Return type:
torch.tensor
Datasets Utils
- class pathml.datasets.utils.DeepPatchFeatureExtractor(patch_size, batch_size, architecture, device='cpu', entity='cell', fill_value=255, threshold=0.2, resize_size=224, with_instance_masking=False, extraction_layer=None)
Patch feature extractor for a given architecture, placed on the GPU if available. Uses Pathml.datasets.InstanceMapPatchDataset.
- Parameters:
patch_size (int) – Desired size of patch.
batch_size (int) – Desired size of batch.
architecture (str or nn.Module) – Name of the architecture, following torchvision.models syntax, or an nn.Module class directly.
entity (str) – Entity to be processed. Must be one of ‘cell’ or ‘tissue’. Defaults to ‘cell’.
device (torch.device) – Torch Device used for inference.
fill_value (int) – Value to fill outside the instance maps. Defaults to 255.
threshold (float) – Threshold for processing a patch or not.
resize_size (int) – Desired resized size to input the network. If None, no resizing is done and the patches of size patch_size are provided to the network. Defaults to 224.
with_instance_masking (bool) – If pixels outside instance should be masked. Defaults to False.
extraction_layer (str) – Name of the network module from where the features are extracted.
- Returns:
Tensor of features computed for each entity.
- process(input_image, instance_map)
Main processing function that takes in an input image and an instance map and returns features for all entities in the instance map
- pathml.datasets.utils.pannuke_multiclass_mask_to_nucleus_mask(multiclass_mask)
Convert multiclass mask from PanNuke to a single channel nucleus mask. Assumes each pixel is assigned to one and only one class. Sums across channels, except the last mask channel which indicates background pixels in PanNuke. Operates on a single mask.
- Parameters:
multiclass_mask (torch.Tensor) – Mask from PanNuke, in classification setting (i.e. nucleus_type_labels=True). Tensor of shape (6, 256, 256).
- Returns:
Tensor of shape (256, 256).
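The channel reduction described above can be sketched in a few lines. The example below uses NumPy arrays in place of torch tensors for brevity, and the helper name `nucleus_mask_sketch` and the toy mask are hypothetical.

```python
import numpy as np

def nucleus_mask_sketch(multiclass_mask):
    """Illustrative only: collapse a (C, H, W) PanNuke mask to (H, W)."""
    # The last channel marks background, so it is left out of the sum.
    return multiclass_mask[:-1].sum(axis=0)

demo_mask = np.zeros((6, 256, 256), dtype=np.int64)
demo_mask[0, 10:20, 10:20] = 1   # nuclei of one class
demo_mask[3, 50:60, 50:60] = 1   # nuclei of another class
demo_mask[5] = 1 - demo_mask[:5].sum(axis=0)  # background everywhere else
nucleus_mask = nucleus_mask_sketch(demo_mask)
```

Because every pixel belongs to exactly one class, the sum over the non-background channels is 1 on nuclei and 0 elsewhere.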
- pathml.datasets.utils._remove_modules(model, last_layer)
Remove all modules in the model that come after a given layer.
- Parameters:
model (nn.Module) – A PyTorch model.
last_layer (str) – Last layer to keep in the model.
- Returns:
Model (nn.Module) without pruned modules.
ML Utils
- pathml.ml.utils.center_crop_im_batch(batch, dims, batch_order='BCHW')
Center crop images in a batch.
- Parameters:
batch – The batch of images to be cropped
dims – Amount to be cropped (tuple for H, W)
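The parameter description reads dims as the amount removed rather than the output size. Under that reading, a center crop on a BCHW batch might look like the NumPy sketch below; the name `center_crop_sketch` is hypothetical, and the way an odd amount is split between the two sides is an assumption.

```python
import numpy as np

def center_crop_sketch(batch, dims):
    """Illustrative only: remove dims[0] rows and dims[1] columns in total."""
    top = dims[0] // 2
    bottom = dims[0] - top       # assumption: the extra pixel of an odd amount comes off the end
    left = dims[1] // 2
    right = dims[1] - left
    height, width = batch.shape[2], batch.shape[3]   # BCHW layout
    return batch[:, :, top:height - bottom, left:width - right]

images = np.zeros((2, 3, 10, 12))
cropped = center_crop_sketch(images, (4, 2))
```

Here four rows and two columns are removed in total, so the spatial size goes from 10 x 12 to 6 x 10.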
- pathml.ml.utils.dice_loss(true, logits, eps=0.001)
Computes the Sørensen–Dice loss. Note that PyTorch optimizers minimize a loss. Here we want to maximize the Dice coefficient, so we return 1 - Dice coefficient. From: https://github.com/kevinzakka/pytorch-goodies/blob/c039691f349be9f21527bb38b907a940bfc5e8f3/losses.py#L54
- Parameters:
true – a tensor of shape [B, 1, H, W].
logits – a tensor of shape [B, C, H, W]. Corresponds to the raw output or logits of the model.
eps – added to the denominator for numerical stability.
- Returns:
dice_loss, the Sørensen–Dice loss.
- pathml.ml.utils.dice_score(pred, truth, eps=0.001)
Calculate dice score for two tensors of the same shape. If tensors are not already binary, they are converted to bool by zero/non-zero.
- Parameters:
pred (np.ndarray) – Predictions
truth (np.ndarray) – ground truth
eps (float, optional) – Constant used for numerical stability to avoid divide-by-zero errors. Defaults to 1e-3.
- Returns:
Dice score
- Return type:
float
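As a rough illustration of the score, the sketch below computes 2|A ∩ B| / (|A| + |B|) on boolean arrays. The helper name `dice_score_sketch` is hypothetical, and where exactly eps enters the formula is an assumption; the library may place it differently.

```python
import numpy as np

def dice_score_sketch(pred, truth, eps=1e-3):
    """Illustrative only: Dice = 2|A and B| / (|A| + |B|), smoothed by eps."""
    pred = np.asarray(pred).astype(bool)    # zero / non-zero -> False / True
    truth = np.asarray(truth).astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    # Assumption: eps is added to numerator and denominator alike.
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))

score = dice_score_sketch([[1, 1], [0, 0]], [[1, 0], [0, 0]])
perfect = dice_score_sketch([[5, 0]], [[1, 0]])
```

With eps in both places, two empty masks score 1.0 instead of raising a divide-by-zero error.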
- pathml.ml.utils.get_sobel_kernels(size, dt=torch.float32)
Create horizontal and vertical Sobel kernels for approximating gradients. Returned kernels will be of shape (size, size).
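The exact kernel values are not given above. One common construction, used here purely as an assumption, weights each offset by its distance from the kernel centre; for size 3 this reproduces the classic Sobel pattern up to a constant factor. The name `sobel_kernels_sketch` is hypothetical, and NumPy stands in for torch.

```python
import numpy as np

def sobel_kernels_sketch(size):
    """Illustrative only: horizontal and vertical kernels of shape (size, size)."""
    assert size % 2 == 1, "size must be odd"
    half = size // 2
    offsets = np.arange(-half, half + 1, dtype=np.float32)
    h, v = np.meshgrid(offsets, offsets)   # h varies along columns, v along rows
    denom = h * h + v * v + 1e-15          # epsilon avoids 0/0 at the centre
    return h / denom, v / denom

kernel_h, kernel_v = sobel_kernels_sketch(3)
```

The vertical kernel is the transpose of the horizontal one, so the pair approximates the gradient along each image axis.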
- pathml.ml.utils.wrap_transform_multichannel(transform)
Wrapper to make albumentations transform compatible with a multichannel mask. Channel should be in first dimension, i.e. (n_mask_channels, H, W)
- Parameters:
transform – Albumentations transform. Must have the ‘additional_targets’ parameter specified with a total of n_channels key-value pairs. All values must be ‘mask’, but the keys don’t matter. E.g. for a mask with 3 channels, you could use: additional_targets = {‘mask1’ : ‘mask’, ‘mask2’ : ‘mask’, ‘pathml’ : ‘mask’}
- Returns:
function that can be called with a multichannel mask argument
- pathml.ml.utils.scatter_sum(src, index, dim, out=None, dim_size=None)
Reduces all values from the src tensor into out at the indices specified in the index tensor along a given axis dim. For each value in src, its output index is specified by its index in src for dimensions outside of dim and by the corresponding value in index for dimension dim. The applied reduction is a sum.
- Parameters:
src – The source tensor.
index – The indices of elements to scatter.
dim – The axis along which to index.
out – The destination tensor.
dim_size – If out is not given, automatically create output with size dim_size at dimension dim.
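For a one-dimensional input the scatter-sum rule reduces to out[index[i]] += src[i]. The NumPy sketch below shows only that special case; the name `scatter_sum_1d` is hypothetical and the real function works on torch tensors along an arbitrary dim.

```python
import numpy as np

def scatter_sum_1d(src, index, dim_size=None):
    """Illustrative only: 1-D scatter-sum, out[index[i]] += src[i]."""
    src = np.asarray(src, dtype=np.float64)
    index = np.asarray(index)
    if dim_size is None:
        dim_size = int(index.max()) + 1
    out = np.zeros(dim_size, dtype=src.dtype)
    # np.add.at accumulates correctly when an index appears more than once.
    np.add.at(out, index, src)
    return out

pooled = scatter_sum_1d([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 2])
padded = scatter_sum_1d([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 2], dim_size=5)
```

Values 1.0 and 3.0 share index 0, so they are summed into the first output slot.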
- pathml.ml.utils.broadcast(src, other, dim)
Broadcast tensors to match output tensor dimension.
- pathml.ml.utils.get_degree_histogram(loader, edge_index_str, x_str)
Returns the degree histogram to be used as input for the deg argument in PNAConv.
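A degree histogram counts how many nodes have each degree across a dataset. The sketch below is a NumPy illustration for plain edge-index arrays, not the loader-based function above: the name `degree_histogram_sketch` is hypothetical, and using in-degree (occurrences as an edge target) is an assumption.

```python
import numpy as np

def degree_histogram_sketch(edge_indices, num_nodes_per_graph):
    """Illustrative only: histogram[d] = number of nodes with in-degree d."""
    histogram = np.zeros(1, dtype=np.int64)
    for edge_index, num_nodes in zip(edge_indices, num_nodes_per_graph):
        # In-degree of each node: how often it appears as an edge target.
        degrees = np.bincount(edge_index[1], minlength=num_nodes)
        counts = np.bincount(degrees)
        # Grow the running histogram if this graph has a larger maximum degree.
        if counts.size > histogram.size:
            histogram = np.pad(histogram, (0, counts.size - histogram.size))
        histogram[:counts.size] += counts
    return histogram

star = np.array([[1, 2, 3], [0, 0, 0]])   # three edges pointing at node 0
histogram = degree_histogram_sketch([star], [4])
```

In the star graph, three nodes have in-degree 0 and one node has in-degree 3.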
- pathml.ml.utils.get_class_weights(loader)
Returns the per-class weights to be used in weighted loss functions.
Miscellaneous Utils
- pathml.utils.upsample_array(arr, factor)
Upsample array by a factor. Each element in input array will become a CxC block in the upsampled array, where C is the constant upsampling factor. From https://stackoverflow.com/a/32848377
- Parameters:
arr (np.ndarray) – input array to be upsampled
factor (int) – Upsampling factor
- Returns:
np.ndarray
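One way to obtain the block structure described above is a Kronecker product with a block of ones. This is a sketch with a hypothetical helper name, `upsample_sketch`, and is not necessarily how the library implements it.

```python
import numpy as np

def upsample_sketch(arr, factor):
    """Illustrative only: expand every element into a factor x factor block."""
    return np.kron(arr, np.ones((factor, factor), dtype=arr.dtype))

small = np.array([[1, 2],
                  [3, 4]])
large = upsample_sketch(small, 2)
```

A 2 x 2 input with factor 2 yields a 4 x 4 output in which each original value fills a 2 x 2 block.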
- pathml.utils.pil_to_rgb(image_array_pil)
Convert PIL RGBA Image to numpy RGB array
- pathml.utils.segmentation_lines(mask_in)
Generate coords of points bordering segmentations from a given mask. Useful for plotting results of tissue detection or other segmentation.
- pathml.utils.plot_mask(im, mask_in, ax=None, color='red', downsample_factor=None)
Plot results of segmentation, overlaid on the original image.
- Parameters:
im (np.ndarray) – Original RGB image
mask_in (np.ndarray) – Boolean array of segmentation mask, with True values for masked pixels. Must be same shape as im.
ax – Matplotlib axes object to plot on. If None, creates a new plot. Defaults to None.
color – Color to plot outlines of mask. Defaults to “red”. Must be recognized by matplotlib.
downsample_factor – Downsample factor for image and mask to speed up plotting for big images
- pathml.utils.contour_centroid(contour)
Return the centroid of a contour, calculated using moments. From OpenCV implementation
- Parameters:
contour (np.array) – Contour array as returned by cv2.findContours
- Returns:
(x, y) coordinates of centroid.
- Return type:
tuple
- pathml.utils.sort_points_clockwise(points)
Sort a list of points into clockwise order around centroid, ordering by angle with centroid and x-axis. After sorting, we can pass the points to cv2 as a contour. Centroid is defined as center of bounding box around points.
- Parameters:
points (np.ndarray) – Array of points (N x 2)
- Returns:
Array of points, sorted in order by angle with centroid (N x 2)
- Return type:
np.ndarray
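The ordering rule can be sketched with arctan2 around the bounding-box centre. The helper name `sort_clockwise_sketch` is hypothetical, and the direction convention is an assumption: descending angle is clockwise when the y-axis points up, while in image coordinates (y pointing down) the same order appears counter-clockwise.

```python
import numpy as np

def sort_clockwise_sketch(points):
    """Illustrative only: order (N, 2) points by angle around the bounding-box centre."""
    points = np.asarray(points, dtype=float)
    # Centroid taken as the centre of the axis-aligned bounding box.
    centre = (points.min(axis=0) + points.max(axis=0)) / 2.0
    offsets = points - centre
    angles = np.arctan2(offsets[:, 1], offsets[:, 0])
    # Assumption: sort by descending angle (clockwise with y pointing up).
    return points[np.argsort(-angles)]

square = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
ordered = sort_clockwise_sketch(square)
```

For the unit square the result walks the corners in the order (0, 1), (1, 1), (1, 0), (0, 0).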
- pathml.utils.pad_or_crop(array, target_shape)
Make dimensions of input array match target shape by either zero-padding or cropping each axis.
- Parameters:
array (np.ndarray) – Input array
target_shape (tuple) – Target shape of output
- Returns:
Input array cropped/padded to match target_shape
- Return type:
np.ndarray
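The description does not say where padding or cropping is applied. The sketch below pads and crops at the trailing end of each axis, which is an assumption (a centred variant is equally plausible); the name `pad_or_crop_sketch` is hypothetical.

```python
import numpy as np

def pad_or_crop_sketch(array, target_shape):
    """Illustrative only: zero-pad or crop each axis to reach target_shape."""
    assert array.ndim == len(target_shape)
    # Crop first: keep at most the leading target-many entries on every axis.
    cropped = array[tuple(slice(0, t) for t in target_shape)]
    # Then zero-pad at the end of any axis that is still too short.
    pad_widths = [(0, t - s) for s, t in zip(cropped.shape, target_shape)]
    return np.pad(cropped, pad_widths, mode="constant", constant_values=0)

grid = np.arange(6).reshape(2, 3)
resized = pad_or_crop_sketch(grid, (3, 2))
```

The 2 x 3 input loses its last column and gains a row of zeros, giving a 3 x 2 result.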
- pathml.utils.RGB_to_HSI(imarr)
Convert imarr from RGB to HSI colorspace.
- Parameters:
imarr (np.ndarray) – numpy array of RGB image (m, n, 3)
- Returns:
numpy array of HSI image (m, n, 3)
- Return type:
np.ndarray
- pathml.utils.RGB_to_OD(imarr)
Convert input image from RGB space to optical density (OD) space. OD = -log(I), where I is the input image in RGB space.
- Parameters:
imarr (numpy.ndarray) – Image array, RGB format
- Returns:
Image array, OD format
- Return type:
numpy.ndarray
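The formula OD = -log(I) needs intensities in (0, 1] to stay finite. The sketch below scales 8-bit values by 255 and clips zeros to the smallest non-zero level; both steps are assumptions about preprocessing, and the name `rgb_to_od_sketch` is hypothetical.

```python
import numpy as np

def rgb_to_od_sketch(imarr):
    """Illustrative only: OD = -log(I) on intensities scaled to (0, 1]."""
    scaled = np.asarray(imarr, dtype=np.float64) / 255.0
    # Assumption: clip zero intensities so the logarithm stays finite.
    scaled = np.clip(scaled, 1.0 / 255.0, 1.0)
    return -np.log(scaled)

pixels = np.array([[[255, 255, 255], [0, 128, 255]]], dtype=np.uint8)
od = rgb_to_od_sketch(pixels)
```

White pixels map to an optical density of zero, and darker channels map to larger values.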
- pathml.utils.RGB_to_HSV(imarr)
Convert image from RGB to HSV color space.
- pathml.utils.RGB_to_LAB(imarr)
Convert image from RGB to LAB color space.
- pathml.utils.RGB_to_GREY(imarr)
Convert image from RGB to greyscale.
- pathml.utils.normalize_matrix_rows(A)
Normalize the rows of an array.
- Parameters:
A (np.ndarray) – Input array.
- Returns:
Array with rows normalized.
- Return type:
np.ndarray
- pathml.utils.normalize_matrix_cols(A)
Normalize the columns of an array.
- Parameters:
A (np.ndarray) – An array
- Returns:
Array with columns normalized
- Return type:
np.ndarray
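The two normalization helpers above do not state which norm they use. The sketch below assumes the Euclidean (L2) norm, so that every row or column ends up with unit length; the names `normalize_rows_sketch` and `normalize_cols_sketch` are hypothetical.

```python
import numpy as np

def normalize_rows_sketch(A):
    """Illustrative only: scale each row to unit Euclidean (L2) norm."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def normalize_cols_sketch(A):
    """Illustrative only: scale each column to unit Euclidean (L2) norm."""
    return A / np.linalg.norm(A, axis=0, keepdims=True)

M = np.array([[3.0, 4.0],
              [0.0, 2.0]])
rows_unit = normalize_rows_sketch(M)
cols_unit = normalize_cols_sketch(M)
```

The first row (3, 4) has norm 5 and becomes (0.6, 0.8); a row or column of all zeros would need special handling that this sketch omits.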
- pathml.utils.plot_segmentation(ax, masks, palette=None, markersize=5)
Plot segmentation contours. Supports multi-class masks.
- Parameters:
ax – matplotlib axis
masks (np.ndarray) – Mask array of shape (n_masks, H, W). Zeroes are background pixels.
palette – Color palette to use. If None, defaults to matplotlib.colors.TABLEAU_COLORS
markersize (int) – Size of markers used on plot. Defaults to 5