Preprocessing API

Pipeline

class pathml.preprocessing.Pipeline(transform_sequence=None)

Compose a sequence of Transforms

Parameters

transform_sequence (list) – sequence of transforms to be consecutively applied. List of pathml.core.Transform objects

apply(self, tile)

modify Tile object in-place

save(self, filename)

save pipeline to disk

Parameters

filename (str) – save path on disk
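
The compose-and-apply pattern described above can be sketched without PathML installed. The classes below (AddOffset, Scale, ToyPipeline) and the dict used as a tile are hypothetical stand-ins, not PathML classes; they only illustrate how a sequence of transforms, each with a functional F and an in-place apply, is run consecutively on one tile.

```python
# Hypothetical stand-ins illustrating the compose-and-apply pattern.
# These are NOT PathML classes; the "tile" here is just a dict.
import numpy as np


class AddOffset:
    """Toy transform: adds a constant to the image."""

    def __init__(self, offset):
        self.offset = offset

    def F(self, image):
        # functional form: returns a new array
        return image + self.offset

    def apply(self, tile):
        # in-place form: overwrites the image stored on the tile
        tile["image"] = self.F(tile["image"])


class Scale:
    """Toy transform: multiplies the image by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def F(self, image):
        return image * self.factor

    def apply(self, tile):
        tile["image"] = self.F(tile["image"])


class ToyPipeline:
    """Applies each transform in order, modifying the tile in place."""

    def __init__(self, transform_sequence=None):
        self.transforms = list(transform_sequence or [])

    def apply(self, tile):
        for transform in self.transforms:
            transform.apply(tile)


tile = {"image": np.ones((2, 2))}
ToyPipeline([AddOffset(1.0), Scale(3.0)]).apply(tile)
print(tile["image"])  # every pixel is (1 + 1) * 3 = 6
```

Order matters: swapping the two transforms would give (1 * 3) + 1 = 4 instead.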

Transforms

class pathml.preprocessing.MedianBlur(kernel_size=5)

Median blur kernel.

Parameters

kernel_size (int) – Width of kernel. Must be an odd number. Defaults to 5.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.GaussianBlur(kernel_size=5, sigma=5)

Gaussian blur kernel.

Parameters
  • kernel_size (int) – Width of kernel. Must be an odd number. Defaults to 5.

  • sigma (float) – Standard deviation of the Gaussian kernel, assumed to be equal in the X and Y axes. Defaults to 5.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.BoxBlur(kernel_size=5)

Box (average) blur kernel.

Parameters

kernel_size (int) – Width of kernel. Defaults to 5.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.BinaryThreshold(mask_name=None, use_otsu=True, threshold=0, inverse=False)

Binary thresholding transform to create a binary mask. If input image is RGB it is first converted to greyscale, otherwise the input must have 1 channel.

Parameters
  • mask_name (str) – Name of mask that is created.

  • use_otsu (bool) – Whether to use Otsu’s method to automatically determine optimal threshold. Defaults to True.

  • threshold (int) – Specified threshold. Ignored if use_otsu is True. Defaults to 0.

  • inverse (bool) – Whether to use inverse threshold. If using inverse threshold, pixels below the threshold will be returned as 1. Otherwise pixels below the threshold will be returned as 0. Defaults to False.

References

Otsu, N., 1979. A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics, 9(1), pp.62-66.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place
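
As a rough illustration of what Otsu thresholding does, here is a NumPy-only sketch for 8-bit greyscale input. The function names otsu_threshold and binary_threshold are made up for this example, and the code is not PathML's implementation; it picks the threshold that maximizes the between-class variance and returns a 0/1 mask, with the inverse option flipping the result.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the grey level that maximizes between-class variance (uint8 input)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                  # weight of the class "<= t"
    w1 = 1.0 - w0                         # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    # Between-class variance; undefined (0/0) where one class is empty.
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


def binary_threshold(gray, threshold=0, use_otsu=True, inverse=False):
    """Return a 0/1 mask; pixels above the threshold are 1 unless inverse=True."""
    t = otsu_threshold(gray) if use_otsu else threshold
    mask = gray > t
    if inverse:
        mask = ~mask
    return mask.astype(np.uint8)


# Two clearly separated intensity modes: 50 (dark) and 200 (bright).
gray = np.array([[50, 50, 200, 200]] * 4, dtype=np.uint8)
mask = binary_threshold(gray)
inv_mask = binary_threshold(gray, inverse=True)
```

With use_otsu=False the explicit threshold argument is used instead, mirroring the parameter description above.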

class pathml.preprocessing.MorphOpen(mask_name=None, kernel_size=5, n_iterations=1)

Morphological opening. First applies erosion operation, then dilation. Reduces noise by removing small objects from the background. Operates on a binary mask.

Parameters
  • mask_name (str) – Name of mask on which to apply transform

  • kernel_size (int) – Size of kernel for default square kernel. Ignored if a custom kernel is specified. Defaults to 5.

  • n_iterations (int) – Number of opening operations to perform. Defaults to 1.

F(self, mask)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.MorphClose(mask_name=None, kernel_size=5, n_iterations=1)

Morphological closing. First applies dilation operation, then erosion. Reduces noise by closing small holes in the foreground. Operates on a binary mask.

Parameters
  • mask_name (str) – Name of mask on which to apply transform

  • kernel_size (int) – Size of kernel for default square kernel. Ignored if a custom kernel is specified. Defaults to 5.

  • n_iterations (int) – Number of closing operations to perform. Defaults to 1.

F(self, mask)

functional implementation

apply(self, tile)

modify Tile object in-place
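
The difference between opening and closing is easiest to see on a small mask. The sketch below implements erosion and dilation with a square kernel in plain NumPy; the helper names are invented for this example, it handles a single iteration only, and it is not the code PathML runs.

```python
import numpy as np


def _window_reduce(mask, k, reduce_fn, pad_value):
    """Apply reduce_fn over every k x k window (k odd) of a 2D mask."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=pad_value)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return reduce_fn(windows, axis=(-2, -1))


def erode(mask, k=3):
    # A pixel survives only if its whole neighbourhood is foreground.
    return _window_reduce(mask, k, np.min, pad_value=1)


def dilate(mask, k=3):
    # A pixel becomes foreground if any neighbour is foreground.
    return _window_reduce(mask, k, np.max, pad_value=0)


def morph_open(mask, k=3):
    """Erosion followed by dilation: removes small foreground specks."""
    return dilate(erode(mask, k), k)


def morph_close(mask, k=3):
    """Dilation followed by erosion: fills small holes in the foreground."""
    return erode(dilate(mask, k), k)


noisy = np.zeros((9, 9), dtype=np.uint8)
noisy[3:6, 3:6] = 1      # a real object
noisy[0, 8] = 1          # an isolated speck
opened = morph_open(noisy)

holed = np.zeros((9, 9), dtype=np.uint8)
holed[2:7, 2:7] = 1
holed[4, 4] = 0          # a one-pixel hole
closed = morph_close(holed)
```

Opening keeps the 3x3 object and drops the speck; closing keeps the 5x5 object and fills the hole.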

class pathml.preprocessing.ForegroundDetection(mask_name=None, min_region_size=5000, max_hole_size=1500, outer_contours_only=False)

Foreground detection for binary masks. Identifies regions that have a total area greater than specified threshold. Supports including holes within foreground regions, or excluding holes above a specified area threshold.

Parameters
  • min_region_size (int) – Minimum area of detected foreground regions, in pixels. Defaults to 5000.

  • max_hole_size (int) – Maximum size of allowed holes in foreground regions, in pixels. Ignored if outer_contours_only is True. Defaults to 1500.

  • outer_contours_only (bool) – If true, ignore holes in detected foreground regions. Defaults to False.

  • mask_name (str) – Name of mask on which to apply transform

References

Lu, M.Y., Williamson, D.F., Chen, T.Y., Chen, R.J., Barbieri, M. and Mahmood, F., 2020. Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images. arXiv preprint arXiv:2004.09666.

F(self, mask)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.SuperpixelInterpolation(region_size=10, n_iter=30)

Divide input image into superpixels using SLIC algorithm, then interpolate each superpixel with average color. SLIC superpixel algorithm described in Achanta et al. 2012.

Parameters
  • region_size (int) – region_size parameter used for superpixel creation. Defaults to 10.

  • n_iter (int) – Number of iterations to run SLIC algorithm. Defaults to 30.

References

Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. and Süsstrunk, S., 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE transactions on pattern analysis and machine intelligence, 34(11), pp.2274-2282.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.StainNormalizationHE(target='normalize', stain_estimation_method='macenko', optical_density_threshold=0.15, sparsity_regularizer=1.0, angular_percentile=0.01, regularizer_lasso=0.01, background_intensity=245, stain_matrix_target_od=np.array([[0.5626, 0.2159], [0.7201, 0.8012], [0.4062, 0.5581]]), max_c_target=np.array([1.9705, 1.0308]))

Normalize H&E stained images to a reference slide. Can also be used to separate hematoxylin and eosin channels.

H&E images are assumed to be composed of two stains, each one having a vector of its characteristic RGB values. The stain matrix is a 3x2 matrix where the first column corresponds to the hematoxylin stain vector and the second corresponds to eosin stain vector. The stain matrix can be estimated from a reference image in a number of ways; here we provide implementations of two such algorithms from Macenko et al. and Vahadane et al.

After estimating the stain matrix for an image, the next step is to assign stain concentrations to each pixel. Each pixel is assumed to be a linear combination of the two stain vectors, where the coefficients are the intensities of each stain vector at that pixel. To solve for the intensities, we use least squares in the Macenko method and lasso in the Vahadane method.

The image can then be reconstructed by applying those pixel intensities to a stain matrix. This allows you to standardize the appearance of an image by reconstructing it using a reference stain matrix. Using this method of normalization may help account for differences in slide appearance arising from variations in staining procedure, differences between scanners, etc. Images can also be reconstructed using only a single stain vector, e.g. to separate the hematoxylin and eosin channels of an H&E image.

This code is based in part on StainTools: https://github.com/Peter554/StainTools

Parameters
  • target (str) – one of ‘normalize’, ‘hematoxylin’, or ‘eosin’. Defaults to ‘normalize’

  • stain_estimation_method (str) – method for estimating stain matrix. Must be one of ‘macenko’ or ‘vahadane’. Defaults to ‘macenko’.

  • optical_density_threshold (float) – Threshold for removing low-optical density pixels when estimating stain vectors. Defaults to 0.15

  • sparsity_regularizer (float) – Regularization parameter for dictionary learning when estimating stain vector using the Vahadane method. Ignored if stain_estimation_method != 'vahadane'. Defaults to 1.0

  • angular_percentile (float) – Percentile for stain vector selection when estimating stain vector using the Macenko method. Ignored if stain_estimation_method != 'macenko'. Defaults to 0.01

  • regularizer_lasso (float) – Regularization parameter for the lasso solver used to estimate concentrations in the Vahadane method. Ignored if stain_estimation_method != 'vahadane'. Defaults to 0.01

  • background_intensity (int) – Intensity of background light. Must be an integer between 0 and 255. Defaults to 245.

  • stain_matrix_target_od (np.ndarray) – Stain matrix for reference slide. Matrix of H and E stain vectors in optical density (OD) space. Stain matrix is (3, 2) and first column corresponds to hematoxylin. Default stain matrix can be used, or you can also fit to a reference slide of your choosing by calling fit_to_reference().

  • max_c_target (np.ndarray) – Maximum concentrations of each stain in reference slide. Default can be used, or you can also fit to a reference slide of your choosing by calling fit_to_reference().

Note

If using stain_estimation_method = 'vahadane', spams must be installed, along with all of its dependencies (i.e. libblas & liblapack).

References

Macenko, M., Niethammer, M., Marron, J.S., Borland, D., Woosley, J.T., Guan, X., Schmitt, C. and Thomas, N.E., 2009, June. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro (pp. 1107-1110). IEEE.

Vahadane, A., Peng, T., Sethi, A., Albarqouni, S., Wang, L., Baust, M., Steiger, K., Schlitter, A.M., Esposito, I. and Navab, N., 2016. Structure-preserving color normalization and sparse stain separation for histological images. IEEE transactions on medical imaging, 35(8), pp.1962-1971.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

fit_to_reference(self, image_ref)

Fit stain_matrix and max_c to a reference slide. This allows you to use a specific slide as the reference for stain normalization. Works by first estimating stain matrix from input reference image, then estimating pixel concentrations. Newly computed stain matrix and maximum concentrations are then used for any future color normalization.

Parameters

image_ref (np.ndarray) – RGB reference image
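
The concentration step can be sketched in a few lines of NumPy. The example below uses the default reference stain matrix and background intensity from the signature above, converts RGB to optical density with a natural-log Beer-Lambert transform (an assumption about the exact convention), and solves for per-pixel concentrations by ordinary least squares, as in the Macenko variant. Stain matrix estimation and the max_c scaling are left out, and all function names are invented for this illustration.

```python
import numpy as np

# Default reference stain matrix from the signature above (columns: H, E).
STAIN_MATRIX = np.array([[0.5626, 0.2159],
                         [0.7201, 0.8012],
                         [0.4062, 0.5581]])
BACKGROUND = 245  # background_intensity default


def rgb_to_od(rgb, background=BACKGROUND):
    """Beer-Lambert transform, OD = -log(I / I0); clip to avoid log(0)."""
    rgb = np.clip(rgb.astype(float), 1.0, None)
    return -np.log(rgb / background)


def od_to_rgb(od, background=BACKGROUND):
    """Inverse transform back to RGB intensities."""
    return np.clip(background * np.exp(-od), 0, 255)


def concentrations_lstsq(rgb, stain_matrix=STAIN_MATRIX):
    """Least-squares stain concentrations, shape (2, n_pixels)."""
    od = rgb_to_od(rgb).reshape(-1, 3).T            # (3, n_pixels)
    conc, *_ = np.linalg.lstsq(stain_matrix, od, rcond=None)
    return conc


def reconstruct(conc, shape, stain_matrix=STAIN_MATRIX, keep=None):
    """Rebuild an RGB image; keep=0 or keep=1 retains only H or only E."""
    if keep is not None:
        kept = np.zeros_like(conc)
        kept[keep] = conc[keep]
        conc = kept
    od = stain_matrix @ conc                        # (3, n_pixels)
    return od_to_rgb(od.T.reshape(shape))


# Synthetic 1x2 image built from known concentrations (rows: H, E).
true_conc = np.array([[0.5, 1.0],
                      [0.2, 0.8]])
image = od_to_rgb((STAIN_MATRIX @ true_conc).T.reshape(1, 2, 3))

conc = concentrations_lstsq(image)                  # recovers true_conc
h_only = reconstruct(conc, image.shape, keep=0)     # hematoxylin channel only
```

Reconstructing with a different stain matrix than the one used for estimation is what standardizes appearance; reconstructing with one stain zeroed out gives the single-stain images described above.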

class pathml.preprocessing.NucleusDetectionHE(mask_name=None, stain_estimation_method='vahadane', superpixel_region_size=10, n_iter=30, **stain_kwargs)

Simple nucleus detection algorithm for H&E stained images. Works by first separating hematoxylin channel, then doing interpolation using superpixels, and finally using Otsu’s method for binary thresholding.

Parameters
  • stain_estimation_method (str) – Method for estimating stain matrix. Defaults to “vahadane”

  • superpixel_region_size (int) – region_size parameter used for superpixel creation. Defaults to 10.

  • n_iter (int) – Number of iterations to run SLIC superpixel algorithm. Defaults to 30.

  • mask_name (str) – Name of mask that is created.

  • stain_kwargs (dict) – other arguments passed to StainNormalizationHE()

References

Hu, B., Tang, Y., Eric, I., Chang, C., Fan, Y., Lai, M. and Xu, Y., 2018. Unsupervised learning for cell-level visual representation in histopathology images with generative adversarial networks. IEEE journal of biomedical and health informatics, 23(3), pp.1316-1328.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.TissueDetectionHE(mask_name=None, use_saturation=True, blur_ksize=17, threshold=None, morph_n_iter=3, morph_k_size=7, min_region_size=5000, max_hole_size=1500, outer_contours_only=False)

Detect tissue regions from H&E stained slide. First applies a median blur, then binary thresholding, then morphological opening and closing, and finally foreground detection.

Parameters
  • use_saturation (bool) – Whether to convert to HSV and use saturation channel for tissue detection. If False, convert from RGB to greyscale and use greyscale image for tissue detection. Defaults to True.

  • blur_ksize (int) – kernel size used to apply median blurring. Defaults to 17.

  • threshold (int) – threshold for binary thresholding. If None, uses Otsu’s method. Defaults to None.

  • morph_n_iter (int) – number of iterations of morphological opening and closing to apply. Defaults to 3.

  • morph_k_size (int) – kernel size for morphological opening and closing. Defaults to 7.

  • min_region_size (int) – Minimum area of detected foreground regions, in pixels. Defaults to 5000.

  • max_hole_size (int) – Maximum size of allowed holes in foreground regions, in pixels. Ignored if outer_contours_only=True. Defaults to 1500.

  • outer_contours_only (bool) – If true, ignore holes in detected foreground regions. Defaults to False.

  • mask_name (str) – name for new mask

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.LabelArtifactTileHE(label_name=None)

Applies a rule-based method to identify whether or not an image contains artifacts (e.g. pen marks). Based on criteria from Kothari et al. 2012 ACM-BCB 218-225.

Parameters

label_name (str) – name for new label

References

Kothari, S., Phan, J.H., Osunkoya, A.O. and Wang, M.D., 2012, October. Biological interpretation of morphological patterns in histopathological whole-slide images. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (pp. 218-225).

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.LabelWhiteSpaceHE(label_name=None, greyscale_threshold=230, proportion_threshold=0.5)

Simple threshold method to label an image as majority whitespace. Converts image to greyscale. If the proportion of pixels exceeding the greyscale threshold is greater than the proportion threshold, then the image is labelled as whitespace.

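Parameters
  • label_name (str) – name for new label

  • greyscale_threshold (int) – Greyscale intensity above which a pixel is counted as white. Defaults to 230.

  • proportion_threshold (float) – Proportion of white pixels above which the image is labelled as whitespace. Defaults to 0.5.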

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place
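
The rule described above amounts to two comparisons. The sketch below is a NumPy-only illustration with a made-up function name; it assumes a plain channel mean as the greyscale conversion, which may differ from the conversion PathML uses.

```python
import numpy as np


def is_whitespace(rgb, greyscale_threshold=230, proportion_threshold=0.5):
    """Label an RGB image as whitespace if most pixels are brighter than the threshold."""
    # Assumption: simple mean over channels as the greyscale conversion.
    grey = rgb.astype(float).mean(axis=-1)
    proportion_white = np.mean(grey > greyscale_threshold)
    return bool(proportion_white > proportion_threshold)


blank = np.full((10, 10, 3), 250, dtype=np.uint8)   # all near-white
tissue = blank.copy()
tissue[:6] = 100                                    # 60% of rows are dark

blank_label = is_whitespace(blank)
tissue_label = is_whitespace(tissue)
```

Here blank_label is True (100% of pixels exceed 230) and tissue_label is False (only 40% do).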

class pathml.preprocessing.SegmentMIF(model='mesmer', nuclear_channel=None, cytoplasm_channel=None, image_resolution=0.5, preprocess_kwargs=None, postprocess_kwargs_nuclear=None, postprocess_kwargs_whole_cell=None)

Transform applying segmentation to MIF images.

Input image must be formatted (c, x, y) or (batch, c, x, y). The z and t dimensions must be selected before calling SegmentMIF.

Supported models:

  • Mesmer: Mesmer uses a human-in-the-loop pipeline to train a segmentation model (ResNet50 backbone with a Feature Pyramid Network) on 1.3 million cell annotations and 1.2 million nuclear annotations (TissueNet dataset). The model outputs predictions for the centroid and boundary of every nucleus and cell; these predictions are then used as inputs to a watershed algorithm that creates segmentation masks.

  • Cellpose: [coming soon]

Note

Mesmer model requires installation of deepcell dependency: pip install deepcell

Parameters
  • model (str) – string indicating which segmentation model to use. Currently only ‘mesmer’ is supported.

  • nuclear_channel (int) – channel that defines cell nucleus

  • cytoplasm_channel (int) – channel that defines cell membrane or cytoplasm

  • image_resolution (float) – pixel resolution of image in microns

  • preprocess_kwargs (dict) – keyword arguments to pass to pre-processing function

  • postprocess_kwargs_nuclear (dict) – keyword arguments to pass to post-processing function for nuclear segmentation

  • postprocess_kwargs_whole_cell (dict) – keyword arguments to pass to post-processing function for whole-cell segmentation

References

Greenwald, N.F., Miller, G., Moen, E. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-01094-0

Stringer, C., Wang, T., Michaelos, M. and Pachitariu, M., 2021. Cellpose: a generalist algorithm for cellular segmentation. Nature Methods, 18(1), pp.100-106.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.QuantifyMIF(segmentation_mask)

Convert segmented image into an anndata.AnnData counts object. Counts objects are used to interface with Scanpy, the Python single-cell analysis ecosystem. The counts object contains a summary of channel statistics in each cell along with its coordinates.

Parameters

segmentation_mask (str) – key indicating which mask to use as label image

F(self, img, segmentation, coords_offset=(0, 0))

Functional implementation

Parameters
  • img (np.ndarray) – Input image of shape (i, j, n_channels)

  • segmentation (np.ndarray) – Segmentation map of shape (i, j) or (i, j, 1). Zeros are background. Regions should be labelled with unique integers.

  • coords_offset (tuple, optional) – Coordinates (i, j) used to convert tile-level coordinates to slide-level. Defaults to (0, 0) for no offset.

Returns

Counts matrix

apply(self, tile)

modify Tile object in-place
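
To make the inputs of F concrete, the sketch below computes a per-cell summary with NumPy only: mean intensity per channel and a centroid shifted by coords_offset. It returns plain arrays rather than an AnnData object, uses the mean as the only statistic, and the function name is invented, so treat it as an illustration of the idea rather than PathML's output format.

```python
import numpy as np


def quantify(img, segmentation, coords_offset=(0, 0)):
    """Return (labels, per-cell channel means, per-cell centroids)."""
    # Accept segmentation of shape (i, j) or (i, j, 1).
    seg = segmentation[..., 0] if segmentation.ndim == 3 else segmentation
    labels = np.unique(seg)
    labels = labels[labels != 0]                      # zero is background
    means = np.zeros((len(labels), img.shape[-1]))
    centroids = np.zeros((len(labels), 2))
    for row, label in enumerate(labels):
        ii, jj = np.nonzero(seg == label)
        means[row] = img[ii, jj].mean(axis=0)         # mean over the cell's pixels
        centroids[row] = (ii.mean() + coords_offset[0],
                          jj.mean() + coords_offset[1])
    return labels, means, centroids


# Two square "cells" in a 4x4 tile with 2 channels.
seg = np.zeros((4, 4, 1), dtype=int)
seg[0:2, 0:2, 0] = 1
seg[2:4, 2:4, 0] = 2
img = np.zeros((4, 4, 2))
img[0:2, 0:2, 0] = 1.0
img[2:4, 2:4, 0] = 5.0
img[..., 1] = 2.0

labels, means, centroids = quantify(img, seg, coords_offset=(10, 20))
```

The offset is what turns tile-level centroids into slide-level ones: the first cell's centroid (0.5, 0.5) becomes (10.5, 20.5).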

class pathml.preprocessing.CollapseRunsVectra

Coerce Vectra output to standard format. For compatibility with transforms, tiles need to have their shape collapsed to (x, y, c).

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place

class pathml.preprocessing.CollapseRunsCODEX(z)

Coerce CODEX output to standard format. CODEX format is (x, y, z, c, t) where c=4 (4 runs per cycle) and t is the number of cycles. Output format is (x, y, c) where all cycles are collapsed into c (c = 4 * # of cycles).

Parameters

z (int) – in-focus z-plane

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place
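
The shape change can be illustrated with a small NumPy sketch. The function name is made up, and the ordering of the collapsed channels (all runs of cycle 0, then all runs of cycle 1, and so on) is an assumption of this example rather than something stated above.

```python
import numpy as np


def collapse_runs_codex(image, z):
    """(x, y, z, c, t) -> (x, y, c * t), keeping only z-plane `z`."""
    plane = image[:, :, z, :, :]                      # (x, y, c, t)
    x, y, c, t = plane.shape
    # Assumption: runs of the same cycle stay adjacent (cycle-major order).
    return np.moveaxis(plane, 3, 2).reshape(x, y, t * c)


# 2x3 pixels, 5 z-planes, 4 runs per cycle, 6 cycles.
image = np.arange(2 * 3 * 5 * 4 * 6).reshape(2, 3, 5, 4, 6)
collapsed = collapse_runs_codex(image, z=2)
print(collapsed.shape)  # (2, 3, 24)
```

Under this ordering, run r of cycle k ends up at channel index k * 4 + r.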

class pathml.preprocessing.RescaleIntensity(in_range='image', out_range='dtype')

Return image after stretching or shrinking its intensity levels. The desired intensity ranges of the input and output, in_range and out_range respectively, are used to stretch or shrink the intensity range of the input image. This function is a wrapper for the ‘rescale_intensity’ function from scikit-image: https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.rescale_intensity

Parameters
  • in_range (str or 2-tuple, optional) – Min and max intensity values of input image. The possible values for this parameter are enumerated below. ‘image’ : Use image min/max as the intensity range. ‘dtype’ : Use min/max of the image’s dtype as the intensity range. ‘dtype-name’ : Use intensity range based on desired dtype. Must be valid key in DTYPE_RANGE. ‘2-tuple’ : Use range_values as explicit min/max intensities.

  • out_range (str or 2-tuple, optional) – Min and max intensity values of output image. The possible values for this parameter are enumerated below. ‘image’ : Use image min/max as the intensity range. ‘dtype’ : Use min/max of the image’s dtype as the intensity range. ‘dtype-name’ : Use intensity range based on desired dtype. Must be valid key in DTYPE_RANGE. ‘2-tuple’ : Use range_values as explicit min/max intensities.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place
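
For intuition, the case in_range='image' with an explicit 2-tuple out_range is a linear stretch of the image's own min and max onto the requested range. The sketch below shows that arithmetic in NumPy; the function name is invented, the handling of constant images is an arbitrary choice made here, and the result is not claimed to match scikit-image in every edge case.

```python
import numpy as np


def rescale_to_range(image, out_range=(0.0, 1.0)):
    """Linearly map [image.min(), image.max()] onto out_range."""
    image = image.astype(float)
    lo, hi = image.min(), image.max()
    out_lo, out_hi = out_range
    if hi == lo:
        # Constant image: no range to stretch (arbitrary choice in this sketch).
        return np.full_like(image, out_lo)
    return (image - lo) / (hi - lo) * (out_hi - out_lo) + out_lo


image = np.array([[50, 100], [150, 250]], dtype=np.uint8)
stretched = rescale_to_range(image)                 # values in [0, 1]
stretched_8bit = rescale_to_range(image, (0, 255))  # values in [0, 255]
```

The pixel at 100 maps to (100 - 50) / (250 - 50) = 0.25 of the output range.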

class pathml.preprocessing.HistogramEqualization(nbins=256, mask=None)

Return image after histogram equalization. This function is a wrapper for ‘equalize_hist’ function from scikit-image: https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_hist

Parameters
  • nbins (int, optional) – Number of gray bins for histogram. Note: this argument is ignored for integer images, for which each integer is its own bin.

  • mask (ndarray of bools or 0s and 1s, optional) – Array of same shape as image. Only points at which mask == True are used for the equalization, which is applied to the whole image.

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place
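
Histogram equalization maps each grey level to its cumulative frequency, which spreads out heavily used intensity ranges. The NumPy sketch below shows this for 8-bit images without a mask; the function name is made up and the output is not guaranteed to match scikit-image exactly.

```python
import numpy as np


def equalize_hist_uint8(image):
    """Map each grey level of a uint8 image to its cumulative frequency in (0, 1]."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = np.cumsum(hist) / image.size
    return cdf[image]                    # look up the CDF value for every pixel


image = np.array([[0, 0], [128, 255]], dtype=np.uint8)
equalized = equalize_hist_uint8(image)
```

Half of the pixels are at level 0, so level 0 maps to 0.5; level 128 maps to 0.75 and level 255 to 1.0.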

class pathml.preprocessing.AdaptiveHistogramEqualization(kernel_size=None, clip_limit=0.3, nbins=256)

Contrast Limited Adaptive Histogram Equalization (CLAHE). An algorithm for local contrast enhancement that uses histograms computed over different tile regions of the image. Local details can therefore be enhanced even in regions that are darker or lighter than most of the image. This function is a wrapper for the ‘equalize_adapthist’ function from scikit-image: https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_adapthist

Parameters
  • kernel_size (int or array_like, optional) – Defines the shape of contextual regions used in the algorithm. If an iterable is passed, it must have the same number of elements as image.ndim (without color channel). If an integer, it is broadcast to each image dimension. By default, kernel_size is 1/8 of image height by 1/8 of its width.

  • clip_limit (float) – Clipping limit, normalized between 0 and 1 (higher values give more contrast).

  • nbins (int) – Number of gray bins for histogram (“data range”).

F(self, image)

functional implementation

apply(self, tile)

modify Tile object in-place