Datasets API
PanNuke
- class pathml.datasets.PanNukeDataModule(data_dir, download=False, shuffle=True, transforms=None, nucleus_type_labels=False, split=None, batch_size=8, hovernet_preprocess=False)
DataModule for the PanNuke Dataset. Contains 256px image patches from 19 tissue types with annotations for 5 nucleus types. For more information, see: https://warwick.ac.uk/fac/sci/dcs/research/tia/data/pannuke
- Parameters:
data_dir (str) – Path to the directory containing the PanNuke data.
download (bool, optional) – Whether to download the data. If True, checks whether data files exist in data_dir and downloads them to data_dir if not. If False, checks to make sure that data files exist in data_dir. Defaults to False.
shuffle (bool, optional) – Whether to shuffle images. Defaults to True.
transforms (optional) – Data augmentation transforms to apply to images. The transform must accept two arguments (mask and image) and return a dict with “image” and “mask” keys. See an example here: https://albumentations.ai/docs/getting_started/mask_augmentation/
nucleus_type_labels (bool, optional) – Whether to provide nucleus type labels, or binary nucleus labels. If True, then masks will be returned with six channels, corresponding to:
1. Neoplastic cells
2. Inflammatory
3. Connective/Soft tissue cells
4. Dead Cells
5. Epithelial
6. Background
If False, then the returned mask will have a single channel, with zeros for background pixels and ones for nucleus pixels (i.e. the inverse of the Background mask). Defaults to False.
split (int, optional) – How to divide the three folds into train, test, and validation splits. Must be one of {1, 2, 3, None} corresponding to the following splits:
1. Training: Fold 1; Validation: Fold 2; Testing: Fold 3
2. Training: Fold 2; Validation: Fold 1; Testing: Fold 3
3. Training: Fold 3; Validation: Fold 2; Testing: Fold 1
If None, then the entire PanNuke dataset will be used. Defaults to None.
batch_size (int, optional) – Batch size for dataloaders. Defaults to 8.
hovernet_preprocess (bool) – Whether to perform preprocessing specific to the HoVer-Net architecture. If True, the center of mass of each nucleus will be computed, and an additional mask will be returned with the distance of each nuclear pixel to its center of mass in the horizontal and vertical dimensions. This corresponds to Gamma(I) from the HoVer-Net paper. Defaults to False.
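The relationship between the two mask formats selected by nucleus_type_labels can be sketched with NumPy. This is a minimal illustration, not PathML's implementation: it assumes a channels-first (6, H, W) array whose channel order matches the order listed above, and the names CHANNELS and to_binary_mask are hypothetical.

```python
import numpy as np

# Channel names in the order given in the parameter description.
# Assumption: the mask array stores its channels in this same order.
CHANNELS = [
    "Neoplastic cells",
    "Inflammatory",
    "Connective/Soft tissue cells",
    "Dead Cells",
    "Epithelial",
    "Background",
]


def to_binary_mask(type_mask):
    """Collapse a (6, H, W) nucleus-type mask into an (H, W) binary nucleus mask."""
    background = type_mask[CHANNELS.index("Background")]
    # Nucleus pixels are exactly the pixels that are not background.
    return (background == 0).astype(np.uint8)


# Toy 2x2 mask: one neoplastic pixel, one inflammatory pixel, the rest background.
toy = np.zeros((6, 2, 2), dtype=np.uint8)
toy[0, 0, 0] = 1                    # neoplastic nucleus pixel at (0, 0)
toy[1, 1, 1] = 1                    # inflammatory nucleus pixel at (1, 1)
toy[5] = 1 - toy[:5].sum(axis=0)    # background is every remaining pixel
binary = to_binary_mask(toy)
```

Here binary holds ones at the two nucleus pixels and zeros elsewhere, which is the single-channel form described for nucleus_type_labels=False.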
- References
Gamper, J., Koohbanani, N.A., Benet, K., Khuram, A. and Rajpoot, N., 2019, April. PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification. In European Congress on Digital Pathology (pp. 11-19). Springer, Cham.
Gamper, J., Koohbanani, N.A., Graham, S., Jahanifar, M., Khurram, S.A., Azam, A., Hewitt, K. and Rajpoot, N., 2020. PanNuke Dataset Extension, Insights and Baselines. arXiv preprint arXiv:2003.10778.
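The hovernet_preprocess option described in the parameters can be pictured with the following sketch. It is an illustration of the idea only, under stated assumptions, and not the code PathML runs: the input is assumed to be an (H, W) integer array with 0 for background and a distinct positive label per nucleus, the function name hv_offsets is hypothetical, and offsets are left in pixel units with no rescaling applied.

```python
import numpy as np


def hv_offsets(instance_mask):
    """Return a (2, H, W) array of horizontal and vertical offsets to each nucleus centroid.

    instance_mask: (H, W) integer array, 0 for background and a distinct
    positive label for each nucleus (an assumed input format).
    """
    h_map = np.zeros(instance_mask.shape, dtype=np.float32)
    v_map = np.zeros(instance_mask.shape, dtype=np.float32)
    for label in np.unique(instance_mask):
        if label == 0:
            continue  # background pixels keep an offset of zero
        rows, cols = np.nonzero(instance_mask == label)
        # The center of mass of a binary region is its mean pixel coordinate.
        h_map[rows, cols] = cols - cols.mean()
        v_map[rows, cols] = rows - rows.mean()
    return np.stack([h_map, v_map])


# Toy mask: a single nucleus occupying a 1x3 horizontal strip in the middle row.
toy = np.zeros((3, 3), dtype=np.int64)
toy[1, :] = 1
hv = hv_offsets(toy)
```

For the toy strip, the horizontal channel runs from negative to positive across the nucleus while the vertical channel stays at zero, because every pixel lies on the centroid's row.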
- property test_dataloader
Dataloader for test set. Yields (image, mask, tissue_type), or (image, mask, hv, tissue_type) if hovernet_preprocess is True.
- property train_dataloader
Dataloader for training set. Yields (image, mask, tissue_type), or (image, mask, hv, tissue_type) if hovernet_preprocess is True.
- property valid_dataloader
Dataloader for validation set. Yields (image, mask, tissue_type), or (image, mask, hv, tissue_type) if hovernet_preprocess is True.
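A typical way to put the constructor and the dataloader properties together might look like the sketch below. It uses only the signature documented above; the helper names passthrough_transform, build_pannuke, and first_training_batch are hypothetical, and the transform is assumed to be called with keyword arguments image=... and mask=..., as in the linked albumentations example. PathML must be installed and the data must already be in data_dir for build_pannuke to succeed.

```python
def passthrough_transform(image, mask):
    """Return the inputs unchanged, in the dict form the transforms parameter expects.

    Assumption: the datamodule calls the transform with keyword arguments
    image=... and mask=..., following the albumentations convention.
    """
    return {"image": image, "mask": mask}


def build_pannuke(data_dir):
    """Build a datamodule that trains on fold 1, validates on fold 2, and tests on fold 3."""
    # Imported lazily so the transform above can be used without PathML installed.
    from pathml.datasets import PanNukeDataModule

    return PanNukeDataModule(
        data_dir=data_dir,
        download=False,                    # expect the files to already be in data_dir
        transforms=passthrough_transform,
        nucleus_type_labels=True,          # six-channel nucleus-type masks
        split=1,                           # Training: Fold 1; Validation: Fold 2; Testing: Fold 3
        batch_size=8,
    )


def first_training_batch(datamodule):
    """Fetch one batch; the dataloaders are documented as properties, not methods."""
    # With hovernet_preprocess=False each batch has three elements.
    image, mask, tissue_type = next(iter(datamodule.train_dataloader))
    return image, mask, tissue_type
```

Replacing passthrough_transform with a real augmentation only requires keeping the same contract: accept the image and mask, and return a dict with “image” and “mask” keys.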
DeepFocus
- class pathml.datasets.DeepFocusDataModule(data_dir, download=False, shuffle=True, transforms=None, batch_size=8)
DataModule for the DeepFocus dataset. The DeepFocus dataset comprises four slides from different patients, each with four different stains (H&E, Ki67, CD21, and CD10) for a total of 16 whole-slide images. For each slide, a region of interest (ROI) of approximately 6 mm^2 was scanned at 40x magnification with an Aperio ScanScope on nine different focal planes, generating 216,000 samples with varying amounts of blurriness. Tiles with focal offset values in the range [-0.5 μm, 0.5 μm] are labeled as in-focus, and all other tiles are labeled as blurry.
See: https://github.com/cialab/DeepFocus
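The labeling rule described above can be written as a one-line check. This is a minimal sketch of the rule as stated, not code from the dataset or from PathML: the function name focus_label and the label strings are hypothetical, and the range endpoints are treated as inclusive, following the bracket notation.

```python
def focus_label(offset_um):
    """Label a tile by its focal offset in micrometers.

    Offsets within [-0.5, 0.5] (inclusive) count as in-focus; all others as blurry.
    """
    return "in-focus" if -0.5 <= offset_um <= 0.5 else "blurry"


# A few example offsets on either side of the threshold.
labels = [focus_label(offset) for offset in (-2.0, -0.5, 0.0, 0.5, 1.5)]
```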
- Parameters:
data_dir (str) – file path to directory containing data.
download (bool, optional) – Whether to download the data. If True, checks whether data files exist in data_dir and downloads them to data_dir if not. If False, checks to make sure that data files exist in data_dir. Defaults to False.
shuffle (bool, optional) – Whether to shuffle images. Defaults to True.
transforms (optional) – Data augmentation transforms to apply to images.
batch_size (int, optional) – batch size for dataloaders. Defaults to 8.
- Reference:
Senaras, C., Niazi, M.K.K., Lozanski, G. and Gurcan, M.N., 2018. DeepFocus: detection of out-of-focus regions in whole slide digital images using deep learning. PloS one, 13(10), p.e0205387.
- property test_dataloader
- property train_dataloader
- property valid_dataloader