DataLoaders

After running a preprocessing pipeline and writing the resulting .h5path file to disk, the next step is to create a DataLoader for feeding tiles into a machine learning model in PyTorch.

To do this, use the TileDataset class and then wrap it in a PyTorch DataLoader:

import torch
from pathml.ml import TileDataset  # import path assumed; adjust to your PathML version
dataset = TileDataset("/path/to/file.h5path")
dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)

Note

Label dictionaries are not standardized, because users are free to store whatever labels they want. As a result, PyTorch's default collation may not be able to stack labels into batches automatically, and it may be necessary to pass a custom collate_fn to the DataLoader that specifies how batches of labels are created. See the description of the collate_fn argument in the PyTorch DataLoader documentation.
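As an illustration of what such a collate_fn might look like, here is a minimal sketch in plain Python. It assumes each sample is a tuple in which some fields are label dictionaries; the names merge_label_dicts and the stack parameter are hypothetical and not part of PathML or PyTorch. Dictionary fields are merged key by key into lists, and all other fields are combined with whatever stack function is supplied.

```python
def merge_label_dicts(label_dicts):
    """Merge per-tile label dicts into one dict mapping each key to a list.

    Keys absent from a given tile are filled with None, so every list
    has exactly one entry per tile in the batch.
    """
    keys = []
    for labels in label_dicts:
        for key in labels:
            if key not in keys:
                keys.append(key)  # preserve first-seen key order
    return {key: [labels.get(key) for labels in label_dicts] for key in keys}


def collate_fn(batch, stack=list):
    """Collate a list of sample tuples into one batch tuple.

    Fields made up entirely of dicts are merged key by key; every other
    field is combined with `stack` (hypothetical parameter; the default
    simply keeps the items in a list).
    """
    collated = []
    for field in zip(*batch):  # transpose: one tuple per sample field
        if all(isinstance(item, dict) for item in field):
            collated.append(merge_label_dicts(field))
        else:
            collated.append(stack(list(field)))
    return tuple(collated)


# Toy batch standing in for real tiles: (image placeholder, label dict)
batch = [("tile0", {"grade": 1, "site": "a"}), ("tile1", {"grade": 2})]
images, labels = collate_fn(batch)
```

With PyTorch available, one option is to bind a real stacking function, for example functools.partial(collate_fn, stack=torch.utils.data.default_collate), and pass the result as the collate_fn argument of the DataLoader. Whether this is appropriate depends on the labels stored in a given .h5path file.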

TileDataset thus provides an interface between PathML and the broader ecosystem of machine learning tools built on PyTorch. For more information on how to use Datasets and DataLoaders, please see the PyTorch documentation and tutorials.