Multiparametric Imaging: Quickstart
Pathology imaging experiments commonly produce data where each channel corresponds to a molecular feature, such as the expression level of a protein or nucleic acid. PathML implements MultiparametricSlide, a subclass of SlideData for which we implement special transforms (for more information about transforms, see “Creating Preprocessing Pipelines” in our documentation).
MultiparametricSlide is the appropriate type to analyze low-dimensional techniques including immunofluoresence (protein and in situ hybridization/RNAscope). Recently multiple approaches to higher dimensional imaging of ‘spatial omics’, the simultaneous measurement of a large number of molecular features, have emerged (see https://www.nature.com/articles/s41592-020-01033-y, https://pubmed.ncbi.nlm.nih.gov/30078711/, among many others).
In this notebook we run a pipeline to analyze CODEX data to demonstrate PathML’s support for multiparametric imaging data.
We use the MultiparametricSlide subclass, CODEXSlide, which supports special preprocessing transformations for the CODEX technique. See “Convenience SlideData Classes” (https://pathml.readthedocs.io/en/latest/api_core_reference.html#convenience-slidedata-classes) to see other subclasses that we have implemented.
[ ]:
# load libraries and data
from pathml.core.slide_data import CODEXSlide
from pathml.preprocessing.pipeline import Pipeline
from pathml.preprocessing.transforms import SegmentMIF, QuantifyMIF, CollapseRunsCODEX
import numpy as np
import matplotlib.pyplot as plt
from dask.distributed import Client
from deepcell.utils.plot_utils import make_outline_overlay
from deepcell.utils.plot_utils import create_rgb_image
import scanpy as sc
import squidpy as sq
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
slidedata = CODEXSlide('/home/ryc4001/Documents/pathmlproj/nolan_codex/reg031_X01_Y01.tif');
Here we analyze a TMA from Schurch et al., Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front (Cell, 2020)
Below are the proteins measured in this TMA and the cell types they label. CODEX images proteins in cycles of 3, so here we list proteins by cycle.
Cycle |
Protein 1 |
Protein 2 |
Protein 3 |
---|---|---|---|
HOECHST1 |
blank |
blank |
blank |
HOECHST2 |
CD44 - stroma |
FOXP3 - regulatory T cells |
CDX2 - intestinal epithelia |
HOECHST3 |
CD8 - cytotoxic T cells |
p53 - tumor suppressor |
GATA3 - Th2 helper T cells |
HOECHST4 |
CD45 - hematopoietic cells |
T-bet - Th1 cells |
beta-catenin - Wnt signaling |
HOECHST5 |
HLA-DR - MHC-II |
PD-L1 - checkpoint |
Ki67 - proliferation |
HOECHST6 |
CD45RA - naive T cells |
CD4 - T helper cells |
CD21 - DCs |
HOECHST7 |
MUC-1 - epithelia |
CD30 - costimulator |
CD2 - T cells |
HOECHST8 |
Vimentin - cytoplasm |
CD20 - B cells |
LAG-3 - checkpoint |
HOECHST9 |
Na-K-ATPase - membranes |
CD5 - T cells |
IDO-1 - metabolism |
HOECHST10 |
Cytokeratin - epithelia |
CD11b - macrophages |
CD56 - NK cells |
HOECHST11 |
aSMA - smooth muscle |
BCL-2 - apoptosis |
CD25 - IL-2 Ra |
HOECHST12 |
Collagen IV - bas. memb. |
CD11c - DCs |
PD-1 - checkpoint |
HOCHST13 |
Granzyme B - cytotoxicity |
EGFR - signaling |
VISTA - costimulator |
HOECHST14 |
CD15 - granulocytes |
CD194 - CCR4 chemokine R |
ICOS - costimulator |
HOECHST15 |
MMP9 - matrix metalloproteinase |
Synaptophysin - neuroendocrine |
CD71 - transferrin R |
HOECHST16 |
GFAP - nerves |
CD7 - T cells |
CD3 - T cells |
HOECHST17 |
Chromogranin A - neuroendocrine |
CD163 - macrophages |
CD57 - NK cells |
HOECHST18 |
empty - A488_18 |
CD45RO - memory cells |
CD68 - macrophages |
HOECHST19 |
empty - A488_19 |
CD31 - vasculature |
Podoplanin - lymphatics |
HOECHST20 |
empty-A488-20 |
CD34 - vasculature |
CD38 - multifunctional |
HOECHST21 |
empty-A488-21 |
CD138 - plasma cells |
MMP12 - matrix metalloproteinase |
HOECHST22 |
empty-A488-22 |
empty-Cy3-22 |
empty-Cy5-22 |
HOECHST23 |
empty-A488-23 |
empty-Cy3-23 |
DRAQ5 |
[2]:
# These tif are of the form (x,y,z,c,t) but t is being used to denote cycles
# 17 z-slices, 4 channels per 23 cycles, 70 regions
slidedata.slide.shape
[2]:
(1920, 1440, 17, 4, 23)
Defining a Multiparametric Pipeline
We define a pipeline that chooses a z-slice from our CODEX image, segments cells in the image, then quantifies the expression of each protein in each cell.
[ ]:
# 31 -> Na-K-ATPase
pipe = Pipeline([
CollapseRunsCODEX(z=6),
SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=31, image_resolution=0.377442),
QuantifyMIF(segmentation_mask='cell_segmentation')
])
client = Client()
slidedata.run(pipe, distributed = False, client = client, tile_size=1000, tile_pad=False);
[4]:
img = slidedata.tiles[0].image
[5]:
def plot(slidedata, tile, channel1, channel2):
image = np.expand_dims(slidedata.tiles[tile].image, axis=0)
nuc_segmentation_predictions = np.expand_dims(slidedata.tiles[tile].masks['nuclear_segmentation'], axis=0)
cell_segmentation_predictions = np.expand_dims(slidedata.tiles[tile].masks['cell_segmentation'], axis=0)
#nuc_cytoplasm = np.expand_dims(np.concatenate((image[:,:,:,channel1,0], image[:,:,:,channel2,0]), axis=2), axis=0)
nuc_cytoplasm = np.stack((image[:,:,:,channel1], image[:,:,:,channel2]), axis=-1)
rgb_images = create_rgb_image(nuc_cytoplasm, channel_colors=['blue', 'green'])
overlay_nuc = make_outline_overlay(rgb_data=rgb_images, predictions=nuc_segmentation_predictions)
overlay_cell = make_outline_overlay(rgb_data=rgb_images, predictions=cell_segmentation_predictions)
fig, ax = plt.subplots(1, 2, figsize=(15, 15))
ax[0].imshow(rgb_images[0, ...])
ax[1].imshow(overlay_cell[0, ...])
ax[0].set_title('Raw data')
ax[1].set_title('Cell Predictions')
plt.show()
Let’s check the quality of our segmentations in a 1000x1000 pixel tile by looking at DAPI, Syp, and CD44.
[6]:
# DAPI + Syp
plot(slidedata, tile=0, channel1=0, channel2=60)
# DAPI + CD44
plot(slidedata, tile=0, channel1=0, channel2=24)


AnnData Integration and Spatial Single Cell Analysis
Now let’s explore the single-cell quantification of our imaging data. Our pipeline produced a single-cell matrix of shape (cell x protein) where each cell has attached additional information including location on the slide and the size of the cell in the image. This information is stored in slidedata.counts as an anndata object (https://anndata.readthedocs.io/en/latest/anndata.AnnData.html).
[7]:
adata = slidedata.counts.to_memory()
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 1102 × 92
obs: 'x', 'y', 'coords', 'filled_area', 'slice', 'euler_number', 'tile'
obsm: 'spatial'
layers: 'min_intensity', 'max_intensity'
[9]:
adata.X
[9]:
array([[ 85.2459 , 119.18852 , 137.40984 , ..., 18.467213 ,
1.8032787 , 133.16394 ],
[ 99.308334 , 111.98333 , 122.60833 , ..., 14.7 ,
1.7583333 , 126.225 ],
[ 41.63498 , 139.92775 , 132.1711 , ..., 14.707224 ,
4.5095057 , 131.92015 ],
...,
[ 28.657894 , 139.27193 , 133.86842 , ..., 21.307018 ,
4.2017546 , 125.20175 ],
[114.90351 , 123.552635 , 119.76316 , ..., 13.078947 ,
0.98245615, 117.02631 ],
[ 72.8951 , 126.034966 , 130.91608 , ..., 20.86014 ,
0.5244755 , 126.97203 ]], dtype=float32)
[10]:
adata.obs
[10]:
x | y | coords | filled_area | slice | euler_number | tile | |
---|---|---|---|---|---|---|---|
0 | 2.770492 | 452.377049 | [array([[ 0, 444],\n [ 0, 445],\n ... | 122 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
1 | 4.050000 | 212.316667 | [array([[ 0, 444],\n [ 0, 445],\n ... | 120 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
2 | 5.813688 | 273.250951 | [array([[ 0, 444],\n [ 0, 445],\n ... | 263 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
3 | 3.914141 | 351.075758 | [array([[ 0, 444],\n [ 0, 445],\n ... | 198 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
4 | 4.190045 | 959.393665 | [array([[ 0, 444],\n [ 0, 445],\n ... | 221 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
... | ... | ... | ... | ... | ... | ... | ... |
1097 | 995.254902 | 597.872549 | [array([[ 0, 444],\n [ 0, 445],\n ... | 102 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
1098 | 996.529412 | 108.694118 | [array([[ 0, 444],\n [ 0, 445],\n ... | 85 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
1099 | 995.561404 | 250.298246 | [array([[ 0, 444],\n [ 0, 445],\n ... | 114 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
1100 | 995.789474 | 610.833333 | [array([[ 0, 444],\n [ 0, 445],\n ... | 114 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
1101 | 995.188811 | 730.727273 | [array([[ 0, 444],\n [ 0, 445],\n ... | 143 | [(slice(0, 7, None), slice(444, 464, None))\n ... | 1 | (0, 0) |
1102 rows × 7 columns
[11]:
adata.var
[11]:
0 |
---|
1 |
2 |
3 |
4 |
... |
87 |
88 |
89 |
90 |
91 |
92 rows × 0 columns
This anndata object gives us access to the entire python (or Seurat) single cell analysis ecosystem of tools. We follow a single cell analysis workflow described in https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html and https://www.embopress.org/doi/full/10.15252/msb.20188746.
[12]:
import scanpy as sc
sc.pl.violin(adata, keys = ['0','24','60'])
sc.pp.log1p(adata)
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=10)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['0','24','60'])


[13]:
sc.tl.leiden(adata, resolution = 0.15)
sc.pl.umap(adata, color='leiden')
sc.tl.rank_genes_groups(adata, 'leiden', method='t-test')
sc.pl.rank_genes_groups_dotplot(adata, groupby='leiden', vmax=5, n_genes=5)

WARNING: dendrogram data not found (using key=dendrogram_leiden). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.

We can also use spatial analysis tools such as https://github.com/theislab/squidpy.
[14]:
import scanpy as sc
import squidpy as sq
sc.pl.spatial(adata, color='leiden', spot_size=15)
sc.pl.spatial(
adata,
color="leiden",
groups=[
"2",
"4"
],
spot_size=15
)


[15]:
sq.gr.co_occurrence(adata, cluster_key="leiden")
sq.pl.co_occurrence(
adata,
cluster_key="leiden"
)

References
Schürch, C.M., Bhate, S.S., Barlow, G.L., Phillips, D.J., Noti, L., Zlobec, I., Chu, P., Black, S., Demeter, J., McIlwain, D.R. and Samusik, N., 2020. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell, 182(5), pp.1341-1359.