Multiparametric Imaging: Quickstart

View on GitHub

Pathology imaging experiments commonly produce data where each channel corresponds to a molecular feature, such as the expression level of a protein or nucleic acid. PathML implements MultiparametricSlide, a subclass of SlideData for which we implement special transforms (for more information about transforms, see “Creating Preprocessing Pipelines” in our documentation).

MultiparametricSlide is the appropriate type to analyze low-dimensional techniques including immunofluoresence (protein and in situ hybridization/RNAscope). Recently multiple approaches to higher dimensional imaging of ‘spatial omics’, the simultaneous measurement of a large number of molecular features, have emerged (see https://www.nature.com/articles/s41592-020-01033-y, https://pubmed.ncbi.nlm.nih.gov/30078711/, among many others).

In this notebook we run a pipeline to analyze CODEX data to demonstrate PathML’s support for multiparametric imaging data.

We use the MultiparametricSlide subclass, CODEXSlide, which supports special preprocessing transformations for the CODEX technique. See “Convenience SlideData Classes” (https://pathml.readthedocs.io/en/latest/api_core_reference.html#convenience-slidedata-classes) to see other subclasses that we have implemented.

[ ]:
# load libraries and data
from pathml.core.slide_data import CODEXSlide
from pathml.preprocessing.pipeline import Pipeline
from pathml.preprocessing.transforms import SegmentMIF, QuantifyMIF, CollapseRunsCODEX

import numpy as np
import matplotlib.pyplot as plt
from dask.distributed import Client
from deepcell.utils.plot_utils import make_outline_overlay
from deepcell.utils.plot_utils import create_rgb_image

import scanpy as sc
import squidpy as sq

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

slidedata = CODEXSlide('/home/ryc4001/Documents/pathmlproj/nolan_codex/reg031_X01_Y01.tif');

Here we analyze a TMA from Schurch et al., Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front (Cell, 2020)

Below are the proteins measured in this TMA and the cell types they label. CODEX images proteins in cycles of 3, so here we list proteins by cycle.

Cycle

Protein 1

Protein 2

Protein 3

HOECHST1

blank

blank

blank

HOECHST2

CD44 - stroma

FOXP3 - regulatory T cells

CDX2 - intestinal epithelia

HOECHST3

CD8 - cytotoxic T cells

p53 - tumor suppressor

GATA3 - Th2 helper T cells

HOECHST4

CD45 - hematopoietic cells

T-bet - Th1 cells

beta-catenin - Wnt signaling

HOECHST5

HLA-DR - MHC-II

PD-L1 - checkpoint

Ki67 - proliferation

HOECHST6

CD45RA - naive T cells

CD4 - T helper cells

CD21 - DCs

HOECHST7

MUC-1 - epithelia

CD30 - costimulator

CD2 - T cells

HOECHST8

Vimentin - cytoplasm

CD20 - B cells

LAG-3 - checkpoint

HOECHST9

Na-K-ATPase - membranes

CD5 - T cells

IDO-1 - metabolism

HOECHST10

Cytokeratin - epithelia

CD11b - macrophages

CD56 - NK cells

HOECHST11

aSMA - smooth muscle

BCL-2 - apoptosis

CD25 - IL-2 Ra

HOECHST12

Collagen IV - bas. memb.

CD11c - DCs

PD-1 - checkpoint

HOCHST13

Granzyme B - cytotoxicity

EGFR - signaling

VISTA - costimulator

HOECHST14

CD15 - granulocytes

CD194 - CCR4 chemokine R

ICOS - costimulator

HOECHST15

MMP9 - matrix metalloproteinase

Synaptophysin - neuroendocrine

CD71 - transferrin R

HOECHST16

GFAP - nerves

CD7 - T cells

CD3 - T cells

HOECHST17

Chromogranin A - neuroendocrine

CD163 - macrophages

CD57 - NK cells

HOECHST18

empty - A488_18

CD45RO - memory cells

CD68 - macrophages

HOECHST19

empty - A488_19

CD31 - vasculature

Podoplanin - lymphatics

HOECHST20

empty-A488-20

CD34 - vasculature

CD38 - multifunctional

HOECHST21

empty-A488-21

CD138 - plasma cells

MMP12 - matrix metalloproteinase

HOECHST22

empty-A488-22

empty-Cy3-22

empty-Cy5-22

HOECHST23

empty-A488-23

empty-Cy3-23

DRAQ5

[2]:
# These tif are of the form (x,y,z,c,t) but t is being used to denote cycles
# 17 z-slices, 4 channels per 23 cycles, 70 regions
slidedata.slide.shape
[2]:
(1920, 1440, 17, 4, 23)

Defining a Multiparametric Pipeline

We define a pipeline that chooses a z-slice from our CODEX image, segments cells in the image, then quantifies the expression of each protein in each cell.

[ ]:
# 31 -> Na-K-ATPase
pipe = Pipeline([
    CollapseRunsCODEX(z=6),
    SegmentMIF(model='mesmer', nuclear_channel=0, cytoplasm_channel=31, image_resolution=0.377442),
    QuantifyMIF(segmentation_mask='cell_segmentation')
])
client = Client()
slidedata.run(pipe, distributed = False, client = client, tile_size=1000, tile_pad=False);
[4]:
img = slidedata.tiles[0].image
[5]:
def plot(slidedata, tile, channel1, channel2):
    image = np.expand_dims(slidedata.tiles[tile].image, axis=0)
    nuc_segmentation_predictions = np.expand_dims(slidedata.tiles[tile].masks['nuclear_segmentation'], axis=0)
    cell_segmentation_predictions = np.expand_dims(slidedata.tiles[tile].masks['cell_segmentation'], axis=0)
    #nuc_cytoplasm = np.expand_dims(np.concatenate((image[:,:,:,channel1,0], image[:,:,:,channel2,0]), axis=2), axis=0)
    nuc_cytoplasm = np.stack((image[:,:,:,channel1], image[:,:,:,channel2]), axis=-1)
    rgb_images = create_rgb_image(nuc_cytoplasm, channel_colors=['blue', 'green'])
    overlay_nuc = make_outline_overlay(rgb_data=rgb_images, predictions=nuc_segmentation_predictions)
    overlay_cell = make_outline_overlay(rgb_data=rgb_images, predictions=cell_segmentation_predictions)
    fig, ax = plt.subplots(1, 2, figsize=(15, 15))
    ax[0].imshow(rgb_images[0, ...])
    ax[1].imshow(overlay_cell[0, ...])
    ax[0].set_title('Raw data')
    ax[1].set_title('Cell Predictions')
    plt.show()

Let’s check the quality of our segmentations in a 1000x1000 pixel tile by looking at DAPI, Syp, and CD44.

[6]:
# DAPI + Syp
plot(slidedata, tile=0, channel1=0, channel2=60)

# DAPI + CD44
plot(slidedata, tile=0, channel1=0, channel2=24)
../_images/examples_link_multiplex_if_10_0.png
../_images/examples_link_multiplex_if_10_1.png

AnnData Integration and Spatial Single Cell Analysis

Now let’s explore the single-cell quantification of our imaging data. Our pipeline produced a single-cell matrix of shape (cell x protein) where each cell has attached additional information including location on the slide and the size of the cell in the image. This information is stored in slidedata.counts as an anndata object (https://anndata.readthedocs.io/en/latest/anndata.AnnData.html).

[7]:
adata = slidedata.counts.to_memory()
[8]:
adata
[8]:
AnnData object with n_obs × n_vars = 1102 × 92
    obs: 'x', 'y', 'coords', 'filled_area', 'slice', 'euler_number', 'tile'
    obsm: 'spatial'
    layers: 'min_intensity', 'max_intensity'
[9]:
adata.X
[9]:
array([[ 85.2459    , 119.18852   , 137.40984   , ...,  18.467213  ,
          1.8032787 , 133.16394   ],
       [ 99.308334  , 111.98333   , 122.60833   , ...,  14.7       ,
          1.7583333 , 126.225     ],
       [ 41.63498   , 139.92775   , 132.1711    , ...,  14.707224  ,
          4.5095057 , 131.92015   ],
       ...,
       [ 28.657894  , 139.27193   , 133.86842   , ...,  21.307018  ,
          4.2017546 , 125.20175   ],
       [114.90351   , 123.552635  , 119.76316   , ...,  13.078947  ,
          0.98245615, 117.02631   ],
       [ 72.8951    , 126.034966  , 130.91608   , ...,  20.86014   ,
          0.5244755 , 126.97203   ]], dtype=float32)
[10]:
adata.obs
[10]:
x y coords filled_area slice euler_number tile
0 2.770492 452.377049 [array([[ 0, 444],\n [ 0, 445],\n ... 122 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
1 4.050000 212.316667 [array([[ 0, 444],\n [ 0, 445],\n ... 120 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
2 5.813688 273.250951 [array([[ 0, 444],\n [ 0, 445],\n ... 263 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
3 3.914141 351.075758 [array([[ 0, 444],\n [ 0, 445],\n ... 198 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
4 4.190045 959.393665 [array([[ 0, 444],\n [ 0, 445],\n ... 221 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
... ... ... ... ... ... ... ...
1097 995.254902 597.872549 [array([[ 0, 444],\n [ 0, 445],\n ... 102 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
1098 996.529412 108.694118 [array([[ 0, 444],\n [ 0, 445],\n ... 85 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
1099 995.561404 250.298246 [array([[ 0, 444],\n [ 0, 445],\n ... 114 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
1100 995.789474 610.833333 [array([[ 0, 444],\n [ 0, 445],\n ... 114 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)
1101 995.188811 730.727273 [array([[ 0, 444],\n [ 0, 445],\n ... 143 [(slice(0, 7, None), slice(444, 464, None))\n ... 1 (0, 0)

1102 rows × 7 columns

[11]:
adata.var
[11]:
0
1
2
3
4
...
87
88
89
90
91

92 rows × 0 columns

This anndata object gives us access to the entire python (or Seurat) single cell analysis ecosystem of tools. We follow a single cell analysis workflow described in https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html and https://www.embopress.org/doi/full/10.15252/msb.20188746.

[12]:
import scanpy as sc
sc.pl.violin(adata, keys = ['0','24','60'])
sc.pp.log1p(adata)
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=10)
sc.tl.umap(adata)
sc.pl.umap(adata, color=['0','24','60'])
../_images/examples_link_multiplex_if_18_0.png
../_images/examples_link_multiplex_if_18_1.png
[13]:
sc.tl.leiden(adata, resolution = 0.15)
sc.pl.umap(adata, color='leiden')
sc.tl.rank_genes_groups(adata, 'leiden', method='t-test')
sc.pl.rank_genes_groups_dotplot(adata, groupby='leiden', vmax=5, n_genes=5)
../_images/examples_link_multiplex_if_19_0.png
WARNING: dendrogram data not found (using key=dendrogram_leiden). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
../_images/examples_link_multiplex_if_19_2.png

We can also use spatial analysis tools such as https://github.com/theislab/squidpy.

[14]:
import scanpy as sc
import squidpy as sq
sc.pl.spatial(adata, color='leiden', spot_size=15)
sc.pl.spatial(
    adata,
    color="leiden",
    groups=[
        "2",
        "4"
    ],
    spot_size=15
)
../_images/examples_link_multiplex_if_21_0.png
../_images/examples_link_multiplex_if_21_1.png
[15]:
sq.gr.co_occurrence(adata, cluster_key="leiden")
sq.pl.co_occurrence(
    adata,
    cluster_key="leiden"
)
../_images/examples_link_multiplex_if_22_1.png

References

  • Schürch, C.M., Bhate, S.S., Barlow, G.L., Phillips, D.J., Noti, L., Zlobec, I., Chu, P., Black, S., Demeter, J., McIlwain, D.R. and Samusik, N., 2020. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell, 182(5), pp.1341-1359.