Examples

CuBATS processes a single HE-stained WSI, used for tumor segmentation, along with a variable number of IHC-stained WSIs, each stained for a tumor-associated antigen (TAA). All downstream analyses—including registration, quantification, and combinatorial co-expression analysis—are performed on these WSIs.

The pipeline leverages multiprocessing to speed up computation, and supports both CPU and GPU execution. Hardware availability is automatically detected, with seamless fallback to CPU if no compatible GPU is found. GPU-based processing uses CuPy, whereas CPU-based processing relies on standard NumPy operations. In both cases, computations build upon tile-based parallelization and vectorization, providing substantial performance gains for computationally intensive steps such as quantification and multi-antigen analysis.
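
The fallback behavior described above can be sketched as follows. This is a minimal illustration and not CuBATS' actual detection code: get_array_module is a hypothetical helper that returns CuPy when it can be imported and a CUDA device is visible, and NumPy otherwise.

```python
import numpy as np


def get_array_module():
    """Return CuPy if a usable GPU is present, else NumPy (hypothetical helper)."""
    try:
        import cupy as cp

        # Raises if the CUDA runtime or driver is unavailable.
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # CuPy is missing or no working CUDA device exists: fall back to the CPU.
        pass
    return np


xp = get_array_module()

# The same vectorized expression runs unchanged on either backend.
tile = xp.arange(12, dtype=xp.float32).reshape(3, 4)
mean_intensity = float(tile.mean())
print(xp.__name__, mean_intensity)
```

Because CuPy mirrors most of the NumPy API, tile-wise code written against xp needs no separate GPU and CPU branches.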

The following examples provide a complete walkthrough of CuBATS and serve as a guide to get started. They cover the main steps of the pipeline, starting with image registration and tumor segmentation, followed by quantification and combinatorial analysis of antigen expressions.

An executable version of these examples is available as a Jupyter notebook in the CuBATS/examples directory.

Note

In this documentation, the terms slide and whole-slide image (WSI) are used interchangeably. For clarity, “WSI” is preferred when referring to the digital image data processed by CuBATS.

Important

Depending on the size of your data, ensure your system has sufficient disk space and RAM available for processing. We recommend a minimum of 32 GB of RAM, or the use of a dedicated server for larger datasets.

Initialize SlideCollection

CuBATS organizes and processes WSIs in the SlideCollection. Once initialized, the collection provides built-in methods for registration, tumor segmentation, quantification, and combinatorial antigen analysis.

Example 1 demonstrates the initialization of a SlideCollection. A SlideCollection can be initialized with the following arguments: collection_name (e.g., tumor set or patient ID), a source directory (src_dir) containing the WSIs, a destination directory (dest_dir) for results, an optional reference WSI (ref_slide), and an optional path to antigen threshold profiles (path_antigen_profiles). If ref_slide is omitted, CuBATS automatically selects an HE WSI based on filenames. If no path_antigen_profiles is provided, default thresholds are used.

Note

If the specified dest_dir already contains previous processing results, these are automatically reloaded when the collection is initialized, allowing you to resume analysis without reprocessing completed steps. Reprocessing overwrites existing results, so back up any important data before re-running analyses.

Example 1: Initialize SlideCollection

from cubats.slide_collection.slide_collection import SlideCollection

my_collection = SlideCollection(
    collection_name = "Tumor_Set_01",
    src_dir = "/path/to/wsi_files",
    dest_dir = "/path/to/output_dir",
    ref_slide = "/path/to/reference_wsi.tiff",
    path_antigen_profiles = "/path/to/threshold_profiles.json"
)

Image Registration

Important

Image registration and alignment are performed using the VALIS framework. Please ensure all dependencies are correctly installed. For details, refer to the CuBATS Installation section or the VALIS documentation.

CuBATS provides a wrapper class that predefines registration parameters for convenience. For more customized registration or if the registration results are unsatisfactory, check out the VALIS documentation for parameter adjustments.

The following examples demonstrate registering and aligning a collection of WSIs. Registration can be performed towards a selected reference WSI or towards a WSI chosen automatically by VALIS. Reference-based registration may be beneficial in some cases, but during development we observed that automatic registration often produced more accurate results for larger datasets.

By default, registration includes a rigid registration followed by a non-rigid registration. Optionally, high-resolution micro-registration can be enabled via microregistration = True, which is recommended for large high-resolution WSIs.

  • Example 2 shows registration with a reference WSI.

  • Example 3 shows registration without a reference WSI.

WSIs must be located in SlideCollection.src_dir. Registered WSIs are saved to SlideCollection.registration_dir. The parameter max_non_rigid_registration_dim_px defaults to 2000 for high-resolution registration but can be adjusted, as in Example 3. Cropping is controlled via crop, which accepts “overlap”, “reference”, or None. Micro-registration is enabled via microregistration=True.

Example 2: Image Registration With Reference WSI

my_collection.register_slides(
    reference_slide = "path/to/reference_wsi.tiff",
    microregistration = True,
    crop = "reference"
)

Example 3: Image Registration Without Reference

my_collection.register_slides(
    microregistration = True,
    max_non_rigid_registration_dim_px = 1800,
    crop = "overlap"
)

Note

Additional registration outputs and intermediate files are stored inside the intermediate_registration_results directory. For more details on subdirectory contents or advanced registration options, see the VALIS documentation.

Tumor Segmentation

Important

The accuracy of tumor segmentation depends on the chosen model and its training. Segmentation parameters such as tile_size, normalization, or inversion may require adjustment. If results are unsatisfactory, modify the parameters or verify the model quality.

Example 4 demonstrates running tumor segmentation using my_collection.tumor_segmentation. The function applies a segmentation model to the HE-stained WSI and produces a binary tumor mask (.TIFF). An HE-stained WSI is required for this step, as it better captures morphological tissue structures and tumor boundaries than antigen-specific stains such as IHC. The input can be a single WSI file or a directory containing multiple HE-stained WSIs.

The segmentation model must be provided as an .ONNX file via model_path. The tile_size parameter is specified as a tuple (e.g., (1024, 1024)). Optional parameters include output_path (default is SlideCollection.registration_dir), normalization (Reinhard normalization), inversion (invert mask), and plot_results (creates a thumbnail overlay of the tumor mask on the WSI).

The function automatically resizes input tiles to match the model’s expected input size and, after segmentation, scales the output mask back to the original tile size.
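
The resize round trip can be illustrated with a small NumPy sketch. It is not CuBATS' implementation: resize_nearest, segment_tile, and dummy_predict are hypothetical names, nearest-neighbour interpolation is an assumption, and dummy_predict merely stands in for inference with the ONNX model.

```python
import numpy as np


def resize_nearest(img, out_hw):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = img.shape[:2]
    out_h, out_w = out_hw
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]


def segment_tile(tile, model_hw, predict):
    """Shrink the tile to the model size, predict, then restore the tile size."""
    small = resize_nearest(tile, model_hw)
    small_mask = predict(small)
    return resize_nearest(small_mask, tile.shape[:2])


def dummy_predict(img):
    """Stand-in for model inference: marks bright pixels as tumor."""
    return (img.mean(axis=-1) > 127).astype(np.uint8)


# A 1024x1024 RGB tile whose right half is bright.
tile = np.zeros((1024, 1024, 3), dtype=np.uint8)
tile[:, 512:] = 255

mask = segment_tile(tile, (256, 256), dummy_predict)
print(mask.shape, int(mask.sum()))
```

The returned mask has the same height and width as the input tile, regardless of the resolution at which the model operates.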

Example 4: Tumor Segmentation

my_collection.tumor_segmentation(
    model_path = "path/to/model.onnx",
    reference_slide = "path/to/he_wsi.tiff",
    tile_size = (512, 512),
    output_path = None,
    normalization = False,
    inversion = False,
    plot_results = True
)

WSI Quantification

Quantification in CuBATS is performed on the previously registered IHC-stained WSIs using their extracted antigen-specific DAB stain channel. The DAB channel is separated from the hematoxylin channel using color deconvolution [1], allowing accurate measurement of antigen staining. Each WSI is divided into non-overlapping tiles of 1024×1024 pixels, which are quantified individually in a pixel-wise manner. For each tile, the staining intensities are measured across tumor regions, and the results are then aggregated for the entire WSI. This step implements a variation of the IHC-Profiler algorithm [2], producing output metrics such as tumor coverage, stratification of coverage into high-, medium-, and low-positive expression, negative tissue, background, H-score, and an additional IHC-Profiler score.
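
The stratification and H-score can be sketched for a single tile as shown below. This is an illustration under stated assumptions, not CuBATS' code: the zone boundaries are those published for the original IHC-Profiler (CuBATS' default and antigen-specific thresholds may differ), the H-score uses the conventional weighting 3 × %high + 2 × %medium + 1 × %low over non-background pixels, and the function name h_score is hypothetical.

```python
import numpy as np

# Inclusive upper bounds of the intensity zones on a 0-255 DAB grayscale
# (darker = stronger staining). Assumed from the original IHC-Profiler.
ZONE_UPPER = np.array([60, 120, 180, 235])
ZONE_NAMES = ("high", "medium", "low", "negative", "background")


def h_score(dab_gray):
    """Return (H-score, zone percentages) for one tile of DAB intensities."""
    # right=True maps 0-60 -> 0, 61-120 -> 1, 121-180 -> 2, 181-235 -> 3, rest -> 4.
    zones = np.digitize(dab_gray, ZONE_UPPER, right=True)
    counts = np.bincount(zones.ravel(), minlength=len(ZONE_NAMES))
    tissue = counts[:4].sum()  # background pixels are excluded
    if tissue == 0:
        return 0.0, dict.fromkeys(ZONE_NAMES[:4], 0.0)
    pct = 100.0 * counts[:4] / tissue
    score = 3 * pct[0] + 2 * pct[1] + 1 * pct[2]
    return float(score), dict(zip(ZONE_NAMES[:4], pct.tolist()))


tile = np.array([[30, 90], [150, 200], [250, 250]], dtype=np.uint8)
score, percentages = h_score(tile)
print(score, percentages)
```

With one pixel in each tissue zone and two background pixels, every tissue zone covers 25 % and the H-score is 150 on the usual 0 to 300 scale.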

CuBATS allows quantification of either all IHC WSIs in the SlideCollection or a single slide individually. Results are automatically stored as .CSV and .PICKLE files in the specified data_dir.

Quantification Modes

Quantification can be run in two mask application modes:

  • "tile-level" (default): Applies the tumor mask coarsely — tiles overlapping the mask are fully included. Recommended when registration accuracy is moderate.

  • "pixel-level": Applies the tumor mask at pixel precision — only masked pixels are included. Offers higher accuracy but is more sensitive to registration noise.
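
The difference between the two modes can be sketched for a single tile. This is a simplified illustration rather than CuBATS' implementation: select_pixels is a hypothetical helper, and "overlapping the mask" is assumed to mean that at least one mask pixel falls inside the tile.

```python
import numpy as np


def select_pixels(tile, mask_tile, masking_mode="tile-level"):
    """Return the pixel values of one tile that would enter quantification."""
    mask_tile = mask_tile.astype(bool)
    if masking_mode == "tile-level":
        # Any overlap with the tumor mask includes the whole tile.
        return tile.ravel() if mask_tile.any() else tile.ravel()[:0]
    if masking_mode == "pixel-level":
        # Only pixels that lie inside the tumor mask are kept.
        return tile[mask_tile]
    raise ValueError(f"unknown masking_mode: {masking_mode!r}")


tile = np.arange(16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True  # tumor covers only the top-left quadrant

tile_level = select_pixels(tile, mask, "tile-level")
pixel_level = select_pixels(tile, mask, "pixel-level")
print(tile_level.size, pixel_level.tolist())
```

In this toy case tile-level masking keeps all 16 pixels, whereas pixel-level masking keeps only the 4 pixels under the mask, which is why the latter reacts more strongly to small registration offsets.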

CuBATS also supports antigen-specific threshold profiles, allowing for fine-tuned quantification across different antibodies or staining intensities. Thresholds can be supplied as a .JSON or .CSV file via the path_antigen_profiles parameter when initializing the SlideCollection. If omitted, default thresholds are used.

CuBATS also offers post-quantification reconstruction of tiles into a DAB WSI. This requires that tile images are saved during quantification, which is enabled via save_imgs=True in the quantification functions.

Quantify All IHC WSIs in SlideCollection

Example 5 demonstrates how to quantify all IHC WSIs within a predefined SlideCollection. All WSIs (except the reference and mask slides) are processed sequentially, and results are saved to the output directory. This example applies “tile-level” masking without saving tile images.

Example 5: Quantify All WSIs

from cubats.slide_collection.slide_collection import SlideCollection

# Quantify all slides in the collection
my_collection.quantify_all_slides(
    save_imgs = False,
    masking_mode = "tile-level"
)

After execution, the results are available in my_collection.quantification_results and stored as tile-level_quantification_results.csv inside the data_dir.

Quantify a Single Slide

Example 6 shows how to quantify a single slide within a predefined SlideCollection. This is useful for re-quantifying a specific slide with different parameters. This example applies “pixel-level” masking and saves tile images for potential reconstruction.

Example 6: Quantify a Single Slide

from cubats.slide_collection.slide_collection import SlideCollection

# Quantify a single slide by name
my_collection.quantify_single_slide(
    slide_name = "Slide_01",
    save_img = True,
    masking_mode = "pixel-level"
)

Note

Quantification overwrites existing results in data_dir. Back up previous data before re-running quantification on the same collection.

Combinatorial Analysis of Antigen Expressions

CuBATS enables spatial co-expression analysis of TAAs by performing pixel-wise comparisons across the previously quantified IHC-stained WSIs. For each antigen pair or triplet, CuBATS computes combined tumor coverage, identifying regions of overlapping expression and complementary expression between markers as well as stratification into high-, medium-, and low-positive categories. This analysis builds upon the results generated during quantification, using the same antigen-specific intensity thresholds.
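
The pixel-wise comparison for one antigen pair can be sketched as follows. This only illustrates the idea: pair_coverage and its output keys are hypothetical, the positivity maps are assumed to be boolean arrays obtained by thresholding the registered DAB channels, and the metrics CuBATS actually reports (including the stratification into positivity levels) are more detailed.

```python
import numpy as np


def pair_coverage(pos_a, pos_b, tumor):
    """Percent of tumor pixels positive for both, exactly one, or either antigen."""
    n_tumor = int(tumor.sum())
    if n_tumor == 0:
        return {"combined": 0.0, "overlap": 0.0, "complementary": 0.0}
    overlap = int((pos_a & pos_b & tumor).sum())
    only_a = int((pos_a & ~pos_b & tumor).sum())
    only_b = int((~pos_a & pos_b & tumor).sum())
    return {
        "combined": 100.0 * (overlap + only_a + only_b) / n_tumor,
        "overlap": 100.0 * overlap / n_tumor,
        "complementary": 100.0 * (only_a + only_b) / n_tumor,
    }


tumor = np.ones((2, 4), dtype=bool)
pos_a = np.array([[1, 1, 0, 0], [1, 1, 0, 0]], dtype=bool)
pos_b = np.array([[0, 1, 1, 0], [0, 1, 1, 0]], dtype=bool)

result = pair_coverage(pos_a, pos_b, tumor)
print(result)
```

Here each antigen alone covers 50 % of the tumor, while the pair together covers 75 %: 25 % through overlapping expression and 50 % through complementary expression.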

Results are stored in .CSV and .PICKLE format within the data_dir. Optionally, a visualization of the combinatorial analysis can be saved for reconstruction into a WSI by setting save_imgs=True.

Furthermore, tile-level or pixel-level masking can be selected via the masking_mode parameter. This parameter should be set consistently with the mode used during quantification.

Pairwise-Antigen Combinations

Example 7 demonstrates how to compute all possible antigen pairs within a predefined SlideCollection. Each pair of quantified slides is compared tile by tile, and the results are aggregated to the WSI level. The example applies “pixel-level” masking and saves tile images.

Example 7: Pairwise-Antigen Co-Expression Analysis

from cubats.slide_collection.slide_collection import SlideCollection

# Compute all possible antigen pairs within the collection
my_collection.generate_antigen_pair_combinations(
    save_imgs = True,
    masking_mode = "pixel-level"
)

After completion, the dual-antigen results are available in my_collection.dual_antigen_expressions and saved in data_dir/pixel-level_dual_antigen_expressions.csv.

Note

Co-expression analysis relies on previously quantified slides. Ensure quantification has been completed before running antigen combination analysis. Triplet combinations can be computed analogously using my_collection.generate_antigen_triplet_combinations(masking_mode="pixel-level").
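
To estimate how many comparisons a collection will produce, the pairs and triplets can be enumerated with the standard library. This is a sketch with hypothetical antigen names; whether CuBATS enumerates combinations this way internally is not confirmed here.

```python
from itertools import combinations

# Hypothetical antigen names standing in for the quantified IHC WSIs.
antigens = ["Antigen_A", "Antigen_B", "Antigen_C", "Antigen_D"]

pairs = list(combinations(antigens, 2))
triplets = list(combinations(antigens, 3))

print(len(pairs), "pairs;", len(triplets), "triplets")
for pair in pairs:
    print(" + ".join(pair))
```

The number of combinations grows as n choose k, so four antigens yield 6 pairs and 4 triplets, whereas ten antigens already yield 45 pairs and 120 triplets.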

References