StainX
Overview
StainX is a stain normalization library for histopathology images built around optimized batch processing. Unlike frameworks that process images one at a time, StainX is designed from the ground up to handle batches of images efficiently, which improves GPU utilization and shortens processing time.
Key Advantages
- Batch Processing: Process multiple images simultaneously, maximizing GPU throughput and reducing overhead
- Multi-Device Support: Seamlessly works on CPU, CUDA (NVIDIA GPUs), and MPS (Apple Silicon)
- Multiple Algorithms: Supports Histogram Matching, Reinhard, and Macenko normalization methods
- Automatic Backend Selection: Intelligently chooses between optimized CUDA kernels and PyTorch backends
Why Batch Processing Matters
Batch processing is crucial for histopathology workflows, where you often need to normalize hundreds or thousands of images. Processing images in batches provides:
- Better GPU Utilization: Parallel processing across the entire batch maximizes GPU compute resources
- Reduced Overhead: Single kernel launches for entire batches instead of per-image launches
- Faster Processing: Roughly 5-7x speedup with the CUDA backend compared to the PyTorch backend for batch processing
- Memory Efficiency: Optimized memory access patterns for batched operations
- Higher Throughput: Batch processing achieves 40,000+ images/second (vs ~5,500 for single images)
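As a rough illustration of what these throughput figures mean in practice, the sketch below estimates the wall-clock time for a hypothetical set of 10,000 images, using only the rates quoted above. The helper name `estimated_seconds` and the dataset size are assumptions made for illustration; real timings depend on image size, hardware, and algorithm.

```python
def estimated_seconds(num_images: int, images_per_second: float) -> float:
    """Idealized processing time at a constant throughput."""
    return num_images / images_per_second


NUM_IMAGES = 10_000  # hypothetical dataset size, not taken from the benchmarks

single_image_time = estimated_seconds(NUM_IMAGES, 5_500)  # ~5,500 images/s, one at a time
batched_time = estimated_seconds(NUM_IMAGES, 40_000)      # 40,000+ images/s, batched

print(f"one image at a time: {single_image_time:.2f} s")
print(f"batched:             {batched_time:.2f} s")
print(f"ratio:               {single_image_time / batched_time:.1f}x")
```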
Quick Example
import torch
from stainx import Reinhard, Macenko, HistogramMatching
# Load your images as torch.Tensor with shape (N, C, H, W) or (N, H, W, C)
# Random tensors are used here only as placeholders for real image data
reference_image = torch.randn(1, 3, 512, 512) # Reference image
source_images = torch.randn(10, 3, 512, 512) # Batch of images to normalize
# Reinhard normalization
normalizer = Reinhard(device="cuda") # or "cpu", "mps"
normalizer.fit(reference_image)
normalized = normalizer.transform(source_images) # Process entire batch at once
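If the same script has to run on machines with different hardware, the device string can be chosen at runtime instead of hard-coded. The following is a minimal sketch that relies only on PyTorch's own availability checks; the helper name `pick_device` is hypothetical and not part of StainX.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed; fall back to the CPU label

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps only exists in newer PyTorch releases, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(device)
```

The resulting string can then be passed as the `device` argument shown in the example above, for instance `Reinhard(device=device)`.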
Performance
Benchmarks on an NVIDIA RTX A6000 show the largest gains when processing batches of images:
- CUDA Backend Speedup: 5.3-5.4x for Reinhard, 4.6-7.3x for Macenko
- Batch Processing Throughput: Up to 46,600 images/second (vs ~5,500 for single images)
- Optimal Batch Size: Batches of 64-128 images give the best performance
See the Benchmarks page for detailed performance benchmarks and code examples.
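When a dataset is larger than the recommended batch size, it can be split into fixed-size chunks before normalization. The sketch below shows one way to do that in plain Python; `iter_batches` and the file names are hypothetical and not part of StainX, and the default of 64 is simply the lower end of the range quoted above.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 64) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical list of 150 tile paths, split into batches of 64
image_paths = [f"tile_{i:04d}.png" for i in range(150)]
batch_lengths = [len(batch) for batch in iter_batches(image_paths, batch_size=64)]
print(batch_lengths)  # [64, 64, 22]
```

Because a `torch.Tensor` supports `len()` and slicing along its first dimension, the same helper should also work on an (N, C, H, W) tensor, with each slice passed to `transform` in turn.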
Installation
CUDA extensions are built automatically if CUDA is available. StainX requires PyTorch >=2.0.0 and CuPy >=12.0.0.
Features
- Multiple algorithms: Histogram Matching, Reinhard, and Macenko normalization
- Automatic backend selection: torch, torch_cuda, cupy, or cupy_cuda backends
- Batch processing: Normalizes multiple images in a single call through efficient batch processing
- Flexible device support: CPU, CUDA, MPS (Apple Silicon)
Documentation
- Quick Start Guide - Get started in minutes
- Installation Guide - Detailed installation instructions
- Examples - Usage examples and patterns
- Benchmarks - Performance benchmarks and comparisons
- API Reference - Complete API documentation
Contributing
We welcome contributions! See our Contributing Guide for details.
License
This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).
Links
- GitHub: https://github.com/rendeirolab/stainx
- Issues: https://github.com/rendeirolab/stainx/issues
- Documentation: https://stainx.readthedocs.io/