Backends
StainX supports multiple backends for different performance characteristics and device compatibility.
Available Backends
- torch: PyTorch backend, works on CPU, CUDA, and MPS devices
- torch_cuda: Optimized PyTorch CUDA extension, requires CUDA Toolkit and CUDA device
- cupy: CuPy backend, requires CUDA and CuPy
- cupy_cuda: CuPy CUDA backend, requires CUDA and CuPy
Backend Selection
Backends are automatically selected based on device availability:
- For non-CUDA devices → use
torchbackend - For CUDA devices → try
torch_cuda, thencupy_cuda, thencupy, fallback totorch
from stainx import Reinhard
# Automatic selection
normalizer = Reinhard(device="cuda")
# Force specific backend
normalizer = Reinhard(device="cuda", backend="torch")
normalizer = Reinhard(device="cuda", backend="torch_cuda")
normalizer = Reinhard(device="cuda", backend="cupy")
normalizer = Reinhard(device="cuda", backend="cupy_cuda")
Device Support
torch Backend
- ✅ CPU
- ✅ CUDA (NVIDIA GPUs)
- ✅ MPS (Apple Silicon)
torch_cuda Backend
- ❌ CPU (not supported)
- ✅ CUDA (NVIDIA GPUs only)
- ❌ MPS (not supported)
cupy Backend
- ❌ CPU (not supported)
- ✅ CUDA (NVIDIA GPUs only)
- ❌ MPS (not supported)
cupy_cuda Backend
- ❌ CPU (not supported)
- ✅ CUDA (NVIDIA GPUs only)
- ❌ MPS (not supported)
Performance
CUDA backends (torch_cuda, cupy_cuda) provide significant performance improvements for batch processing:
- Reinhard: 5.3-5.4x speedup compared to torch backend
- Macenko: 4.6-7.3x speedup compared to torch backend
- HistogramMatching: torch backend is typically faster for this method
- Best performance with larger batch sizes (64-128 images)
- Optimized memory access patterns
Example performance (NVIDIA RTX A6000, batch 32, 256×256): - Reinhard: torch_cuda 0.72ms vs torch 3.87ms (5.4x speedup) - Macenko: torch_cuda 12.51ms vs torch 57.02ms (4.6x speedup)
See the Benchmarks page for detailed performance comparisons.
Check Backend Availability
from stainx.backends.torch_cuda_backend import CUDA_AVAILABLE as TORCH_CUDA_AVAILABLE
from stainx.backends.cupy_cuda_backend import CUDA_AVAILABLE as CUPY_CUDA_AVAILABLE
if TORCH_CUDA_AVAILABLE:
print("torch_cuda backend is available!")
if CUPY_CUDA_AVAILABLE:
print("cupy_cuda backend is available!")
When to Use Which Backend
Use torch_cuda or cupy_cuda when: - You have a CUDA-capable GPU - Processing large batches of images - Maximum performance is required
Use torch backend when: - Running on CPU or MPS - CUDA backends are not available - You need maximum compatibility - Working with small batches where overhead matters less