Backends

StainX supports multiple backends for different performance characteristics and device compatibility.

Available Backends

torch: PyTorch backend, works on CPU, CUDA, and MPS devices
torch_cuda: Optimized PyTorch CUDA extension, requires CUDA Toolkit and CUDA device
cupy: CuPy backend, requires CUDA and CuPy
cupy_cuda: CuPy CUDA backend, requires CUDA and CuPy

Backend Selection

Backends are automatically selected based on device availability:

For non-CUDA devices → use torch backend
For CUDA devices → try torch_cuda, then cupy_cuda, then cupy, fallback to torch

from stainx import Reinhard

# Automatic selection
normalizer = Reinhard(device="cuda")

# Force specific backend
normalizer = Reinhard(device="cuda", backend="torch")
normalizer = Reinhard(device="cuda", backend="torch_cuda")
normalizer = Reinhard(device="cuda", backend="cupy")
normalizer = Reinhard(device="cuda", backend="cupy_cuda")

Device Support

torch Backend

✅ CPU
✅ CUDA (NVIDIA GPUs)
✅ MPS (Apple Silicon)

torch_cuda Backend

❌ CPU (not supported)
✅ CUDA (NVIDIA GPUs only)
❌ MPS (not supported)

cupy Backend

❌ CPU (not supported)
✅ CUDA (NVIDIA GPUs only)
❌ MPS (not supported)

cupy_cuda Backend

❌ CPU (not supported)
✅ CUDA (NVIDIA GPUs only)
❌ MPS (not supported)

Performance

CUDA backends (torch_cuda, cupy_cuda) provide significant performance improvements for batch processing:

Reinhard: 5.3-5.4x speedup compared to torch backend
Macenko: 4.6-7.3x speedup compared to torch backend
HistogramMatching: torch backend is typically faster for this method
Best performance with larger batch sizes (64-128 images)
Optimized memory access patterns

Example performance (NVIDIA RTX A6000, batch 32, 256×256): - Reinhard: torch_cuda 0.72ms vs torch 3.87ms (5.4x speedup) - Macenko: torch_cuda 12.51ms vs torch 57.02ms (4.6x speedup)

See the Benchmarks page for detailed performance comparisons.

Check Backend Availability

from stainx.backends.torch_cuda_backend import CUDA_AVAILABLE as TORCH_CUDA_AVAILABLE
from stainx.backends.cupy_cuda_backend import CUDA_AVAILABLE as CUPY_CUDA_AVAILABLE

if TORCH_CUDA_AVAILABLE:
    print("torch_cuda backend is available!")
if CUPY_CUDA_AVAILABLE:
    print("cupy_cuda backend is available!")

When to Use Which Backend

Use torch_cuda or cupy_cuda when: - You have a CUDA-capable GPU - Processing large batches of images - Maximum performance is required

Use torch backend when: - Running on CPU or MPS - CUDA backends are not available - You need maximum compatibility - Working with small batches where overhead matters less