Level 4: GPU Acceleration

Level 4 provides GPU acceleration for large array operations.

Overview

Aspect             Value
Startup Overhead   ~1s (GPU initialization)
Per-Call Overhead  ~50μs
Memory             GPU VRAM dependent
Best For           Large array operations
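
To make the overhead figures concrete, the sketch below estimates how many calls it takes before the GPU pays back its startup cost. It assumes a simple linear cost model (fixed startup plus a fixed per-call overhead) that this page does not state; the function names, the CPU timings, and the speedup factor passed in are hypothetical inputs, not measurements.

```python
import math

STARTUP_OVERHEAD_S = 1.0      # ~1s GPU initialization (from the Overview table)
PER_CALL_OVERHEAD_S = 50e-6   # ~50μs per call (from the Overview table)


def total_overhead_seconds(n_calls):
    """Fixed overhead paid for n_calls accelerated calls in one process."""
    return STARTUP_OVERHEAD_S + n_calls * PER_CALL_OVERHEAD_S


def break_even_calls(cpu_seconds_per_call, speedup):
    """Calls needed before the time saved exceeds the startup overhead.

    Returns None when a single accelerated call is not faster than the
    CPU call, in which case the startup cost is never recovered.
    """
    gpu_seconds_per_call = cpu_seconds_per_call / speedup + PER_CALL_OVERHEAD_S
    saved_per_call = cpu_seconds_per_call - gpu_seconds_per_call
    if saved_per_call <= 0:
        return None
    return math.ceil(STARTUP_OVERHEAD_S / saved_per_call)


# Hypothetical workload: 0.1 s per call on the CPU, 10x faster on the GPU.
print(break_even_calls(0.1, 10))
# Hypothetical tiny workload: 20 μs per call is below the per-call overhead.
print(break_even_calls(20e-6, 10))
```

Under this model, a call that already finishes in less than the per-call overhead can never recover the startup cost, which matches the guidance that small arrays do not benefit.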

Requirements

  • NVIDIA GPU with CUDA support
  • Minimum 4GB VRAM recommended
  • Pro or Enterprise license (Community limited to Levels 0-3)

Install the CuPy build that matches your CUDA version:

pip install cupy-cuda12x  # CUDA 12.x
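
Before enabling Level 4, it can help to confirm that the requirements above are met. The following is a minimal sketch that queries CuPy directly; it is independent of Epochly, and the helper name `gpu_status` and its return format are illustrative assumptions. It returns a status dictionary rather than raising, so it also runs on machines without a GPU.

```python
def gpu_status(min_vram_gb=4.0):
    """Report whether a CUDA GPU with the recommended VRAM is visible to CuPy."""
    try:
        import cupy as cp
    except Exception as exc:  # CuPy missing, or its CUDA libraries failed to load
        return {"available": False, "reason": f"CuPy import failed: {exc}"}

    try:
        device_count = cp.cuda.runtime.getDeviceCount()
        if device_count == 0:
            return {"available": False, "reason": "no CUDA device found"}
        # mem_info is a (free_bytes, total_bytes) tuple for the device.
        free_bytes, total_bytes = cp.cuda.Device(0).mem_info
    except Exception as exc:  # e.g. driver not installed
        return {"available": False, "reason": str(exc)}

    total_gb = total_bytes / 1024**3
    return {
        "available": True,
        "total_vram_gb": total_gb,
        "meets_recommendation": total_gb >= min_vram_gb,
    }


print(gpu_status())
```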

Enabling Level 4

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_compute(arr):
    return arr ** 2 + arr * 0.5

# Large array benefits from GPU
data = np.random.rand(50_000_000).astype(np.float32)
result = gpu_compute(data)

When GPU Helps

Scenario                         Speedup
Elementwise ops (10M+ elements)  66–70x
Matrix multiplication (4K+)      7–10x
Batched convolutions             14–19x

When GPU Does Not Help

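  • Small arrays (< 1M elements), where fixed overhead outweighs the compute time saved
  • Sequential operations, where each step depends on the previous result
  • Complex conditionals with data-dependent branching
  • Frequent small transfers between CPU and GPU memory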
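
The size guidance on this page can be turned into a rough pre-check. This is only a sketch: the function name `suggest_backend` and the three-way result are hypothetical, the two thresholds are the 1M and 10M figures quoted on this page, and the middle range is left as "benchmark" because the page gives no guidance for it.

```python
SMALL_ARRAY_LIMIT = 1_000_000     # below this, the page says the GPU does not help
LARGE_ARRAY_TARGET = 10_000_000   # at or above this, the page reports the best speedups


def suggest_backend(n_elements):
    """Return 'cpu', 'gpu', or 'benchmark' based on the element count alone."""
    if n_elements < SMALL_ARRAY_LIMIT:
        return "cpu"
    if n_elements >= LARGE_ARRAY_TARGET:
        return "gpu"
    # Between the two thresholds the outcome depends on the workload.
    return "benchmark"


for size in (500_000, 5_000_000, 50_000_000):
    print(size, suggest_backend(size))
```

Element count is only one factor; sequential dependencies or heavy branching can make the GPU a poor fit even for large arrays.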

Performance Tips

  1. Use float32: GPU operations are faster in single precision
  2. Batch operations: Minimize CPU-GPU transfers by making fewer, larger calls
  3. Ensure sufficient data: Arrays with more than 10M elements benefit most
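
The first two tips can be sketched with plain NumPy. The example below converts the inputs to float32 once and replaces several small calls with one call on a concatenated array. The `compute` function repeats the expression from `gpu_compute` in the Enabling Level 4 section, with the `@epochly.optimize(level=4)` decorator left off so the sketch runs without a GPU; the chunk count and sizes are arbitrary and far smaller than a real Level 4 workload.

```python
import numpy as np


def compute(arr):
    # Same elementwise expression as gpu_compute; undecorated for portability.
    return arr ** 2 + arr * 0.5


# Tip 1: convert to float32 once, up front (np.random.rand returns float64).
chunks = [np.random.rand(1_000).astype(np.float32) for _ in range(8)]

# Tip 2: one call on a concatenated array instead of eight small calls.
batched = np.concatenate(chunks)
batched_result = compute(batched)

# Split the result back into per-chunk pieces at the original boundaries.
offsets = np.cumsum([len(c) for c in chunks])[:-1]
results = np.split(batched_result, offsets)

print(batched_result.dtype, len(results))
```

Because the operation is elementwise, the batched result split at the original boundaries is identical to calling the function on each chunk separately.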