Level 4: GPU Acceleration
Level 4 provides GPU acceleration for large array operations.
Overview
| Aspect | Value |
|---|---|
| Startup Overhead | ~1s (GPU initialization) |
| Per-Call Overhead | ~50μs |
| Memory | Depends on available GPU VRAM |
| Best For | Large array operations |
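To see how the two fixed costs in the table interact, the sketch below estimates how many calls it takes before the GPU path pays off. The two overhead constants come from the table. The function names, the CPU time per call, and the speedup are hypothetical inputs for illustration, and this simple model ignores data transfer time.

```python
import math

STARTUP_OVERHEAD_S = 1.0     # ~1 s GPU initialization (from the table)
PER_CALL_OVERHEAD_S = 50e-6  # ~50 μs per call (from the table)


def gpu_total_time(n_calls, cpu_time_per_call_s, speedup):
    """Estimated wall time for n_calls on the GPU path (hypothetical model)."""
    compute = cpu_time_per_call_s / speedup
    return STARTUP_OVERHEAD_S + n_calls * (PER_CALL_OVERHEAD_S + compute)


def break_even_calls(cpu_time_per_call_s, speedup):
    """Smallest call count at which the GPU path beats the CPU, or None."""
    gpu_per_call = PER_CALL_OVERHEAD_S + cpu_time_per_call_s / speedup
    saved_per_call = cpu_time_per_call_s - gpu_per_call
    if saved_per_call <= 0:
        return None  # per-call overhead alone outweighs the compute savings
    # Need n * saved_per_call > STARTUP_OVERHEAD_S, with n an integer.
    return math.floor(STARTUP_OVERHEAD_S / saved_per_call) + 1


print(break_even_calls(0.1, 10))    # 100 ms CPU call, assumed 10x speedup
print(break_even_calls(40e-6, 70))  # 40 μs CPU call: never breaks even
```

Under these assumptions, a 100 ms CPU call needs about a dozen calls to recover the one-time initialization. A 40 μs call never recovers it, because the per-call overhead alone exceeds the CPU time.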
Requirements
- NVIDIA GPU with CUDA support
- At least 4 GB of VRAM recommended
- Pro or Enterprise license (Community is limited to Levels 0-3)
```shell
pip install cupy-cuda12x  # CUDA 12.x
```
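After installing, it can help to confirm that CuPy actually sees a CUDA device and how much VRAM it has. The following is a minimal sketch, not part of Epochly: `check_gpu` is a hypothetical helper name, and it relies only on CuPy's `cupy.cuda.runtime.getDeviceCount`, `cupy.cuda.runtime.memGetInfo`, and `cupy.cuda.Device`.

```python
def check_gpu():
    """Report whether CuPy can see a CUDA device and how much VRAM each has."""
    try:
        import cupy
        count = cupy.cuda.runtime.getDeviceCount()
    except Exception as exc:  # CuPy missing, no driver, or no CUDA device
        reason = f"{type(exc).__name__}: {exc}"
        return {"available": False, "reason": reason, "devices": []}

    devices = []
    for index in range(count):
        # memGetInfo reports (free, total) bytes for the current device.
        with cupy.cuda.Device(index):
            free_bytes, total_bytes = cupy.cuda.runtime.memGetInfo()
        devices.append({
            "index": index,
            "free_gib": round(free_bytes / 2**30, 2),
            "total_gib": round(total_bytes / 2**30, 2),
        })
    return {"available": count > 0, "reason": None, "devices": devices}


info = check_gpu()
print(info)
```

Compare the reported `total_gib` against the 4 GB recommendation above. Cards usually report slightly less than their nominal capacity.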
Enabling Level 4
```python
import epochly
import numpy as np


@epochly.optimize(level=4)
def gpu_compute(arr):
    return arr ** 2 + arr * 0.5


# Large array benefits from GPU
data = np.random.rand(50_000_000).astype(np.float32)
result = gpu_compute(data)
```
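Because GPU initialization adds roughly a second of one-time cost, a single timed call can be misleading. The sketch below separates the first call from steady-state timing. `time_call` is a hypothetical helper, not an Epochly API, and it assumes the function returns only once its result is ready.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return (first_call_seconds, best_later_seconds) for fn(*args)."""
    # The first call may include one-time setup such as GPU initialization.
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    # Later calls approximate steady-state cost; keep the fastest one.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return first, best


# Hypothetical usage with the function defined above:
# first, steady = time_call(gpu_compute, data)
```

Comparing `steady` against the same measurement for a plain NumPy version gives a fairer picture than comparing first calls.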
When GPU Helps
| Scenario | Speedup |
|---|---|
| Elementwise ops (10M+ elements) | 66–70x |
| Matrix multiplication (4K+) | 7–10x |
| Batched convolutions | 14–19x |
When GPU Does Not Help
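- Small arrays (< 1M elements): transfer and launch overhead outweighs the compute savings
- Sequential operations, where each step depends on the previous result
- Complex conditionals: heavily branching code runs poorly on hardware built for uniform parallel work
- Frequent small transfers between CPU and GPU memory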
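One way to act on the size guidance in this list is to route small inputs to a CPU function and only large ones to the GPU path. The sketch below uses hypothetical helper names (`element_count`, `dispatch`), with a default threshold taken from the 1M figure above. It is an illustration, not something Epochly is known to require.

```python
MIN_GPU_ELEMENTS = 1_000_000  # threshold taken from the 1M figure in the list


def element_count(arr):
    """Number of elements: uses .size when present (NumPy), else len()."""
    size = getattr(arr, "size", None)
    return int(size) if size is not None else len(arr)


def dispatch(arr, cpu_fn, gpu_fn, min_elements=MIN_GPU_ELEMENTS):
    """Call cpu_fn for small inputs and gpu_fn for large ones."""
    if element_count(arr) < min_elements:
        return cpu_fn(arr)
    return gpu_fn(arr)
```

Here `gpu_fn` could be a function decorated with `@epochly.optimize(level=4)` and `cpu_fn` its undecorated equivalent. The right threshold for a given workload is best found by measuring.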
Performance Tips
- Use float32: GPU operations are faster in single precision than in float64
- Batch operations: group work into fewer, larger calls to minimize CPU-GPU transfers
- Ensure sufficient data: arrays larger than 10M elements benefit most
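The first two tips can be combined by packing many small arrays into one float32 array and making a single call. The sketch below uses plain NumPy. `run_batched` is a hypothetical helper, and it is only valid for elementwise functions, where each output element depends on the matching input element alone.

```python
import numpy as np


def run_batched(chunks, fn):
    """Apply an elementwise fn to many 1-D chunks with a single call."""
    # Cast to float32 and pack everything into one contiguous array.
    packed = np.concatenate(
        [np.asarray(c, dtype=np.float32).ravel() for c in chunks]
    )
    result = fn(packed)  # one call instead of one per chunk
    # Split the result back at the original chunk boundaries.
    boundaries = np.cumsum([np.size(c) for c in chunks])[:-1]
    return np.split(result, boundaries)


def compute(arr):
    return arr ** 2 + arr * 0.5


outputs = run_batched([np.arange(3), np.arange(5)], compute)
```

With a GPU-backed `fn`, this replaces many small transfers with one large transfer in each direction.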