Level 4: GPU Acceleration

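Level 4 offloads large array operations to an NVIDIA GPU. It pays off when arrays are large enough (roughly 10M+ elements) to amortize GPU initialization and CPU-GPU transfer costs.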

Overview

Aspect	Value
Startup Overhead	~1s (GPU initialization)
Per-Call Overhead	~50μs
Memory	GPU VRAM dependent
Best For	Large array operations
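
To see how the two overhead figures in the table combine, here is a rough back-of-the-envelope sketch. It assumes the approximate values quoted above (~1s startup, ~50μs per call) and ignores compute and transfer time; `total_overhead_seconds` is a hypothetical helper for illustration, not part of Epochly.

```python
STARTUP_US = 1_000_000  # ~1s one-time GPU initialization (from the table above)
PER_CALL_US = 50        # ~50 microseconds per call (from the table above)


def total_overhead_seconds(calls: int) -> float:
    """Fixed overhead only: one-time startup plus per-call cost, in seconds."""
    return (STARTUP_US + calls * PER_CALL_US) / 1_000_000


print(total_overhead_seconds(1))       # dominated by the one-time startup
print(total_overhead_seconds(10_000))  # 1.5
```

Under these assumptions the one-time startup dominates until a workload makes tens of thousands of calls, which is one reason short-lived scripts see less benefit.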

Requirements

NVIDIA GPU with CUDA support
Minimum 4GB VRAM recommended
Pro or Enterprise license (Community limited to Levels 0-3)

Install the CuPy build that matches your CUDA version:

pip install cupy-cuda12x  # CUDA 12.x
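
After installing, it may be worth confirming that CuPy imports and can see a CUDA device before enabling Level 4. The sketch below is an assumption about how one might check this and is not an Epochly API; `cuda_device_count` is a hypothetical helper name. It relies only on CuPy's `cupy.cuda.runtime.getDeviceCount()`.

```python
import importlib.util


def cuda_device_count() -> int:
    """Return how many CUDA devices CuPy can see, or 0 if CuPy/CUDA is unavailable."""
    # find_spec lets us skip the import entirely when CuPy is not installed.
    if importlib.util.find_spec("cupy") is None:
        return 0
    try:
        import cupy
        return int(cupy.cuda.runtime.getDeviceCount())
    except Exception:
        # CuPy is installed but there is no usable driver or device.
        return 0


if __name__ == "__main__":
    print(f"CUDA devices visible to CuPy: {cuda_device_count()}")
```

A result of 0 suggests checking the driver installation or the CuPy wheel version before looking at Epochly itself.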

Enabling Level 4

import epochly
import numpy as np


@epochly.optimize(level=4)
def gpu_compute(arr):
    # Elementwise arithmetic over the whole array
    return arr ** 2 + arr * 0.5


# A large float32 array (50M elements) is big enough to benefit from the GPU
data = np.random.rand(50_000_000).astype(np.float32)
result = gpu_compute(data)

When GPU Helps

Scenario	Speedup
Elementwise ops (10M+ elements)	66–70x
Matrix multiplication (4K+)	7–10x
Batched convolutions	14–19x

When GPU Does Not Help

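Small arrays (< 1M elements): transfer and launch overhead outweigh the compute savings
Sequential operations: steps that depend on the previous result cannot run in parallel
Complex conditionals: branch-heavy code maps poorly to GPU execution
Frequent small transfers: each CPU-GPU copy adds latency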
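
One way to act on the list above is a simple size check before choosing a GPU-optimized code path. This is a minimal sketch under stated assumptions: `worth_offloading` and `GPU_MIN_ELEMENTS` are hypothetical names, the 1M threshold is taken from the list above, and the right cutoff will vary by hardware and workload.

```python
GPU_MIN_ELEMENTS = 1_000_000  # assumed cutoff, from the "small arrays" guideline


def worth_offloading(n_elements: int, gpu_available: bool = True,
                     min_elements: int = GPU_MIN_ELEMENTS) -> bool:
    """Heuristic: offload only if a GPU is present and the array is large enough."""
    return gpu_available and n_elements >= min_elements


print(worth_offloading(50_000))                           # False
print(worth_offloading(50_000_000))                       # True
print(worth_offloading(50_000_000, gpu_available=False))  # False
```

A caller could use the result to pick between a function decorated with `@epochly.optimize(level=4)` and an undecorated one.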

Performance Tips

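Use float32: Single precision is typically faster on GPUs and halves memory and transfer size compared with float64
Batch Operations: Group work into fewer, larger calls to minimize CPU-GPU transfers
Ensure Sufficient Data: Use arrays of more than 10M elements for the best speedup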
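
The float32 tip also affects how much VRAM an array needs. The following sketch works through the arithmetic for the 50M-element example; `array_megabytes` is a hypothetical helper, and it counts only the raw array data, not temporaries created during computation.

```python
DTYPE_BYTES = {"float32": 4, "float64": 8}  # bytes per element


def array_megabytes(n_elements: int, dtype: str = "float32") -> float:
    """Size of a dense array's data in MB (1 MB = 10**6 bytes)."""
    return n_elements * DTYPE_BYTES[dtype] / 1e6


n = 50_000_000
print(array_megabytes(n, "float32"))  # 200.0
print(array_megabytes(n, "float64"))  # 400.0
```

Expressions such as `arr ** 2 + arr * 0.5` may allocate intermediate arrays of the same size, so peak usage can be several times the figure for a single array.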