GPU Compute

GPU acceleration with Epochly Level 4 for high-performance computing workloads.

Overview

Level 4 provides GPU acceleration for NumPy code, offloading compute-intensive array operations to NVIDIA GPUs for substantial speedups on large workloads.

Requirements

| Requirement | Specification |
| --- | --- |
| CUDA | Version 11.0 or higher (12.0+ recommended) |
| GPU | NVIDIA GPU with CUDA support |
| CuPy | Python library for GPU arrays |
| License | Epochly Pro or Enterprise license |
| Environment | EPOCHLY_LEVEL=4 |

Setup

Install CuPy for Your CUDA Version

CUDA 11.x:

pip install cupy-cuda11x

CUDA 12.x (Recommended):

pip install cupy-cuda12x

Or install with Epochly GPU extras:

pip install epochly[gpu]

Verify GPU Setup

# Check CUDA availability
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"
# Verify Epochly GPU support
epochly doctor --gpu
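
For a closer look at the detected hardware, you can also query CuPy directly. The snippet below uses plain CuPy runtime calls and does not go through Epochly:

import cupy as cp

# Number of CUDA-capable devices visible to CuPy
print(f"CUDA devices: {cp.cuda.runtime.getDeviceCount()}")

# Name and memory of the first device
props = cp.cuda.runtime.getDeviceProperties(0)
print(f"Device 0: {props['name'].decode()}")
print(f"Total memory: {props['totalGlobalMem'] / 1024**3:.1f} GiB")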

Configure Epochly for GPU

export EPOCHLY_LEVEL=4
export EPOCHLY_LICENSE_KEY=your-pro-license-key

Or programmatically:

import epochly

epochly.configure(
    enhancement_level=4,
    license_key='your-pro-license-key'
)
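
To confirm that Level 4 actually engaged, you can inspect the runtime status with the same epochly.get_status() call used in the Troubleshooting section below. A minimal sketch:

import epochly

epochly.configure(enhancement_level=4, license_key='your-pro-license-key')

status = epochly.get_status()
if not status.get('gpu_available', False):
    print("Warning: no GPU detected by Epochly")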

Basic GPU Operations

Decorator Usage

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_matrix_multiply(a, b):
    """Matrix multiplication offloaded to the GPU."""
    return np.dot(a, b)

# Create large matrices
a = np.random.rand(5000, 5000).astype(np.float32)
b = np.random.rand(5000, 5000).astype(np.float32)

# Automatically runs on the GPU
result = gpu_matrix_multiply(a, b)
print(f"Result shape: {result.shape}")

Context Manager

import epochly
import numpy as np

# All operations inside the context use the GPU
with epochly.optimize_context(level=4):
    a = np.random.rand(10000, 10000).astype(np.float32)
    b = np.random.rand(10000, 10000).astype(np.float32)

    # Matrix operations on the GPU
    c = np.dot(a, b)
    result = np.sum(c ** 2)

print(f"Result: {result}")

Matrix Operations Example

import epochly
import numpy as np
import time

@epochly.optimize(level=4)
def gpu_matrix_operations(n=5000):
    """Complex matrix operations on the GPU."""
    # Create matrices
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    # Matrix multiplication
    c = np.dot(a, b)

    # Element-wise operations
    d = c ** 2 + np.sin(c)

    # Reduction
    result = np.sum(d)
    return result

# Benchmark
start = time.perf_counter()
result = gpu_matrix_operations(5000)
end = time.perf_counter()

print(f"Result: {result}")
print(f"Time: {end - start:.3f}s")

FFT Computations

GPU acceleration for Fast Fourier Transform operations:

import epochly
import numpy as np
from scipy import fft

@epochly.optimize(level=4)
def gpu_fft_analysis(signal):
    """FFT analysis offloaded to the GPU."""
    # Forward FFT
    freq_domain = fft.fft(signal)

    # Power spectrum
    power = np.abs(freq_domain) ** 2

    # Inverse FFT
    reconstructed = fft.ifft(freq_domain)
    return power, reconstructed

# Generate a signal (1 million samples)
t = np.linspace(0, 1, 1_000_000, dtype=np.float32)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Process on the GPU
power_spectrum, reconstructed = gpu_fft_analysis(signal)
print(f"Power spectrum shape: {power_spectrum.shape}")
print(f"Peak frequency index: {np.argmax(power_spectrum)}")

2D FFT for Image Processing

import epochly
import numpy as np
from scipy import fft

@epochly.optimize(level=4)
def gpu_2d_fft(image):
    """2D FFT for image processing on the GPU."""
    # 2D FFT
    freq_image = fft.fft2(image)

    # Shift the zero-frequency component to the center
    freq_shifted = fft.fftshift(freq_image)

    # Magnitude spectrum
    magnitude = np.abs(freq_shifted)
    return magnitude

# Create a test image (4K resolution)
image = np.random.rand(2160, 3840).astype(np.float32)

# Process on the GPU
freq_magnitude = gpu_2d_fft(image)
print(f"Frequency magnitude shape: {freq_magnitude.shape}")

GPU Monte Carlo Simulation

High-performance Monte Carlo simulations on the GPU:

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_monte_carlo_pi(samples=100_000_000):
    """Estimate pi with Monte Carlo sampling on the GPU."""
    # Generate random points in the unit square
    x = np.random.rand(samples).astype(np.float32)
    y = np.random.rand(samples).astype(np.float32)

    # Check which points fall inside the unit circle
    inside = (x ** 2 + y ** 2) <= 1

    # Estimate pi
    pi_estimate = 4 * np.sum(inside) / samples
    return pi_estimate

# Run the simulation
pi = gpu_monte_carlo_pi(100_000_000)
print(f"Pi estimate: {pi:.6f}")
print(f"Error: {abs(pi - np.pi):.6f}")

Option Pricing Monte Carlo

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_option_pricing(S0, K, T, r, sigma, simulations=1_000_000):
    """
    Black-Scholes Monte Carlo option pricing on the GPU.

    S0: initial stock price
    K: strike price
    T: time to maturity (years)
    r: risk-free rate
    sigma: volatility
    """
    # Generate random draws for the terminal prices
    z = np.random.standard_normal(simulations).astype(np.float32)

    # Stock price at maturity
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

    # Call option payoff
    payoff = np.maximum(ST - K, 0)

    # Discounted expected payoff
    option_price = np.exp(-r * T) * np.mean(payoff)
    return option_price

# Price a call option
price = gpu_option_pricing(
    S0=100,       # Current stock price
    K=105,        # Strike price
    T=1.0,        # 1 year to maturity
    r=0.05,       # 5% risk-free rate
    sigma=0.2,    # 20% volatility
    simulations=10_000_000
)
print(f"Call option price: ${price:.2f}")

Memory Management

Configure GPU Memory Limits

# Limit GPU memory allocation in bytes (2147483648 bytes = 2 GiB)
export EPOCHLY_GPU_MEMORY_LIMIT=2147483648

# Set the workload-size threshold for GPU offloading
export EPOCHLY_GPU_WORKLOAD_THRESHOLD=1000000

Programmatic Memory Configuration

import epochly

epochly.configure(
    enhancement_level=4,
    gpu_memory_limit=2 * 1024 * 1024 * 1024,   # 2 GiB
    gpu_workload_threshold=1_000_000
)

Memory Management Example

import epochly
import numpy as np

# Configure memory limits
epochly.configure(
    enhancement_level=4,
    gpu_memory_limit=4 * 1024 * 1024 * 1024,   # 4 GiB
)

@epochly.optimize(level=4)
def gpu_batch_processing(data_chunks):
    """Process data in batches to manage GPU memory."""
    results = []
    for chunk in data_chunks:
        # Process each chunk on the GPU
        processed = chunk ** 2 + np.sin(chunk)
        results.append(np.sum(processed))
    return results

# Split a large dataset into manageable chunks
large_data = np.random.rand(50_000_000).astype(np.float32)
chunks = np.array_split(large_data, 10)

# Process in batches
results = gpu_batch_processing(chunks)
print(f"Processed {len(chunks)} chunks")

Performance Tips

Best Practices for GPU Acceleration

  1. Use large arrays: GPU offload only pays for itself on arrays with roughly 100K+ elements
  2. Use float32: GPU throughput is typically much higher with single-precision floats
  3. Minimize transfers: Keep data on the GPU between operations when possible (see the sketch after this list)
  4. Batch operations: Combine multiple operations to reduce CPU-GPU transfers
  5. Set memory limits: Configure EPOCHLY_GPU_MEMORY_LIMIT to prevent out-of-memory errors
  6. Monitor GPU usage: Use nvidia-smi to watch GPU utilization
  7. Enable CPU fallback: Use gpu_fallback=True for production resilience
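
The transfer pattern behind tips 3 and 4 can be illustrated directly with CuPy. The sketch below bypasses Epochly and uses CuPy's own API, so treat it as an illustration of the principle rather than a description of Epochly's internal behavior:

import cupy as cp
import numpy as np

data = np.random.rand(5_000_000).astype(np.float32)

# One host-to-device transfer up front
gpu_data = cp.asarray(data)

# Chain several operations without leaving the GPU
gpu_result = cp.sqrt(gpu_data) + cp.sin(gpu_data)
gpu_result = gpu_result ** 2

# One device-to-host transfer at the end (a single scalar)
total = float(cp.sum(gpu_result))
print(f"Total: {total:.3f}")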

Optimal Array Sizes

| Array Size | Recommendation |
| --- | --- |
| Less than 10K elements | Use CPU (Level 2-3) |
| 10K - 100K | GPU may help; benchmark first (see the sketch below) |
| 100K - 10M | GPU beneficial |
| More than 10M elements | GPU highly beneficial |
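
For sizes in the "benchmark first" band, a quick timing comparison is usually enough to decide. Below is a minimal sketch using the decorator shown earlier; note that the first GPU call includes warm-up and transfer overhead, so time more than one run before drawing conclusions:

import time
import numpy as np
import epochly

@epochly.optimize(level=4)
def gpu_dot(a, b):
    return np.dot(a, b)

def cpu_dot(a, b):
    return np.dot(a, b)

for n in (1_000, 5_000):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    start = time.perf_counter()
    cpu_dot(a, b)
    cpu_time = time.perf_counter() - start

    start = time.perf_counter()
    gpu_dot(a, b)
    gpu_time = time.perf_counter() - start

    print(f"n={n}: CPU {cpu_time:.3f}s, GPU {gpu_time:.3f}s")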

Example: CPU Fallback

import epochly
import numpy as np

@epochly.optimize(level=4, gpu_fallback=True)
def adaptive_computation(data):
    """Automatically falls back to the CPU if the GPU is unavailable."""
    return np.dot(data, data.T)

# Works even if the GPU is unavailable
result = adaptive_computation(np.random.rand(5000, 5000))

Troubleshooting

Common Errors

#### CUDA Not Found

Error:

CudaRuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected

Solution:

# Check if NVIDIA driver is installed
nvidia-smi
# If not installed, install CUDA toolkit
# See: https://developer.nvidia.com/cuda-downloads

#### Out of Memory

Error:

OutOfMemoryError: CUDA out of memory

Solution:

# Reduce memory limit
export EPOCHLY_GPU_MEMORY_LIMIT=1073741824
# Increase workload threshold
export EPOCHLY_GPU_WORKLOAD_THRESHOLD=5000000

#### CuPy Not Installed

Error:

ImportError: CuPy is required for GPU acceleration

Solution:

# Check CUDA version
nvidia-smi
# Install matching CuPy version
pip install cupy-cuda11x # For CUDA 11.x
pip install cupy-cuda12x # For CUDA 12.x

#### CuPy Version Mismatch

Error:

ImportError: CuPy is not compatible with the current CUDA version

Solution:

# Uninstall current CuPy
pip uninstall cupy-cuda11x cupy-cuda12x
# Check CUDA version
nvcc --version
# Install correct version
pip install cupy-cuda12x

Debugging GPU Issues

import epochly

# Check GPU status
status = epochly.get_status()
print(f"GPU Available: {status.get('gpu_available', False)}")
print(f"GPU Device: {status.get('gpu_device', 'N/A')}")
print(f"CUDA Version: {status.get('cuda_version', 'N/A')}")

# Enable debug logging
epochly.configure(
    enhancement_level=4,
    log_level='DEBUG'
)

Monitor GPU Usage

# Watch GPU usage in real-time
watch -n 1 nvidia-smi
# Or use continuous monitoring
nvidia-smi dmon -s pucvmet