GPU Compute

GPU acceleration with Epochly Level 4 for high-performance computing workloads.

Overview

Level 4 provides GPU acceleration for NumPy code, offloading compute-intensive array operations to NVIDIA GPUs for substantial speedups on large workloads.

Requirements

| Requirement | Specification |
| --- | --- |
| CUDA | Version 11.0 or higher (12.0+ recommended) |
| GPU | NVIDIA GPU with CUDA support |
| CuPy | Python library for GPU arrays |
| License | Epochly Pro or Enterprise license |
| Environment | EPOCHLY_LEVEL=4 |

Setup

Install CuPy for Your CUDA Version

CUDA 11.x:

pip install cupy-cuda11x

CUDA 12.x (Recommended):

pip install cupy-cuda12x

Or install with Epochly GPU extras:

pip install epochly[gpu]

Verify GPU Setup

# Check CUDA availability
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"
# Verify Epochly GPU support
epochly doctor --gpu
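
For a closer look at the detected hardware, you can also query CuPy directly. The snippet below uses plain CuPy runtime calls and does not go through Epochly:

import cupy as cp

# Number of CUDA-capable devices visible to CuPy
print(f"CUDA devices: {cp.cuda.runtime.getDeviceCount()}")

# Name and memory of the first device
props = cp.cuda.runtime.getDeviceProperties(0)
print(f"Device 0: {props['name'].decode()}")
print(f"Total memory: {props['totalGlobalMem'] / 1024**3:.1f} GiB")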

Configure Epochly for GPU

export EPOCHLY_LEVEL=4
export EPOCHLY_LICENSE_KEY=your-pro-license-key

Or programmatically:

import epochly

epochly.configure(
    enhancement_level=4,
    license_key='your-pro-license-key'
)
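
To confirm that Level 4 actually engaged, you can inspect the runtime status with the same epochly.get_status() call used in the Troubleshooting section below. A minimal sketch:

import epochly

epochly.configure(enhancement_level=4, license_key='your-pro-license-key')

status = epochly.get_status()
if not status.get('gpu_available', False):
    print("Warning: no GPU detected by Epochly")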

Basic GPU Operations

Decorator Usage

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_matrix_multiply(a, b):
    """Matrix multiplication offloaded to the GPU."""
    return np.dot(a, b)

# Create large matrices
a = np.random.rand(5000, 5000).astype(np.float32)
b = np.random.rand(5000, 5000).astype(np.float32)

# Automatically runs on the GPU
result = gpu_matrix_multiply(a, b)
print(f"Result shape: {result.shape}")

Context Manager

import epochly
import numpy as np

# All operations inside the context use the GPU
with epochly.optimize_context(level=4):
    a = np.random.rand(10000, 10000).astype(np.float32)
    b = np.random.rand(10000, 10000).astype(np.float32)

    # Matrix operations on the GPU
    c = np.dot(a, b)
    result = np.sum(c ** 2)

print(f"Result: {result}")

Matrix Operations Example

import epochly
import numpy as np
import time

@epochly.optimize(level=4)
def gpu_matrix_operations(n=5000):
    """Complex matrix operations on the GPU."""
    # Create matrices
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    # Matrix multiplication
    c = np.dot(a, b)

    # Element-wise operations
    d = c ** 2 + np.sin(c)

    # Reduction
    result = np.sum(d)
    return result

# Benchmark
start = time.perf_counter()
result = gpu_matrix_operations(5000)
end = time.perf_counter()

print(f"Result: {result}")
print(f"Time: {end - start:.3f}s")

FFT Computations

GPU acceleration for Fast Fourier Transform operations:

import epochly
import numpy as np
from scipy import fft

@epochly.optimize(level=4)
def gpu_fft_analysis(signal):
    """FFT analysis offloaded to the GPU."""
    # Forward FFT
    freq_domain = fft.fft(signal)

    # Power spectrum
    power = np.abs(freq_domain) ** 2

    # Inverse FFT
    reconstructed = fft.ifft(freq_domain)
    return power, reconstructed

# Generate a signal (1 million samples)
t = np.linspace(0, 1, 1_000_000, dtype=np.float32)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Process on the GPU
power_spectrum, reconstructed = gpu_fft_analysis(signal)
print(f"Power spectrum shape: {power_spectrum.shape}")
print(f"Peak frequency index: {np.argmax(power_spectrum)}")

2D FFT for Image Processing

import epochly
import numpy as np
from scipy import fft

@epochly.optimize(level=4)
def gpu_2d_fft(image):
    """2D FFT for image processing on the GPU."""
    # 2D FFT
    freq_image = fft.fft2(image)

    # Shift the zero-frequency component to the center
    freq_shifted = fft.fftshift(freq_image)

    # Magnitude spectrum
    magnitude = np.abs(freq_shifted)
    return magnitude

# Create a test image (4K resolution)
image = np.random.rand(2160, 3840).astype(np.float32)

# Process on the GPU
freq_magnitude = gpu_2d_fft(image)
print(f"Frequency magnitude shape: {freq_magnitude.shape}")

GPU Monte Carlo Simulation

High-performance Monte Carlo simulations on the GPU:

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_monte_carlo_pi(samples=100_000_000):
    """Estimate pi with Monte Carlo sampling on the GPU."""
    # Generate random points in the unit square
    x = np.random.rand(samples).astype(np.float32)
    y = np.random.rand(samples).astype(np.float32)

    # Check which points fall inside the unit circle
    inside = (x ** 2 + y ** 2) <= 1

    # Estimate pi
    pi_estimate = 4 * np.sum(inside) / samples
    return pi_estimate

# Run the simulation
pi = gpu_monte_carlo_pi(100_000_000)
print(f"Pi estimate: {pi:.6f}")
print(f"Error: {abs(pi - np.pi):.6f}")

Option Pricing Monte Carlo

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_option_pricing(S0, K, T, r, sigma, simulations=1_000_000):
    """
    Black-Scholes Monte Carlo option pricing on the GPU.

    S0: initial stock price
    K: strike price
    T: time to maturity (years)
    r: risk-free rate
    sigma: volatility
    """
    # Generate random draws for the terminal prices
    z = np.random.standard_normal(simulations).astype(np.float32)

    # Stock price at maturity
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

    # Call option payoff
    payoff = np.maximum(ST - K, 0)

    # Discounted expected payoff
    option_price = np.exp(-r * T) * np.mean(payoff)
    return option_price

# Price a call option
price = gpu_option_pricing(
    S0=100,       # Current stock price
    K=105,        # Strike price
    T=1.0,        # 1 year to maturity
    r=0.05,       # 5% risk-free rate
    sigma=0.2,    # 20% volatility
    simulations=10_000_000
)
print(f"Call option price: ${price:.2f}")

Memory Management

Configure GPU Memory Limits

# Limit GPU memory allocation in bytes (2147483648 bytes = 2 GiB)
export EPOCHLY_GPU_MEMORY_LIMIT=2147483648

# Set the workload-size threshold for GPU offloading
export EPOCHLY_GPU_WORKLOAD_THRESHOLD=1000000

Programmatic Memory Configuration

import epochly

epochly.configure(
    enhancement_level=4,
    gpu_memory_limit=2 * 1024 * 1024 * 1024,   # 2 GiB
    gpu_workload_threshold=1_000_000
)

Memory Management Example

import epochly
import numpy as np

# Configure memory limits
epochly.configure(
    enhancement_level=4,
    gpu_memory_limit=4 * 1024 * 1024 * 1024,   # 4 GiB
)

@epochly.optimize(level=4)
def gpu_batch_processing(data_chunks):
    """Process data in batches to manage GPU memory."""
    results = []
    for chunk in data_chunks:
        # Process each chunk on the GPU
        processed = chunk ** 2 + np.sin(chunk)
        results.append(np.sum(processed))
    return results

# Split a large dataset into manageable chunks
large_data = np.random.rand(50_000_000).astype(np.float32)
chunks = np.array_split(large_data, 10)

# Process in batches
results = gpu_batch_processing(chunks)
print(f"Processed {len(chunks)} chunks")

Performance Tips

Best Practices for GPU Acceleration

  1. Use large arrays: GPU offload only pays for itself on arrays with roughly 100K+ elements
  2. Use float32: GPU throughput is typically much higher with single-precision floats
  3. Minimize transfers: Keep data on the GPU between operations when possible (see the sketch after this list)
  4. Batch operations: Combine multiple operations to reduce CPU-GPU transfers
  5. Set memory limits: Configure EPOCHLY_GPU_MEMORY_LIMIT to prevent out-of-memory errors
  6. Monitor GPU usage: Use nvidia-smi to watch GPU utilization
  7. Enable CPU fallback: Use gpu_fallback=True for production resilience
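
The transfer pattern behind tips 3 and 4 can be illustrated directly with CuPy. The sketch below bypasses Epochly and uses CuPy's own API, so treat it as an illustration of the principle rather than a description of Epochly's internal behavior:

import cupy as cp
import numpy as np

data = np.random.rand(5_000_000).astype(np.float32)

# One host-to-device transfer up front
gpu_data = cp.asarray(data)

# Chain several operations without leaving the GPU
gpu_result = cp.sqrt(gpu_data) + cp.sin(gpu_data)
gpu_result = gpu_result ** 2

# One device-to-host transfer at the end (a single scalar)
total = float(cp.sum(gpu_result))
print(f"Total: {total:.3f}")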

Optimal Array Sizes

| Array Size | Recommendation |
| --- | --- |
| Less than 10K elements | Use CPU (Level 2-3) |
| 10K - 100K | GPU may help; benchmark first (see the sketch below) |
| 100K - 10M | GPU beneficial |
| More than 10M elements | GPU highly beneficial |
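
For sizes in the "benchmark first" band, a quick timing comparison is usually enough to decide. Below is a minimal sketch using the decorator shown earlier; note that the first GPU call includes warm-up and transfer overhead, so time more than one run before drawing conclusions:

import time
import numpy as np
import epochly

@epochly.optimize(level=4)
def gpu_dot(a, b):
    return np.dot(a, b)

def cpu_dot(a, b):
    return np.dot(a, b)

for n in (1_000, 5_000):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    start = time.perf_counter()
    cpu_dot(a, b)
    cpu_time = time.perf_counter() - start

    start = time.perf_counter()
    gpu_dot(a, b)
    gpu_time = time.perf_counter() - start

    print(f"n={n}: CPU {cpu_time:.3f}s, GPU {gpu_time:.3f}s")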

Example: CPU Fallback

import epochly
import numpy as np

@epochly.optimize(level=4, gpu_fallback=True)
def adaptive_computation(data):
    """Automatically falls back to the CPU if the GPU is unavailable."""
    return np.dot(data, data.T)

# Works even if the GPU is unavailable
result = adaptive_computation(np.random.rand(5000, 5000))

Troubleshooting

Common Errors

#### CUDA Not Found

Error:

CudaRuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected

Solution:

# Check if NVIDIA driver is installed
nvidia-smi
# If not installed, install CUDA toolkit
# See: https://developer.nvidia.com/cuda-downloads

#### Out of Memory

Error:

OutOfMemoryError: CUDA out of memory

Solution:

# Reduce memory limit
export EPOCHLY_GPU_MEMORY_LIMIT=1073741824
# Increase workload threshold
export EPOCHLY_GPU_WORKLOAD_THRESHOLD=5000000

#### CuPy Not Installed

Error:

ImportError: CuPy is required for GPU acceleration

Solution:

# Check CUDA version
nvidia-smi
# Install matching CuPy version
pip install cupy-cuda11x # For CUDA 11.x
pip install cupy-cuda12x # For CUDA 12.x

#### CuPy Version Mismatch

Error:

ImportError: CuPy is not compatible with the current CUDA version

Solution:

# Uninstall current CuPy
pip uninstall cupy-cuda11x cupy-cuda12x
# Check CUDA version
nvcc --version
# Install correct version
pip install cupy-cuda12x

Debugging GPU Issues

import epochly

# Check GPU status
status = epochly.get_status()
print(f"GPU Available: {status.get('gpu_available', False)}")
print(f"GPU Device: {status.get('gpu_device', 'N/A')}")
print(f"CUDA Version: {status.get('cuda_version', 'N/A')}")

# Enable debug logging
epochly.configure(
    enhancement_level=4,
    log_level='DEBUG'
)

Monitor GPU Usage

# Watch GPU usage in real-time
watch -n 1 nvidia-smi
# Or use continuous monitoring
nvidia-smi dmon -s pucvmet