# GPU Compute

GPU acceleration with Epochly Level 4 for high-performance computing workloads.

## Overview

Level 4 provides GPU acceleration for NumPy operations, offloading compute-intensive work to NVIDIA GPUs for substantial performance gains.
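Conceptually, the offload that Level 4 automates resembles the manual CuPy round trip sketched below: copy NumPy arrays to the device, run the computation there, and copy the result back. This is an illustrative assumption about the general pattern, not Epochly's actual dispatch logic, which performs these transfers for you.

```python
# Illustrative sketch only: the kind of host/device round trip that
# Level 4 automates. Epochly's real dispatch logic may differ.
import numpy as np
import cupy as cp

a = np.random.rand(5000, 5000).astype(np.float32)
b = np.random.rand(5000, 5000).astype(np.float32)

a_gpu = cp.asarray(a)            # host -> device transfer
b_gpu = cp.asarray(b)
c_gpu = cp.matmul(a_gpu, b_gpu)  # compute runs on the GPU
c = cp.asnumpy(c_gpu)            # device -> host transfer

print(c.shape)
```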
## Requirements
| Requirement | Specification |
|---|---|
| CUDA | Version 11.0 or higher (12.0+ recommended) |
| GPU | NVIDIA GPU with CUDA support |
| CuPy | Python library for GPU arrays |
| License | Epochly Pro or Enterprise license |
| Environment | EPOCHLY_LEVEL=4 |
## Setup

### Install CuPy for Your CUDA Version

CUDA 11.x:

```bash
pip install cupy-cuda11x
```

CUDA 12.x (recommended):

```bash
pip install cupy-cuda12x
```

Or install with Epochly GPU extras:

```bash
pip install epochly[gpu]
```
### Verify GPU Setup

```bash
# Check CUDA availability
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"

# Verify Epochly GPU support
epochly doctor --gpu
```
### Configure Epochly for GPU

```bash
export EPOCHLY_LEVEL=4
export EPOCHLY_LICENSE_KEY=your-pro-license-key
```

Or programmatically:

```python
import epochly

epochly.configure(
    enhancement_level=4,
    license_key='your-pro-license-key'
)
```
## Basic GPU Operations

### Decorator Usage

```python
import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_matrix_multiply(a, b):
    """Matrix multiplication offloaded to GPU"""
    return np.dot(a, b)

# Create large matrices
a = np.random.rand(5000, 5000).astype(np.float32)
b = np.random.rand(5000, 5000).astype(np.float32)

# Automatically runs on GPU
result = gpu_matrix_multiply(a, b)
print(f"Result shape: {result.shape}")
```
### Context Manager

```python
import epochly
import numpy as np

# All operations in the context use the GPU
with epochly.optimize_context(level=4):
    a = np.random.rand(10000, 10000).astype(np.float32)
    b = np.random.rand(10000, 10000).astype(np.float32)

    # Matrix operations on GPU
    c = np.dot(a, b)
    result = np.sum(c ** 2)

print(f"Result: {result}")
```
## Matrix Operations Example

```python
import epochly
import numpy as np
import time

@epochly.optimize(level=4)
def gpu_matrix_operations(n=5000):
    """Complex matrix operations on GPU"""
    # Create matrices
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    # Matrix multiplication
    c = np.dot(a, b)

    # Element-wise operations
    d = c ** 2 + np.sin(c)

    # Reduction
    result = np.sum(d)
    return result

# Benchmark
start = time.perf_counter()
result = gpu_matrix_operations(5000)
end = time.perf_counter()

print(f"Result: {result}")
print(f"Time: {end - start:.3f}s")
```
## FFT Computations

GPU acceleration for Fast Fourier Transform operations:

```python
import epochly
import numpy as np
from scipy import fft

@epochly.optimize(level=4)
def gpu_fft_analysis(signal):
    """FFT analysis offloaded to GPU"""
    # Forward FFT
    freq_domain = fft.fft(signal)

    # Power spectrum
    power = np.abs(freq_domain) ** 2

    # Inverse FFT
    reconstructed = fft.ifft(freq_domain)

    return power, reconstructed

# Generate signal (1 million samples)
t = np.linspace(0, 1, 1_000_000, dtype=np.float32)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Process on GPU
power_spectrum, reconstructed = gpu_fft_analysis(signal)
print(f"Power spectrum shape: {power_spectrum.shape}")
print(f"Peak frequency index: {np.argmax(power_spectrum)}")
```
### 2D FFT for Image Processing

```python
import epochly
import numpy as np
from scipy import fft

@epochly.optimize(level=4)
def gpu_2d_fft(image):
    """2D FFT for image processing on GPU"""
    # 2D FFT
    freq_image = fft.fft2(image)

    # Shift zero frequency to center
    freq_shifted = fft.fftshift(freq_image)

    # Magnitude spectrum
    magnitude = np.abs(freq_shifted)

    return magnitude

# Create test image (4K resolution)
image = np.random.rand(2160, 3840).astype(np.float32)

# Process on GPU
freq_magnitude = gpu_2d_fft(image)
print(f"Frequency magnitude shape: {freq_magnitude.shape}")
```
## GPU Monte Carlo Simulation

High-performance Monte Carlo simulations using the GPU:

```python
import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_monte_carlo_pi(samples=100_000_000):
    """Estimate Pi using Monte Carlo on GPU"""
    # Generate random points
    x = np.random.rand(samples).astype(np.float32)
    y = np.random.rand(samples).astype(np.float32)

    # Check if inside unit circle
    inside = (x ** 2 + y ** 2) <= 1

    # Estimate Pi
    pi_estimate = 4 * np.sum(inside) / samples
    return pi_estimate

# Run simulation
pi = gpu_monte_carlo_pi(100_000_000)
print(f"Pi estimate: {pi:.6f}")
print(f"Error: {abs(pi - np.pi):.6f}")
```
### Option Pricing Monte Carlo

```python
import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_option_pricing(S0, K, T, r, sigma, simulations=1_000_000):
    """Black-Scholes Monte Carlo option pricing on GPU

    S0: Initial stock price
    K: Strike price
    T: Time to maturity
    r: Risk-free rate
    sigma: Volatility
    """
    # Generate random paths
    z = np.random.standard_normal(simulations).astype(np.float32)

    # Stock price at maturity
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

    # Call option payoff
    payoff = np.maximum(ST - K, 0)

    # Discounted expected payoff
    option_price = np.exp(-r * T) * np.mean(payoff)
    return option_price

# Price a call option
price = gpu_option_pricing(
    S0=100,     # Current stock price
    K=105,      # Strike price
    T=1.0,      # 1 year to maturity
    r=0.05,     # 5% risk-free rate
    sigma=0.2,  # 20% volatility
    simulations=10_000_000
)
print(f"Call option price: ${price:.2f}")
```
## Memory Management

### Configure GPU Memory Limits

```bash
# Limit GPU memory allocation (in bytes)
export EPOCHLY_GPU_MEMORY_LIMIT=2147483648

# Set workload threshold for GPU offloading
export EPOCHLY_GPU_WORKLOAD_THRESHOLD=1000000
```
### Programmatic Memory Configuration

```python
import epochly

epochly.configure(
    enhancement_level=4,
    gpu_memory_limit=2 * 1024 * 1024 * 1024,
    gpu_workload_threshold=1_000_000
)
```
### Memory Management Example

```python
import epochly
import numpy as np

# Configure memory limits
epochly.configure(
    enhancement_level=4,
    gpu_memory_limit=4 * 1024 * 1024 * 1024,
)

@epochly.optimize(level=4)
def gpu_batch_processing(data_chunks):
    """Process data in batches to manage GPU memory"""
    results = []
    for chunk in data_chunks:
        # Process each chunk on GPU
        processed = chunk ** 2 + np.sin(chunk)
        results.append(np.sum(processed))
    return results

# Split large dataset into manageable chunks
large_data = np.random.rand(50_000_000).astype(np.float32)
chunks = np.array_split(large_data, 10)

# Process in batches
results = gpu_batch_processing(chunks)
print(f"Processed {len(chunks)} chunks")
```
## Performance Tips

### Best Practices for GPU Acceleration

- Use large arrays: GPU offload is typically only worthwhile for arrays with 100K+ elements; transfer overhead dominates for smaller inputs
- Use float32: GPU throughput is significantly higher with single-precision floats
- Minimize transfers: Keep data on the GPU between operations when possible (see the sketch after this list)
- Batch operations: Combine multiple operations to reduce CPU-GPU transfers
- Set memory limits: Configure EPOCHLY_GPU_MEMORY_LIMIT to prevent out-of-memory errors
- Monitor GPU usage: Use nvidia-smi to watch GPU utilization
- Enable CPU fallback: Use gpu_fallback=True for production resilience
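To illustrate the "minimize transfers" and "batch operations" tips, the sketch below runs a whole pipeline inside a single `optimize_context` block (the context-manager API shown earlier) instead of calling several independently optimized functions, giving intermediate arrays the chance to stay on the GPU between steps. Whether intermediates actually remain device-resident depends on Epochly's internals; treat this as a guideline rather than a guarantee.

```python
import epochly
import numpy as np

a = np.random.rand(4000, 4000).astype(np.float32)
b = np.random.rand(4000, 4000).astype(np.float32)

# One context, several chained operations: intermediates (c, d)
# can be kept on the GPU instead of bouncing back to the host.
with epochly.optimize_context(level=4):
    c = np.dot(a, b)          # matrix multiply
    d = np.tanh(c) + c ** 2   # element-wise ops reuse c
    total = np.sum(d)         # single reduction at the end

print(f"Total: {total}")
```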
### Optimal Array Sizes
| Array Size | Recommendation |
|---|---|
| Less than 10K elements | Use CPU (Level 2-3) |
| 10K - 100K | GPU may help, benchmark first |
| 100K - 10M | GPU beneficial |
| More than 10M elements | GPU highly beneficial |
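The exact crossover point depends on your GPU, driver, and workload, so it is worth benchmarking on your own hardware. Below is a minimal, illustrative timing harness (the kernel and the sizes are arbitrary assumptions) that compares the same reduction with and without Level 4:

```python
import time

import epochly
import numpy as np

@epochly.optimize(level=4, gpu_fallback=True)
def kernel_level4(x):
    """Element-wise math plus a reduction, optimized by Epochly."""
    return np.sum(np.sin(x) ** 2 + np.cos(x) ** 2)

def kernel_cpu(x):
    """Same computation in plain NumPy for comparison."""
    return np.sum(np.sin(x) ** 2 + np.cos(x) ** 2)

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    x = np.random.rand(n).astype(np.float32)

    start = time.perf_counter()
    kernel_cpu(x)
    cpu_time = time.perf_counter() - start

    start = time.perf_counter()
    kernel_level4(x)
    gpu_time = time.perf_counter() - start

    print(f"n={n:>11,}  CPU: {cpu_time:.4f}s  Level 4: {gpu_time:.4f}s")
```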
### Example: CPU Fallback

```python
import epochly
import numpy as np

@epochly.optimize(level=4, gpu_fallback=True)
def adaptive_computation(data):
    """Automatically falls back to CPU if GPU unavailable"""
    return np.dot(data, data.T)

# Works even if the GPU is unavailable
result = adaptive_computation(np.random.rand(5000, 5000))
```
## Troubleshooting

### Common Errors

#### CUDA Not Found

Error:

```
CudaRuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
```

Solution:

```bash
# Check if the NVIDIA driver is installed
nvidia-smi

# If not installed, install the CUDA toolkit
# See: https://developer.nvidia.com/cuda-downloads
```
#### Out of Memory

Error:

```
OutOfMemoryError: CUDA out of memory
```

Solution:

```bash
# Reduce the memory limit
export EPOCHLY_GPU_MEMORY_LIMIT=1073741824

# Increase the workload threshold
export EPOCHLY_GPU_WORKLOAD_THRESHOLD=5000000
```
#### CuPy Not Installed

Error:

```
ImportError: CuPy is required for GPU acceleration
```

Solution:

```bash
# Check CUDA version
nvidia-smi

# Install the matching CuPy version
pip install cupy-cuda11x  # For CUDA 11.x
pip install cupy-cuda12x  # For CUDA 12.x
```
#### CuPy Version Mismatch

Error:

```
ImportError: CuPy is not compatible with the current CUDA version
```

Solution:

```bash
# Uninstall the current CuPy
pip uninstall cupy-cuda11x cupy-cuda12x

# Check the installed CUDA toolkit version
nvcc --version

# Install the CuPy build that matches it (here, CUDA 12.x)
pip install cupy-cuda12x
```
### Debugging GPU Issues

```python
import epochly

# Check GPU status
status = epochly.get_status()
print(f"GPU Available: {status.get('gpu_available', False)}")
print(f"GPU Device: {status.get('gpu_device', 'N/A')}")
print(f"CUDA Version: {status.get('cuda_version', 'N/A')}")

# Enable debug logging
epochly.configure(
    enhancement_level=4,
    log_level='DEBUG'
)
```
### Monitor GPU Usage

```bash
# Watch GPU usage in real-time
watch -n 1 nvidia-smi

# Or use continuous monitoring
nvidia-smi dmon -s pucvmet
```