# Level Selection Guide
Choose the right enhancement level for your workload to maximize performance gains.
## Decision Matrix
| Level | Name | Best For | Example Workload | Typical Speedup | Requirements |
|---|---|---|---|---|---|
| 0 | Monitoring | Baseline measurements, production monitoring | Any workload | 1x (baseline) | None |
| 1 | Threading | I/O-bound operations | File reading, network requests, database queries | 2-5x | GIL-releasing I/O operations |
| 2 | JIT Compilation | Numerical loops, CPU-bound Python code | Numerical computations, custom algorithms | 58–193x | Python 3.9-3.12 (Numba) or 3.13+ (Native JIT) |
| 3 | Multi-core Parallelism | CPU-bound parallel workloads | List comprehensions, batch processing, aggregations | 8–12x | Multiple CPU cores, partitionable data |
| 4 | GPU Acceleration | Massive array operations | Matrix operations, neural networks, image processing | 7–70x | NVIDIA GPU, CUDA, CuPy, Pro license |
## Level 0: Monitoring
Use Level 0 for baseline measurements and production monitoring without optimization overhead.
### When to Use Level 0
- Establishing baseline performance metrics
- Production environments where you want monitoring without optimization
- Debugging performance issues
- Comparing against optimized versions
### Example: Performance Monitoring

```python
import epochly
import numpy as np

@epochly.optimize(level=0)
def baseline_computation(n):
    """Track metrics without optimization"""
    arr = np.random.rand(n)
    result = np.sum(arr ** 2 + np.sin(arr))
    return result

# Run computation
result = baseline_computation(1_000_000)

# Get metrics
metrics = epochly.get_metrics()
print(f"Function calls: {metrics['total_calls']}")
print(f"Total time: {metrics['total_time']:.3f}s")
print(f"Average time: {metrics['avg_time']:.3f}s")
```
### Best Use Cases
- Production monitoring dashboards
- A/B testing baselines
- Performance regression detection
- Resource usage tracking
## Level 1: Threading
Use Level 1 for I/O-bound operations that spend time waiting.
### When to Use Level 1
- File I/O operations (reading/writing multiple files)
- Network requests (API calls, web scraping)
- Database queries (concurrent database operations)
- Any operation that releases the GIL
### Example: Concurrent URL Fetching

```python
import epochly
import requests

@epochly.optimize(level=1)
def fetch_urls(urls):
    """Fetch multiple URLs concurrently"""
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.text)
    return results

# Fetch multiple URLs in parallel
urls = [
    'https://api.example.com/data1',
    'https://api.example.com/data2',
    'https://api.example.com/data3',
]
data = fetch_urls(urls)
print(f"Fetched {len(data)} URLs")
```
### Example: Concurrent File Processing

```python
import epochly
import json

@epochly.optimize(level=1)
def load_json_files(file_paths):
    """Load multiple JSON files concurrently"""
    data = []
    for path in file_paths:
        with open(path, 'r') as f:
            data.append(json.load(f))
    return data

files = [f'data_{i}.json' for i in range(100)]
all_data = load_json_files(files)
```
### Best Use Cases
- Web scraping multiple pages
- Batch file uploads/downloads
- Concurrent database queries
- API request batching
### Requirements
- Operations must release the Global Interpreter Lock (GIL)
- Most I/O operations naturally release the GIL
- Network and file operations are ideal candidates (see the sketch below)
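To see why this matters, here is a minimal sketch in plain Python (no Epochly) contrasting GIL-releasing I/O with a pure-Python CPU loop under a thread pool; the timings are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task():
    time.sleep(0.5)  # sleep releases the GIL, like most blocking I/O

def cpu_task():
    sum(i * i for i in range(5_000_000))  # pure-Python work holds the GIL

for task in (io_task, cpu_task):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(4):
            pool.submit(task)
    print(f"{task.__name__}: {time.perf_counter() - start:.2f}s")
# io_task: ~0.5s (the four sleeps overlap); cpu_task: little or no speedup
```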
## Level 2: JIT Compilation
Use Level 2 for numerical loops and CPU-bound Python code.
### When to Use Level 2
- Numerical computations with loops
- Custom algorithms without vectorization
- Data transformations
- Any CPU-bound pure Python code
### Example: Numerical Loop Optimization

```python
import epochly
import numpy as np

@epochly.optimize(level=2)
def monte_carlo_pi(n_samples):
    """Estimate Pi using Monte Carlo method with JIT"""
    inside_circle = 0
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        if x**2 + y**2 <= 1:
            inside_circle += 1
    return 4 * inside_circle / n_samples

# JIT compiles on first call, fast on subsequent calls
pi_estimate = monte_carlo_pi(10_000_000)
print(f"Pi estimate: {pi_estimate:.6f}")
```
### Example: Custom Algorithm

```python
import epochly

@epochly.optimize(level=2)
def fibonacci_number(n):
    """Compute the nth Fibonacci number with JIT"""
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

# First call compiles, subsequent calls are fast
result = fibonacci_number(1000)
```
### Requirements

**Python 3.9-3.12:**

- Requires the Numba JIT compiler
- Install it with:

  ```bash
  pip install numba
  ```

**Python 3.13+:**

- Native JIT compilation built-in
- No additional dependencies
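As a quick sanity check, the following sketch (our own helper, not an Epochly API) reports which JIT backend your interpreter can use, based only on the requirements above:

```python
import sys
from importlib.util import find_spec

if sys.version_info >= (3, 13):
    print("Python 3.13+: native JIT, no extra dependencies needed")
elif find_spec("numba") is not None:
    print("Numba found: Level 2 JIT available")
else:
    print("Numba missing: run `pip install numba` to enable Level 2")
```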
### Best Use Cases
- Numerical simulations
- Custom statistical functions
- Data transformation loops
- Algorithm implementations
### Limitations

- Compilation overhead on first call (warmup needed; see the sketch below)
- Not effective for already vectorized NumPy operations
- Best for pure Python numerical code
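A minimal sketch of the warmup effect, using the same decorator as the examples above; exact timings depend on the JIT backend:

```python
import time
import epochly

@epochly.optimize(level=2)
def running_total(n):
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total

for call in range(1, 4):
    start = time.perf_counter()
    running_total(1_000_000)
    print(f"Call {call}: {time.perf_counter() - start:.4f}s")
# Expect call 1 to be slowest (compilation); calls 2-3 run at compiled speed
```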
## Level 3: Multi-core Parallelism
Use Level 3 for CPU-bound operations that can be split across cores.
### When to Use Level 3
- List comprehensions with independent iterations
- Batch processing
- Parallel aggregations
- Data partitioning operations
### Example: Parallel List Comprehension

```python
import epochly
import numpy as np

@epochly.optimize(level=3)
def parallel_processing(data_list):
    """Process list items in parallel across cores"""
    results = [process_item(item) for item in data_list]
    return results

def process_item(data):
    """CPU-intensive per-item processing"""
    return np.sum(data ** 2 + np.sin(data) * np.cos(data))

# Generate data
data_list = [np.random.rand(100_000) for _ in range(100)]

# Process in parallel across all cores
results = parallel_processing(data_list)
print(f"Processed {len(results)} items")
```
### Example: Parallel Aggregation

```python
import epochly
import pandas as pd
import numpy as np

@epochly.optimize(level=3)
def parallel_groupby(df):
    """Parallel GroupBy aggregation"""
    result = df.groupby('category').agg({
        'value': ['sum', 'mean', 'std'],
        'quantity': ['min', 'max', 'count']
    })
    return result

# Large DataFrame
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C'], 10_000_000),
    'value': np.random.rand(10_000_000),
    'quantity': np.random.randint(1, 100, 10_000_000)
})
result = parallel_groupby(df)
```
### Requirements
- Multiple CPU cores available
- Partitionable data that can be split
- No shared state between parallel tasks
- Independent iterations (no dependencies between loop iterations; see the sketch below)
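The sketch below (plain NumPy, illustrative only) contrasts an iteration pattern Level 3 can partition with one it cannot:

```python
import numpy as np

chunks = [np.random.rand(100_000) for _ in range(8)]

# Parallelizable: each iteration touches only its own chunk
def independent(chunks):
    return [float(np.sum(c ** 2)) for c in chunks]

# NOT parallelizable: each iteration needs the previous result
def sequential(chunks):
    state, results = 0.0, []
    for c in chunks:
        state = 0.9 * state + float(np.sum(c))  # carries state forward
        results.append(state)
    return results
```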
### Best Use Cases
- Batch data processing
- Parallel feature engineering
- Independent model training
- Parallel aggregations
### Limitations

- Overhead for small datasets (see the sketch below)
- Not suitable for sequential algorithms
- Memory overhead (copies per worker)
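A minimal sketch of the small-data pitfall, assuming the decorator behaves as in the earlier examples; expect the plain version to win at this size:

```python
import time
import numpy as np
import epochly

tiny = [np.random.rand(50) for _ in range(10)]  # far too small for Level 3

@epochly.optimize(level=3)
def parallel_sums(chunks):
    return [float(np.sum(c)) for c in chunks]

def plain_sums(chunks):
    return [float(np.sum(c)) for c in chunks]

for fn in (plain_sums, parallel_sums):
    start = time.perf_counter()
    fn(tiny)
    print(f"{fn.__name__}: {time.perf_counter() - start:.5f}s")
# Expect plain_sums to win here; Level 3 pays off on larger chunks
```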
## Level 4: GPU Acceleration
Use Level 4 for massive array operations that benefit from GPU parallelism.
### When to Use Level 4
- Large matrix operations (1M+ elements)
- Neural network computations
- Image/video processing
- Scientific simulations with large arrays
### Example: Large Array Operations

```python
import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_matrix_operations(n):
    """Massive matrix operations on GPU"""
    # Create large arrays
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    # Operations run on GPU
    C = np.dot(A, B)
    D = np.sin(C) + np.cos(C)
    return np.sum(D)

# Process 10,000 x 10,000 matrices on GPU
result = gpu_matrix_operations(10_000)
print(f"Result: {result}")
```
### Example: Batch Image Processing

```python
import epochly
import numpy as np

@epochly.optimize(level=4)
def process_images(images):
    """Process batch of images on GPU"""
    # Normalize
    normalized = images / 255.0

    # Apply transformations
    transformed = normalized ** 2 + np.sin(normalized)

    # Reduce
    features = np.mean(transformed, axis=(1, 2))
    return features

# Batch of 1000 images (256x256 RGB)
images = np.random.randint(0, 256, (1000, 256, 256, 3), dtype=np.uint8)
features = process_images(images)
```
### Requirements

- NVIDIA GPU with CUDA support
- CUDA Toolkit installed
- CuPy library:

  ```bash
  pip install cupy-cuda12x
  ```

- Epochly Pro license (Level 4 requires Pro)
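Before enabling Level 4, it is worth verifying the GPU stack. This sketch uses only standard CuPy calls (`epochly.has_gpu()`, shown later in this guide, is the library-level check):

```python
try:
    import cupy
    print(f"CUDA devices visible: {cupy.cuda.runtime.getDeviceCount()}")
except ImportError:
    print("CuPy not installed: pip install cupy-cuda12x")
except Exception as exc:  # driver/toolkit problems surface here
    print(f"CuPy installed but CUDA unavailable: {exc}")
```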
### Best Use Cases
- Deep learning inference
- Scientific computing with large arrays
- Image/video processing pipelines
- Financial modeling with large datasets
### Limitations

- GPU memory constraints
- CPU-GPU transfer overhead for small arrays (see the sketch below)
- Requires Pro license
- NVIDIA hardware only
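The transfer-overhead limitation is easy to demonstrate with CuPy alone (assumes a working GPU; timings are illustrative):

```python
import time
import numpy as np
import cupy

for n in (1_000, 10_000_000):
    host = np.random.rand(n)
    start = time.perf_counter()
    device = cupy.asarray(host)           # host -> GPU copy
    total = float(cupy.sum(device ** 2))  # compute, then GPU -> host copy
    print(f"n={n:>12,}: {time.perf_counter() - start:.5f}s")
# At n=1,000 the copies dominate; at n=10,000,000 the GPU math amortizes them
```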
## Automatic Level Selection
Let Epochly choose the optimal level automatically:
```python
import epochly

@epochly.optimize()  # No level specified
def smart_function(data):
    """Epochly auto-selects the best level"""
    return process(data)  # process() is a placeholder for your workload

# Epochly analyzes the workload and chooses:
# - Level 1 for I/O operations
# - Level 2 for numerical loops
# - Level 3 for parallel operations
# - Level 4 for large arrays (if GPU available)
```
### How Automatic Selection Works

1. **Workload Analysis**: Epochly profiles the function
2. **Pattern Detection**: Identifies I/O, loops, or parallelizable operations
3. **Resource Check**: Verifies available CPU cores and GPU
4. **Level Assignment**: Selects the optimal level based on the analysis
### Example: Auto-Selection

```python
import epochly
import numpy as np

@epochly.optimize()  # Auto-select
def auto_optimized(data):
    """Automatically optimized based on workload"""
    # If data is a large array -> Level 4
    # If data has loops -> Level 2
    # If data is I/O -> Level 1
    return np.sum(data ** 2)

# Epochly automatically chooses Level 4 for large arrays
large_data = np.random.rand(10_000_000)
result = auto_optimized(large_data)
```
## Comparing Levels
Benchmark all levels to find the best for your workload:
```python
import epochly
import numpy as np
import time

def benchmark_all_levels(func, *args):
    """Compare all optimization levels"""
    results = []
    for level in [0, 1, 2, 3]:
        @epochly.optimize(level=level)
        def optimized_func(*args):
            return func(*args)

        # Warmup
        optimized_func(*args)

        # Measure
        times = []
        for _ in range(5):
            start = time.perf_counter()
            optimized_func(*args)
            elapsed = time.perf_counter() - start
            times.append(elapsed)

        avg_time = np.mean(times)
        results.append({
            'level': level,
            'time': avg_time,
            'speedup': None  # Calculated below
        })

    # Calculate speedups
    baseline = results[0]['time']
    for r in results:
        r['speedup'] = baseline / r['time']

    return results

# Define workload
def compute_workload(n):
    arr = np.random.rand(n)
    return np.sum(arr ** 2 + np.sin(arr))

# Benchmark
results = benchmark_all_levels(compute_workload, 5_000_000)

# Print results
print("\nLevel Comparison:")
print(f"{'Level':<8} {'Time (s)':<12} {'Speedup':<10}")
print("-" * 35)
for r in results:
    print(f"{r['level']:<8} {r['time']:<12.4f} {r['speedup']:<10.2f}x")
```
## Workload Analysis
Analyze your workload to get a level recommendation:
```python
import epochly
import inspect
import time
import numpy as np

def analyze_workload(func, sample_data):
    """Analyze workload and recommend best level"""
    # Check function characteristics
    source = inspect.getsource(func)
    recommendations = []

    # Check for I/O operations
    io_keywords = ['open(', 'read(', 'write(', 'requests.', 'urllib']
    if any(keyword in source for keyword in io_keywords):
        recommendations.append({
            'level': 1,
            'reason': 'I/O operations detected',
            'confidence': 'high'
        })

    # Check for loops
    loop_keywords = ['for ', 'while ']
    if any(keyword in source for keyword in loop_keywords):
        recommendations.append({
            'level': 2,
            'reason': 'Loops detected (JIT will help)',
            'confidence': 'medium'
        })

    # Check for list comprehensions
    if '[' in source and 'for' in source:
        recommendations.append({
            'level': 3,
            'reason': 'List comprehension (parallelizable)',
            'confidence': 'medium'
        })

    # Check data size
    if hasattr(sample_data, '__len__'):
        size = len(sample_data)
        if size > 1_000_000:
            recommendations.append({
                'level': 4,
                'reason': f'Large data size ({size:,} elements)',
                'confidence': 'high' if size > 10_000_000 else 'medium'
            })

    # Benchmark to verify
    best_level = 0
    best_time = float('inf')
    for rec in recommendations:
        level = rec['level']
        with epochly.optimize_context(level=level):
            # Warmup
            func(sample_data)

            # Measure
            start = time.perf_counter()
            func(sample_data)
            elapsed = time.perf_counter() - start

        if elapsed < best_time:
            best_time = elapsed
            best_level = level

    return {
        'recommended_level': best_level,
        'recommendations': recommendations,
        'benchmark_time': best_time
    }

# Example usage
def my_workload(data):
    result = 0
    for x in data:
        result += x ** 2
    return result

sample = np.random.rand(100_000)
analysis = analyze_workload(my_workload, sample)
print(f"Recommended Level: {analysis['recommended_level']}")
print("\nReasons:")
for rec in analysis['recommendations']:
    print(f"  Level {rec['level']}: {rec['reason']} (confidence: {rec['confidence']})")
```
## Level Selection Best Practices
### 1. Start with Defaults
```python
import epochly

# Start with automatic selection
@epochly.optimize()
def my_function(data):
    return process(data)  # process() is a placeholder for your workload

# Or use Level 2 as a safe default for CPU-bound work
@epochly.optimize(level=2)
def cpu_function(data):
    return compute(data)  # compute() is a placeholder for your workload
```
### 2. Benchmark Systematically
```python
import epochly

# Test all applicable levels
levels_to_test = [0, 1, 2, 3]

for level in levels_to_test:
    with epochly.benchmark_context(f"Level {level}"):
        with epochly.optimize_context(level=level):
            result = my_function(data)

# Choose the fastest
```
### 3. Consider Overhead
```python
import epochly
import numpy as np

# Small data: Overhead may outweigh benefits
small_data = np.random.rand(100)

# Use Level 0 or no optimization for small data
@epochly.optimize(level=0)
def process_small(data):
    return np.sum(data)

# Large data: Optimization pays off
large_data = np.random.rand(10_000_000)

# Use Level 3 for large parallel workloads
@epochly.optimize(level=3)
def process_large(data):
    # chunks() is a placeholder for your own partitioning helper
    return [np.sum(chunk) for chunk in chunks(data)]
```
### 4. Match Data Size
| Data Size | Recommended Level | Reason |
|---|---|---|
| < 1,000 | Level 0 | Overhead too high |
| 1,000 - 10,000 | Level 2 | JIT helps, low overhead |
| 10,000 - 1,000,000 | Level 2 or 3 | Depends on parallelizability |
| > 1,000,000 | Level 3 or 4 | Multi-core or GPU benefits |
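If you want this table as code, here is a hypothetical helper (not part of Epochly) that encodes it as a starting-point heuristic:

```python
def suggest_level(n_elements, parallelizable=False, gpu_available=False):
    """Starting-point heuristic mirroring the table above."""
    if n_elements < 1_000:
        return 0  # overhead too high
    if n_elements <= 10_000:
        return 2  # JIT helps, low overhead
    if n_elements <= 1_000_000:
        return 3 if parallelizable else 2  # depends on parallelizability
    return 4 if gpu_available else 3  # past ~1M elements

print(suggest_level(500))                            # 0
print(suggest_level(50_000, parallelizable=True))    # 3
print(suggest_level(5_000_000, gpu_available=True))  # 4
```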
### 5. Monitor in Production
```python
import epochly

# Use Level 0 for monitoring without optimization
@epochly.optimize(level=0)
def production_function(data):
    # process() and log_to_monitoring() are placeholders for your own code
    result = process(data)

    # Log metrics
    metrics = epochly.get_metrics()
    log_to_monitoring(metrics)

    return result
```
## Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Using Level 3 for small data | Parallelization overhead exceeds benefits | Use Level 2 for data < 100K elements |
| Using Level 2 for I/O operations | JIT doesn't help waiting time | Use Level 1 for I/O-bound work |
| Using Level 4 without GPU | Falls back to CPU, adds overhead | Check GPU availability first |
| Using Level 4 for small arrays | CPU-GPU transfer overhead too high | Only use Level 4 for arrays > 1M elements |
| Not warming up JIT | First run is slow, skews benchmarks | Always run 2-3 warmup iterations |
| Mixing optimization levels | Nested optimizations conflict | Use one level per function |
| Optimizing already-optimized code | NumPy/pandas already optimized | Focus on pure Python loops |
| Ignoring memory constraints | Level 3 creates copies, Level 4 limited by GPU RAM | Monitor memory usage |
### Example: Avoiding Common Mistakes
```python
import epochly
import numpy as np

# ❌ MISTAKE: Level 3 for small data
@epochly.optimize(level=3)
def bad_small_data():
    data = [i ** 2 for i in range(100)]  # Only 100 items!
    return sum(data)

# ✅ CORRECT: No optimization for small data
def good_small_data():
    data = [i ** 2 for i in range(100)]
    return sum(data)

# (file_paths, small_array, and data below are placeholders)

# ❌ MISTAKE: Level 2 for I/O
@epochly.optimize(level=2)
def bad_io_operation():
    return [open(f).read() for f in file_paths]

# ✅ CORRECT: Level 1 for I/O
@epochly.optimize(level=1)
def good_io_operation():
    return [open(f).read() for f in file_paths]

# ❌ MISTAKE: Level 4 without checking GPU
@epochly.optimize(level=4)
def bad_gpu_usage():
    return np.sum(small_array)

# ✅ CORRECT: Check GPU and data size
@epochly.optimize(level=4)
def good_gpu_usage():
    # Only use if GPU available and data is large
    if epochly.has_gpu() and len(data) > 1_000_000:
        return np.sum(data)
    else:
        raise ValueError("GPU not available or data too small")
```
## Quick Reference
### Level Selection Flowchart

```
What type of workload?
│
├─ I/O bound ───────────────────────────► Level 1: Threading
│
└─ CPU bound
   │
   ├─ Loops? ── Yes ────────────────────► Level 2: JIT
   │
   └─ No ─ Parallelizable?
           │
           ├─ Yes ──────────────────────► Level 3: Multi-core
           │
           └─ Large arrays? ────────────► Level 4: GPU
```