
Level Selection Guide

Choose the right enhancement level for your workload to maximize performance gains.

Decision Matrix

| Level | Name | Best For | Example Workload | Typical Speedup | Requirements |
|-------|------|----------|------------------|-----------------|--------------|
| 0 | Monitoring | Baseline measurements, production monitoring | Any workload | 1x (baseline) | None |
| 1 | Threading | I/O-bound operations | File reading, network requests, database queries | 2-5x | GIL-releasing I/O operations |
| 2 | JIT Compilation | Numerical loops, CPU-bound Python code | Numerical computations, custom algorithms | 58–193x | Python 3.9-3.12 (Numba) or 3.13+ (Native JIT) |
| 3 | Multi-core Parallelism | CPU-bound parallel workloads | List comprehensions, batch processing, aggregations | 8–12x | Multiple CPU cores, partitionable data |
| 4 | GPU Acceleration | Massive array operations | Matrix operations, neural networks, image processing | 7–70x | NVIDIA GPU, CUDA, CuPy, Pro license |

Level 0: Monitoring

Use Level 0 for baseline measurements and production monitoring without optimization overhead.

When to Use Level 0

  • Establishing baseline performance metrics
  • Production environments where you want monitoring without optimization
  • Debugging performance issues
  • Comparing against optimized versions

Example: Performance Monitoring

import epochly
import numpy as np

@epochly.optimize(level=0)
def baseline_computation(n):
    """Track metrics without optimization"""
    arr = np.random.rand(n)
    result = np.sum(arr ** 2 + np.sin(arr))
    return result

# Run computation
result = baseline_computation(1_000_000)

# Get metrics
metrics = epochly.get_metrics()
print(f"Function calls: {metrics['total_calls']}")
print(f"Total time: {metrics['total_time']:.3f}s")
print(f"Average time: {metrics['avg_time']:.3f}s")

Best Use Cases

  • Production monitoring dashboards
  • A/B testing baselines
  • Performance regression detection
  • Resource usage tracking
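
To make concrete what call-level monitoring records, here is a minimal standard-library sketch of a decorator that tracks the same three quantities printed in the example above (call count, total time, average time). It is an illustration only and says nothing about how Epochly collects its metrics; `track_calls`, `METRICS`, and `sum_of_squares` are hypothetical names.

```python
import time
from functools import wraps

# Hypothetical in-process metrics store, keyed by function name.
METRICS = {}

def track_calls(func):
    """Record call count and cumulative wall-clock time for func."""
    stats = METRICS.setdefault(
        func.__name__, {"total_calls": 0, "total_time": 0.0, "avg_time": 0.0}
    )

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Update the metrics even if func raises.
            stats["total_calls"] += 1
            stats["total_time"] += time.perf_counter() - start
            stats["avg_time"] = stats["total_time"] / stats["total_calls"]

    return wrapper

@track_calls
def sum_of_squares(n):
    return sum(i * i for i in range(n))

for _ in range(3):
    sum_of_squares(10_000)

print(METRICS["sum_of_squares"]["total_calls"])
```

Because the wrapper only reads a clock and updates a dictionary, the measured function runs unchanged, which is the property a baseline needs.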

Level 1: Threading

Use Level 1 for I/O-bound operations that spend most of their time waiting on files, the network, or a database.

When to Use Level 1

  • File I/O operations (reading/writing multiple files)
  • Network requests (API calls, web scraping)
  • Database queries (concurrent database operations)
  • Any operation that releases the GIL

Example: Concurrent URL Fetching

import epochly
import requests

@epochly.optimize(level=1)
def fetch_urls(urls):
    """Fetch multiple URLs concurrently"""
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.text)
    return results

# Fetch multiple URLs in parallel
urls = [
    'https://api.example.com/data1',
    'https://api.example.com/data2',
    'https://api.example.com/data3',
]
data = fetch_urls(urls)
print(f"Fetched {len(data)} URLs")

Example: Concurrent File Processing

import epochly
import json

@epochly.optimize(level=1)
def load_json_files(file_paths):
    """Load multiple JSON files concurrently"""
    data = []
    for path in file_paths:
        with open(path, 'r') as f:
            data.append(json.load(f))
    return data

files = [f'data_{i}.json' for i in range(100)]
all_data = load_json_files(files)

Best Use Cases

  • Web scraping multiple pages
  • Batch file uploads/downloads
  • Concurrent database queries
  • API request batching

Requirements

  • Operations must release the Global Interpreter Lock (GIL)
  • Most I/O operations naturally release the GIL
  • Network and file operations are ideal candidates
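
Threads help here because a blocking wait releases the GIL, so other threads can make progress in the meantime. The following is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`, with `time.sleep` standing in for a network or disk wait. The `fake_io` function and its delay are assumptions made for illustration, and this is not a description of Epochly's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id, delay=0.05):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return task_id * 2

def run_concurrently(task_ids, max_workers=8):
    """Run fake_io for every task id on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_io, task_ids))

start = time.perf_counter()
results = run_concurrently(range(8))
elapsed = time.perf_counter() - start

print(results)
print(f"8 waits of 0.05s finished in {elapsed:.2f}s")
```

Run one after another, the eight waits would take about 0.4 s. With eight threads the total is typically close to a single wait, because the waits overlap.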

Level 2: JIT Compilation

Use Level 2 for numerical loops and CPU-bound Python code.

When to Use Level 2

  • Numerical computations with loops
  • Custom algorithms without vectorization
  • Data transformations
  • Any CPU-bound pure Python code

Example: Numerical Loop Optimization

import epochly
import numpy as np

@epochly.optimize(level=2)
def monte_carlo_pi(n_samples):
    """Estimate Pi using Monte Carlo method with JIT"""
    inside_circle = 0
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        if x**2 + y**2 <= 1:
            inside_circle += 1
    return 4 * inside_circle / n_samples

# JIT compiles on first call, fast on subsequent calls
pi_estimate = monte_carlo_pi(10_000_000)
print(f"Pi estimate: {pi_estimate:.6f}")

Example: Custom Algorithm

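import epochly

@epochly.optimize(level=2)
def fibonacci_sequence(n):
    """Compute the n-th Fibonacci number with JIT"""
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

# First call compiles, subsequent calls are fast.
# n is kept at 90 so the result fits in a signed 64-bit integer: if the JIT
# backend uses fixed-width integers (as Numba does), results for n > 92 overflow.
result = fibonacci_sequence(90)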

Requirements

Python 3.9-3.12:

  • Requires Numba JIT compiler
  • Install: pip install numba

Python 3.13+:

  • Native JIT compilation built-in
  • No additional dependencies

Best Use Cases

  • Numerical simulations
  • Custom statistical functions
  • Data transformation loops
  • Algorithm implementations

Limitations

  • Compilation overhead on first call (warmup needed)
  • Not effective for already vectorized NumPy operations
  • Best for pure Python numerical code
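
The first limitation matters most when measuring: if the first call is included in a timing, the compilation cost is counted as run time. A minimal sketch of a timer that reports the first call separately from later calls is shown below. `simulated_jit_function` is a hypothetical stand-in that sleeps once to imitate compilation; it does not use a real JIT.

```python
import time

def time_with_warmup(func, *args, repeats=5):
    """Time the first call separately from later calls.

    Returns (first_call_seconds, best_later_seconds).
    """
    start = time.perf_counter()
    func(*args)
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        later.append(time.perf_counter() - start)
    return first, min(later)

# Hypothetical stand-in for a JIT-compiled function: the first call pays a
# one-time setup cost, later calls do not.
state = {"compiled": False}

def simulated_jit_function(n):
    if not state["compiled"]:
        time.sleep(0.05)  # pretend compilation
        state["compiled"] = True
    return sum(i * i for i in range(n))

first, steady = time_with_warmup(simulated_jit_function, 1_000)
print(f"first call: {first:.4f}s, steady state: {steady:.6f}s")
```

Reporting the two numbers side by side also shows whether a function is called often enough for the one-time cost to be worth paying.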

Level 3: Multi-core Parallelism

Use Level 3 for CPU-bound operations that can be split across cores.

When to Use Level 3

  • List comprehensions with independent iterations
  • Batch processing
  • Parallel aggregations
  • Data partitioning operations

Example: Parallel List Comprehension

import epochly
import numpy as np

def process_item(data):
    """CPU-intensive per-item processing"""
    return np.sum(data ** 2 + np.sin(data) * np.cos(data))

@epochly.optimize(level=3)
def parallel_processing(data_list):
    """Process list items in parallel across cores"""
    results = [process_item(item) for item in data_list]
    return results

# Generate data
data_list = [np.random.rand(100_000) for _ in range(100)]

# Process in parallel across all cores
results = parallel_processing(data_list)
print(f"Processed {len(results)} items")

Example: Parallel Aggregation

import epochly
import pandas as pd
import numpy as np

@epochly.optimize(level=3)
def parallel_groupby(df):
    """Parallel GroupBy aggregation"""
    result = df.groupby('category').agg({
        'value': ['sum', 'mean', 'std'],
        'quantity': ['min', 'max', 'count']
    })
    return result

# Large DataFrame
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C'], 10_000_000),
    'value': np.random.rand(10_000_000),
    'quantity': np.random.randint(1, 100, 10_000_000)
})
result = parallel_groupby(df)

Requirements

  • Multiple CPU cores available
  • Data that can be partitioned into independent chunks
  • No shared state between parallel tasks
  • Independent iterations (no dependencies between loop iterations)

Best Use Cases

  • Batch data processing
  • Parallel feature engineering
  • Independent model training
  • Parallel aggregations

Limitations

  • Overhead for small datasets
  • Not suitable for sequential algorithms
  • Memory overhead (copies per worker)
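
A rough way to reason about the small-dataset limitation is a fixed-overhead model: parallel time is approximately a startup overhead plus the serial work divided by the number of workers. The sketch below encodes that model. The per-item cost and overhead are made-up numbers for illustration and should be replaced with measurements from your own workload.

```python
def estimated_parallel_time(n_items, per_item_s, n_workers, overhead_s):
    """Simple model: fixed startup overhead plus evenly divided work."""
    return overhead_s + (n_items * per_item_s) / n_workers

def parallel_pays_off(n_items, per_item_s, n_workers, overhead_s):
    """True if the modelled parallel time beats the serial time."""
    serial = n_items * per_item_s
    parallel = estimated_parallel_time(n_items, per_item_s, n_workers, overhead_s)
    return parallel < serial

# Illustrative numbers only: 1 microsecond per item, 8 workers, 50 ms overhead.
for n in (100, 10_000, 1_000_000):
    print(n, parallel_pays_off(n, per_item_s=1e-6, n_workers=8, overhead_s=0.05))
```

With these assumed numbers the break-even point is a little under 60,000 items. Below it, the fixed overhead dominates and serial execution wins.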

Level 4: GPU Acceleration

Use Level 4 for massive array operations that benefit from GPU parallelism.

When to Use Level 4

  • Large matrix operations (1M+ elements)
  • Neural network computations
  • Image/video processing
  • Scientific simulations with large arrays

Example: Large Array Operations

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_matrix_operations(n):
    """Massive matrix operations on GPU"""
    # Create large arrays
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    # Operations run on GPU
    C = np.dot(A, B)
    D = np.sin(C) + np.cos(C)
    return np.sum(D)

# Process 10,000 x 10,000 matrices on GPU
result = gpu_matrix_operations(10_000)
print(f"Result: {result}")

Example: Batch Image Processing

import epochly
import numpy as np

@epochly.optimize(level=4)
def process_images(images):
    """Process batch of images on GPU"""
    # Normalize
    normalized = images / 255.0
    # Apply transformations
    transformed = normalized ** 2 + np.sin(normalized)
    # Reduce
    features = np.mean(transformed, axis=(1, 2))
    return features

# Batch of 1000 images (256x256 RGB)
images = np.random.randint(0, 256, (1000, 256, 256, 3), dtype=np.uint8)
features = process_images(images)

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit installed
  • CuPy library: pip install cupy-cuda12x
  • Epochly Pro license (Level 4 requires Pro)
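
Before opting into Level 4, it can help to check the first three prerequisites from a script. The sketch below uses only standard-library calls (`shutil.which` and `importlib.util.find_spec`). It is a heuristic pre-flight check under stated assumptions, not an Epochly API, and `gpu_prerequisites` is a hypothetical helper name. It does not check the license.

```python
import importlib.util
import shutil

def gpu_prerequisites():
    """Report which Level 4 prerequisites appear to be present.

    These are heuristics: finding nvidia-smi on PATH suggests an NVIDIA
    driver is installed, and find_spec("cupy") only checks that the CuPy
    package can be located, not that it matches the installed CUDA version.
    """
    return {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "cupy_installed": importlib.util.find_spec("cupy") is not None,
    }

status = gpu_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```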

Best Use Cases

  • Deep learning inference
  • Scientific computing with large arrays
  • Image/video processing pipelines
  • Financial modeling with large datasets

Limitations

  • GPU memory constraints
  • CPU-GPU transfer overhead for small arrays
  • Requires Pro license
  • NVIDIA hardware only
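
To judge whether a workload fits in GPU memory, multiply the element count by the element size. The sketch below does this arithmetic for the two Level 4 examples in this section, assuming NumPy's default `float64` (8 bytes per element); `array_bytes` is a hypothetical helper. Temporary arrays created by intermediate expressions add to these totals.

```python
import math

def array_bytes(shape, itemsize):
    """Bytes needed for a dense array of the given shape and element size."""
    return math.prod(shape) * itemsize

GIB = 1024 ** 3

# Matrix example: A, B, C and D are each 10,000 x 10,000 float64 (8 bytes).
one_matrix = array_bytes((10_000, 10_000), 8)
print(f"one matrix: {one_matrix / GIB:.2f} GiB")
print(f"A, B, C, D together: {4 * one_matrix / GIB:.2f} GiB")

# Image example: the uint8 batch is small, but dividing by 255.0 promotes it
# to float64, which is 8x larger.
batch_uint8 = array_bytes((1000, 256, 256, 3), 1)
batch_float64 = array_bytes((1000, 256, 256, 3), 8)
print(f"uint8 batch: {batch_uint8 / GIB:.2f} GiB")
print(f"float64 batch: {batch_float64 / GIB:.2f} GiB")
```

Comparing these figures with the memory reported for your GPU gives an early warning before a run fails with an out-of-memory error.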

Automatic Level Selection

Let Epochly choose the optimal level automatically:

import epochly

@epochly.optimize()  # No level specified
def smart_function(data):
    """Epochly auto-selects the best level"""
    # process() stands in for your own workload
    return process(data)

# Epochly analyzes the workload and chooses:
# - Level 1 for I/O operations
# - Level 2 for numerical loops
# - Level 3 for parallel operations
# - Level 4 for large arrays (if GPU available)

How Automatic Selection Works

  1. Workload Analysis: Epochly profiles the function
  2. Pattern Detection: Identifies I/O, loops, or parallelizable operations
  3. Resource Check: Checks the available CPU cores and whether a GPU is present
  4. Level Assignment: Selects optimal level based on analysis
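
The four steps can be pictured as a small decision function. The sketch below is a hypothetical illustration of that flow, not Epochly's actual selection logic: `WorkloadProfile`, `select_level`, the 1,000,000-element threshold, and the order in which the checks are applied are all assumptions. Only `os.cpu_count()` is a real resource query.

```python
import os
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    """Hypothetical summary of what profiling and pattern detection found."""
    is_io_bound: bool = False
    has_numeric_loops: bool = False
    is_parallelizable: bool = False
    n_elements: int = 0

def select_level(profile, cpu_cores=None, has_gpu=False):
    """Pick a level from a profile and the available resources."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1  # resource check

    if profile.is_io_bound:
        return 1
    if has_gpu and profile.n_elements > 1_000_000:
        return 4
    if profile.is_parallelizable and cpu_cores > 1:
        return 3
    if profile.has_numeric_loops:
        return 2
    return 0  # nothing to optimize: monitor only

print(select_level(WorkloadProfile(is_io_bound=True)))
print(select_level(WorkloadProfile(n_elements=5_000_000), has_gpu=True))
print(select_level(WorkloadProfile(is_parallelizable=True), cpu_cores=8))
print(select_level(WorkloadProfile(has_numeric_loops=True), cpu_cores=1))
```

Passing `cpu_cores` and `has_gpu` explicitly keeps the decision testable on machines that differ from the deployment target.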

Example: Auto-Selection

import epochly
import numpy as np

@epochly.optimize()  # Auto-select
def auto_optimized(data):
    """Automatically optimized based on workload"""
    # Large array operations -> Level 4 (if a GPU is available)
    # Numerical loops -> Level 2
    # I/O operations -> Level 1
    return np.sum(data ** 2)

# A large array like this is a candidate for Level 4 when a GPU is available
large_data = np.random.rand(10_000_000)
result = auto_optimized(large_data)

Comparing Levels

Benchmark all levels to find the best for your workload:

import epochly
import numpy as np
import time

def benchmark_all_levels(func, *args):
    """Compare optimization levels 0-3"""
    results = []
    # Level 4 is left out because it needs a GPU and a Pro license;
    # add it to the list if both are available.
    for level in [0, 1, 2, 3]:
        @epochly.optimize(level=level)
        def optimized_func(*call_args):
            return func(*call_args)

        # Warmup
        optimized_func(*args)

        # Measure
        times = []
        for _ in range(5):
            start = time.perf_counter()
            optimized_func(*args)
            elapsed = time.perf_counter() - start
            times.append(elapsed)
        avg_time = np.mean(times)
        results.append({
            'level': level,
            'time': avg_time,
            'speedup': None  # Calculate later
        })

    # Calculate speedups relative to Level 0
    baseline = results[0]['time']
    for r in results:
        r['speedup'] = baseline / r['time']
    return results

# Define workload
def compute_workload(n):
    arr = np.random.rand(n)
    return np.sum(arr ** 2 + np.sin(arr))

# Benchmark
results = benchmark_all_levels(compute_workload, 5_000_000)

# Print results
print("\nLevel Comparison:")
print(f"{'Level':<8} {'Time (s)':<12} {'Speedup':<10}")
print("-" * 35)
for r in results:
    print(f"{r['level']:<8} {r['time']:<12.4f} {r['speedup']:.2f}x")

Workload Analysis

Analyze your workload to get a level recommendation:

import epochly
import inspect
import time
import numpy as np

def analyze_workload(func, sample_data):
    """Analyze workload and recommend best level"""
    # Check function characteristics (simple substring heuristics)
    source = inspect.getsource(func)
    recommendations = []

    # Check for I/O operations
    io_keywords = ['open(', 'read(', 'write(', 'requests.', 'urllib']
    if any(keyword in source for keyword in io_keywords):
        recommendations.append({
            'level': 1,
            'reason': 'I/O operations detected',
            'confidence': 'high'
        })

    # Check for loops
    loop_keywords = ['for ', 'while ']
    if any(keyword in source for keyword in loop_keywords):
        recommendations.append({
            'level': 2,
            'reason': 'Loops detected (JIT will help)',
            'confidence': 'medium'
        })

    # Check for list comprehensions (crude: any '[' together with 'for')
    if '[' in source and 'for' in source:
        recommendations.append({
            'level': 3,
            'reason': 'List comprehension (parallelizable)',
            'confidence': 'medium'
        })

    # Check data size
    if hasattr(sample_data, '__len__'):
        size = len(sample_data)
        if size > 1_000_000:
            recommendations.append({
                'level': 4,
                'reason': f'Large data size ({size:,} elements)',
                'confidence': 'high' if size > 10_000_000 else 'medium'
            })

    # Benchmark each candidate level to verify
    best_level = 0
    best_time = float('inf')
    for rec in recommendations:
        level = rec['level']
        with epochly.optimize_context(level=level):
            # Warmup
            func(sample_data)
            # Measure
            start = time.perf_counter()
            func(sample_data)
            elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_time = elapsed
            best_level = level

    return {
        'recommended_level': best_level,
        'recommendations': recommendations,
        'benchmark_time': best_time
    }

# Example usage
def my_workload(data):
    result = 0
    for x in data:
        result += x ** 2
    return result

sample = np.random.rand(100_000)
analysis = analyze_workload(my_workload, sample)
print(f"Recommended Level: {analysis['recommended_level']}")
print("\nReasons:")
for rec in analysis['recommendations']:
    print(f"  Level {rec['level']}: {rec['reason']} (confidence: {rec['confidence']})")
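
The substring checks in `analyze_workload` are easy to fool: a comment or string containing "for " counts as a loop, and every list comprehension also matches the loop check. One possible refinement, sketched here with the standard library's `ast` module, is to inspect the parsed syntax tree instead. `detect_patterns` is a hypothetical helper that takes source text, for example the output of `inspect.getsource(func)` passed through `textwrap.dedent`.

```python
import ast

def detect_patterns(source):
    """Return which constructs appear in Python source, using the AST."""
    found = {"loop": False, "comprehension": False, "open_call": False}
    comprehension_nodes = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.For, ast.While)):
            found["loop"] = True
        elif isinstance(node, comprehension_nodes):
            found["comprehension"] = True
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            found["open_call"] = True
    return found

sample_source = '''
def load(paths):
    # this comment mentions a for loop but the function contains none
    return [open(p).read() for p in paths]
'''
print(detect_patterns(sample_source))
```

Here the comment is ignored, the comprehension is reported separately from explicit loops, and the `open` call is found even though it is nested inside another expression.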

Level Selection Best Practices

1. Start with Defaults

import epochly

# Start with automatic selection
# (process() and compute() stand in for your own code)
@epochly.optimize()
def my_function(data):
    return process(data)

# Or use Level 2 as a safe default for CPU-bound work
@epochly.optimize(level=2)
def cpu_function(data):
    return compute(data)

2. Benchmark Systematically

import epochly

# Test all applicable levels
# (my_function and data are your own function and input)
levels_to_test = [0, 1, 2, 3]
for level in levels_to_test:
    with epochly.benchmark_context(f"Level {level}"):
        with epochly.optimize_context(level=level):
            result = my_function(data)

# Choose the fastest

3. Consider Overhead

import epochly
import numpy as np

# Small data: Overhead may outweigh benefits
small_data = np.random.rand(100)

# Use Level 0 or no optimization for small data
@epochly.optimize(level=0)
def process_small(data):
    return np.sum(data)

# Large data: Optimization pays off
large_data = np.random.rand(10_000_000)

# Use Level 3 for large parallel workloads
# (chunks() is a user-supplied helper that splits data into pieces)
@epochly.optimize(level=3)
def process_large(data):
    return [np.sum(chunk) for chunk in chunks(data)]
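
The `chunks` helper called in `process_large` is not defined in this guide. One possible version, assuming the data supports `len()` and slicing (lists and NumPy arrays both do), is sketched below. The `n_chunks` parameter and its default of 4 are assumptions chosen for illustration.

```python
def chunks(data, n_chunks=4):
    """Yield up to n_chunks contiguous slices that together cover data."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = -(-len(data) // n_chunks)  # ceiling division
    for start in range(0, len(data), max(size, 1)):
        yield data[start:start + size]

parts = list(chunks(list(range(10)), n_chunks=4))
print(parts)
```

Setting `n_chunks` to the number of available cores gives each worker one slice; the last slice may be shorter than the others.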

4. Match Data Size

| Data Size | Recommended Level | Reason |
|-----------|-------------------|--------|
| < 1,000 | Level 0 | Overhead too high |
| 1,000 - 10,000 | Level 2 | JIT helps, low overhead |
| 10,000 - 1,000,000 | Level 2 or 3 | Depends on parallelizability |
| > 1,000,000 | Level 3 or 4 | Multi-core or GPU benefits |

5. Monitor in Production

import epochly

# Use Level 0 for monitoring without optimization
@epochly.optimize(level=0)
def production_function(data):
    result = process(data)
    # Log metrics
    metrics = epochly.get_metrics()
    log_to_monitoring(metrics)
    return result

Common Mistakes

| Mistake | Problem | Solution |
|---------|---------|----------|
| Using Level 3 for small data | Parallelization overhead exceeds benefits | Use Level 2 for data < 100K elements |
| Using Level 2 for I/O operations | JIT doesn't help waiting time | Use Level 1 for I/O-bound work |
| Using Level 4 without GPU | Falls back to CPU, adds overhead | Check GPU availability first |
| Using Level 4 for small arrays | CPU-GPU transfer overhead too high | Only use Level 4 for arrays > 1M elements |
| Not warming up JIT | First run is slow, skews benchmarks | Always run 2-3 warmup iterations |
| Mixing optimization levels | Nested optimizations conflict | Use one level per function |
| Optimizing already-optimized code | NumPy/pandas already optimized | Focus on pure Python loops |
| Ignoring memory constraints | Level 3 creates copies, Level 4 limited by GPU RAM | Monitor memory usage |

Example: Avoiding Common Mistakes

import epochly
import numpy as np

# ❌ MISTAKE: Level 3 for small data
@epochly.optimize(level=3)
def bad_small_data():
    data = [i ** 2 for i in range(100)]  # Only 100 items!
    return sum(data)

# ✅ CORRECT: No optimization for small data
def good_small_data():
    data = [i ** 2 for i in range(100)]
    return sum(data)

# ❌ MISTAKE: Level 2 for I/O
@epochly.optimize(level=2)
def bad_io_operation(file_paths):
    return [open(f).read() for f in file_paths]

# ✅ CORRECT: Level 1 for I/O
@epochly.optimize(level=1)
def good_io_operation(file_paths):
    return [open(f).read() for f in file_paths]

# ❌ MISTAKE: Level 4 without checking GPU
@epochly.optimize(level=4)
def bad_gpu_usage(small_array):
    return np.sum(small_array)

# ✅ CORRECT: Check GPU and data size
@epochly.optimize(level=4)
def good_gpu_usage(data):
    # Only use if GPU available and data is large
    if epochly.has_gpu() and len(data) > 1_000_000:
        return np.sum(data)
    else:
        raise ValueError("GPU not available or data too small")

Quick Reference

Level Selection Flowchart

 ┌─────────────────────────────────────┐
 │       What type of workload?        │
 └────────────┬────────────────────────┘
     ┌────────┴────────┐
     │                 │
     ▼                 ▼
┌─────────┐       ┌─────────┐
│   I/O   │       │   CPU   │
│  bound? │       │  bound? │
└────┬────┘       └────┬────┘
     │                 │
     ▼                 ▼
  Level 1         ┌─────────┐
 Threading        │ Loops?  │
                  └────┬────┘
          ┌────────────┴────────────┐
          │                         │
          ▼                         ▼
     ┌─────────┐               ┌──────────┐
     │   Yes   │               │    No    │
     └────┬────┘               └────┬─────┘
          │                         │
          ▼                         ▼
     ┌──────────┐              ┌──────────┐
     │ Level 2  │              │ Parallel?│
     │   JIT    │              └────┬─────┘
     └──────────┘                   │
                        ┌───────────┴────────┐
                        │                    │
                        ▼                    ▼
                   ┌──────────┐         ┌─────────┐
                   │   Yes    │         │  Large  │
                   └────┬─────┘         │ arrays? │
                        │               └────┬────┘
                        ▼                    │
                   ┌──────────┐              ▼
                   │ Level 3  │         ┌──────────┐
                   │ Multicore│         │ Level 4  │
                   └──────────┘         │   GPU    │
                                        └──────────┘