Epochly Level Selection Guide

Choose the right enhancement level for your workload to maximize performance gains. This guide provides a decision matrix, workload characteristics for each level, and the speedups you can typically expect.

Decision Matrix

| Level | Name | Best For | Example Workload | Typical Speedup | Requirements |
|-------|------|----------|------------------|-----------------|--------------|
| 0 | Monitoring | Baseline measurements, production monitoring | Any workload | 1x (baseline) | None |
| 1 | Threading | I/O-bound operations | File reading, network requests, database queries | 2-5x | GIL-releasing I/O operations |
| 2 | JIT Compilation | Numerical loops, CPU-bound Python code | Numerical computations, custom algorithms | 58-193x | Python 3.9-3.12 (Numba) or 3.13+ (native JIT) |
| 3 | Multi-core Parallelism | CPU-bound parallel workloads | List comprehensions, batch processing, aggregations | 8-12x | Multiple CPU cores, partitionable data |
| 4 | GPU Acceleration | Massive array operations | Matrix operations, neural networks, image processing | 7-70x | NVIDIA GPU, CUDA, CuPy, Pro license |

Level 0: Monitoring

Use Level 0 for baseline measurements and production monitoring without optimization overhead.

When to Use Level 0

  • Establishing baseline performance metrics
  • Production environments where you want monitoring without optimization
  • Debugging performance issues
  • Comparing against optimized versions

Example: Performance Monitoring

import epochly
import numpy as np

@epochly.optimize(level=0)
def baseline_computation(n):
    """Track metrics without optimization."""
    arr = np.random.rand(n)
    result = np.sum(arr ** 2 + np.sin(arr))
    return result

# Run the computation
result = baseline_computation(1_000_000)

# Retrieve the collected metrics
metrics = epochly.get_metrics()
print(f"Function calls: {metrics['total_calls']}")
print(f"Total time: {metrics['total_time']:.3f}s")
print(f"Average time: {metrics['avg_time']:.3f}s")

Best Use Cases

  • Production monitoring dashboards
  • A/B testing baselines
  • Performance regression detection (see the sketch below)
  • Resource usage tracking
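
For the regression-detection case, a minimal sketch might compare the metrics Epochly collects against a stored baseline. epochly.get_metrics() and its avg_time field appear in the example above; the baseline file format and the 20% threshold here are illustrative assumptions, not part of the Epochly API.

import json
import epochly

@epochly.optimize(level=0)
def critical_path(data):
    """Production function monitored at Level 0."""
    return sum(x ** 2 for x in data)

def check_regression(baseline_path, threshold=1.2):
    """Compare the current avg_time against a stored baseline.

    The JSON baseline format and 20% threshold are illustrative
    choices, not part of the Epochly API.
    """
    critical_path(range(100_000))
    metrics = epochly.get_metrics()
    with open(baseline_path) as f:
        baseline = json.load(f)
    ratio = metrics['avg_time'] / baseline['avg_time']
    if ratio > threshold:
        print(f"Regression: {ratio:.2f}x slower than baseline")
    return ratio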

Level 1: Threading

Use Level 1 for I/O-bound operations that spend time waiting.

When to Use Level 1

  • File I/O operations (reading/writing multiple files)
  • Network requests (API calls, web scraping)
  • Database queries (concurrent database operations)
  • Any operation that releases the GIL

Example: Concurrent URL Fetching

import epochly
import requests

@epochly.optimize(level=1)
def fetch_urls(urls):
    """Fetch multiple URLs concurrently."""
    results = []
    for url in urls:
        response = requests.get(url)
        results.append(response.text)
    return results

# Fetch multiple URLs in parallel
urls = [
    'https://api.example.com/data1',
    'https://api.example.com/data2',
    'https://api.example.com/data3',
]
data = fetch_urls(urls)
print(f"Fetched {len(data)} URLs")

Example: Concurrent File Processing

import epochly
import json

@epochly.optimize(level=1)
def load_json_files(file_paths):
    """Load multiple JSON files concurrently."""
    data = []
    for path in file_paths:
        with open(path, 'r') as f:
            data.append(json.load(f))
    return data

files = [f'data_{i}.json' for i in range(100)]
all_data = load_json_files(files)

Best Use Cases

  • Web scraping multiple pages
  • Batch file uploads/downloads
  • Concurrent database queries
  • API request batching

Requirements

  • Operations must release the Global Interpreter Lock (GIL); the sketch after this list shows why
  • Most I/O operations naturally release the GIL
  • Network and file operations are ideal candidates
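
To see why the GIL matters here, the following standalone sketch (plain threading, no Epochly involved) contrasts I/O-bound work, which overlaps across threads because waiting releases the GIL, with pure-Python CPU work, which serializes:

import threading
import time

def io_task():
    time.sleep(0.5)  # waiting releases the GIL, so threads overlap

def cpu_task():
    total = 0
    for i in range(10_000_000):  # pure-Python work holds the GIL
        total += i

def run_threaded(task, n=4):
    threads = [threading.Thread(target=task) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"I/O-bound x4: {run_threaded(io_task):.2f}s (close to a single 0.5s wait)")
print(f"CPU-bound x4: {run_threaded(cpu_task):.2f}s (roughly 4x one task's time)")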

Level 2: JIT Compilation

Use Level 2 for numerical loops and CPU-bound Python code.

When to Use Level 2

  • Numerical computations with loops
  • Custom algorithms without vectorization
  • Data transformations
  • Any CPU-bound pure Python code

Example: Numerical Loop Optimization

import epochly
import numpy as np

@epochly.optimize(level=2)
def monte_carlo_pi(n_samples):
    """Estimate pi using the Monte Carlo method with JIT."""
    inside_circle = 0
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        if x**2 + y**2 <= 1:
            inside_circle += 1
    return 4 * inside_circle / n_samples

# JIT compiles on the first call; subsequent calls are fast
pi_estimate = monte_carlo_pi(10_000_000)
print(f"Pi estimate: {pi_estimate:.6f}")

Example: Custom Algorithm

import epochly

@epochly.optimize(level=2)
def nth_fibonacci(n):
    """Compute the nth Fibonacci number with JIT."""
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

# First call compiles; subsequent calls are fast
result = nth_fibonacci(1000)

Requirements

Python 3.9-3.12:

  • Requires Numba JIT compiler
  • Install: pip install numba

Python 3.13+:

  • Native JIT compilation built-in
  • No additional dependencies (see the backend-detection sketch below)
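
A small guard (an illustrative sketch, not Epochly's internal detection logic) shows how this version split can be checked up front:

import sys

def jit_backend():
    """Report which JIT backend Level 2 can use on this interpreter.

    Illustrative only; Epochly performs its own detection.
    """
    if sys.version_info >= (3, 13):
        return 'native'  # built-in JIT, no extra dependency
    try:
        import numba  # noqa: F401 -- required on Python 3.9-3.12
        return 'numba'
    except ImportError:
        return None

backend = jit_backend()
print(f"JIT backend: {backend or 'none -- run: pip install numba'}")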

Best Use Cases

  • Numerical simulations
  • Custom statistical functions
  • Data transformation loops
  • Algorithm implementations

Limitations

  • Compilation overhead on first call (warmup needed; measured in the sketch below)
  • Not effective for already vectorized NumPy operations
  • Best for pure Python numerical code
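
The compilation overhead is easy to see directly. A minimal timing sketch (the loop body is arbitrary; any pure-Python numerical function works):

import time
import epochly

@epochly.optimize(level=2)
def hot_loop(n):
    """Arbitrary pure-Python numerical loop."""
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

# First call pays the JIT compilation cost
start = time.perf_counter()
hot_loop(1_000_000)
print(f"First call (compile + run): {time.perf_counter() - start:.3f}s")

# Subsequent calls reuse the compiled code
start = time.perf_counter()
hot_loop(1_000_000)
print(f"Second call (run only): {time.perf_counter() - start:.3f}s")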

Level 3: Multi-core Parallelism

Use Level 3 for CPU-bound operations that can be split across cores.

When to Use Level 3

  • List comprehensions with independent iterations
  • Batch processing
  • Parallel aggregations
  • Data partitioning operations

Example: Parallel List Comprehension

import epochly
import numpy as np

def process_item(data):
    """CPU-intensive per-item processing."""
    return np.sum(data ** 2 + np.sin(data) * np.cos(data))

@epochly.optimize(level=3)
def parallel_processing(data_list):
    """Process list items in parallel across cores."""
    return [process_item(item) for item in data_list]

# Generate data
data_list = [np.random.rand(100_000) for _ in range(100)]

# Process in parallel across all cores
results = parallel_processing(data_list)
print(f"Processed {len(results)} items")

Example: Parallel Aggregation

import epochly
import pandas as pd
import numpy as np

@epochly.optimize(level=3)
def parallel_groupby(df):
    """Parallel GroupBy aggregation."""
    return df.groupby('category').agg({
        'value': ['sum', 'mean', 'std'],
        'quantity': ['min', 'max', 'count'],
    })

# Large DataFrame
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C'], 10_000_000),
    'value': np.random.rand(10_000_000),
    'quantity': np.random.randint(1, 100, 10_000_000),
})
result = parallel_groupby(df)

Requirements

  • Multiple CPU cores available
  • Partitionable data that can be split
  • No shared state between parallel tasks
  • Independent iterations (no dependencies between loop iterations); a mental-model sketch follows this list
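
As a mental model for what "partitionable, no shared state" means, here is a rough sketch using only the standard library's multiprocessing module. This is not Epochly's implementation; it just shows the partition/map/combine shape that Level 3 workloads must fit:

from multiprocessing import Pool

def process_chunk(chunk):
    """Each worker receives its own copy of the chunk -- no shared state."""
    return sum(x ** 2 for x in chunk)

def parallel_sum_of_squares(data, n_workers=4):
    # Partition: split the data into independent, non-overlapping chunks
    size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Map: process each chunk on a separate core
    with Pool(n_workers) as pool:
        partials = pool.map(process_chunk, chunks)
    # Combine: reduce the independent partial results
    return sum(partials)

if __name__ == '__main__':
    print(parallel_sum_of_squares(list(range(1_000_000))))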

Best Use Cases

  • Batch data processing
  • Parallel feature engineering
  • Independent model training
  • Parallel aggregations

Limitations

  • Overhead for small datasets (measured in the sketch below)
  • Not suitable for sequential algorithms
  • Memory overhead (copies per worker)
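
The small-data overhead is worth measuring rather than guessing. A quick sketch comparing Level 0 against Level 3 on a small and a large batch:

import time
import numpy as np
import epochly

@epochly.optimize(level=0)
def sum_squares_baseline(chunks):
    return [np.sum(c ** 2) for c in chunks]

@epochly.optimize(level=3)
def sum_squares_parallel(chunks):
    return [np.sum(c ** 2) for c in chunks]

def timed(func, arg, repeats=3):
    func(arg)  # warmup
    start = time.perf_counter()
    for _ in range(repeats):
        func(arg)
    return (time.perf_counter() - start) / repeats

small = [np.random.rand(100) for _ in range(10)]
large = [np.random.rand(1_000_000) for _ in range(64)]
for name, chunks in [('small', small), ('large', large)]:
    print(f"{name}: Level 0 {timed(sum_squares_baseline, chunks):.4f}s, "
          f"Level 3 {timed(sum_squares_parallel, chunks):.4f}s")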

Level 4: GPU Acceleration

Use Level 4 for massive array operations that benefit from GPU parallelism.

When to Use Level 4

  • Large matrix operations (1M+ elements)
  • Neural network computations
  • Image/video processing
  • Scientific simulations with large arrays

Example: Large Array Operations

import epochly
import numpy as np

@epochly.optimize(level=4)
def gpu_matrix_operations(n):
    """Massive matrix operations on the GPU."""
    # Create large arrays
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    # Operations run on the GPU
    C = np.dot(A, B)
    D = np.sin(C) + np.cos(C)
    return np.sum(D)

# Process 10,000 x 10,000 matrices on the GPU
result = gpu_matrix_operations(10_000)
print(f"Result: {result}")

Example: Batch Image Processing

import epochly
import numpy as np

@epochly.optimize(level=4)
def process_images(images):
    """Process a batch of images on the GPU."""
    # Normalize to [0, 1]
    normalized = images / 255.0
    # Apply transformations
    transformed = normalized ** 2 + np.sin(normalized)
    # Reduce each image to per-channel features
    features = np.mean(transformed, axis=(1, 2))
    return features

# Batch of 1000 images (256x256 RGB)
images = np.random.randint(0, 256, (1000, 256, 256, 3), dtype=np.uint8)
features = process_images(images)

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit installed
  • CuPy library: pip install cupy-cuda12x
  • Epochly Pro license (Level 4 requires Pro); a gating sketch follows
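
Because Level 4 depends on hardware and licensing, it pays to gate it explicitly. This sketch uses epochly.is_gpu_available() (shown in the mistakes example later on this page); the 1M-element cutoff follows the sizing guidance below, and the fallback to Level 3 is an illustrative choice:

import epochly
import numpy as np

def pick_array_level(arr):
    """Choose Level 4 only with a GPU and large data; else fall back to 3."""
    if epochly.is_gpu_available() and arr.size > 1_000_000:
        return 4
    return 3

data = np.random.rand(5_000_000)

@epochly.optimize(level=pick_array_level(data))
def reduce_array(arr):
    return np.sum(arr ** 2 + np.sin(arr))

print(reduce_array(data))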

Best Use Cases

  • Deep learning inference
  • Scientific computing with large arrays
  • Image/video processing pipelines
  • Financial modeling with large datasets

Limitations

  • GPU memory constraints
  • CPU-GPU transfer overhead for small arrays (measurable with the sketch below)
  • Requires Pro license
  • NVIDIA hardware only
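
The transfer-overhead point can be checked with CuPy directly, independent of Epochly. On small arrays the host-to-device copy typically dominates the total time:

import time
import numpy as np
import cupy as cp  # requires an NVIDIA GPU, CUDA, and CuPy

def gpu_roundtrip(arr):
    """Time the host->device copy, the compute, and the copy back."""
    start = time.perf_counter()
    gpu_arr = cp.asarray(arr)            # host -> device transfer
    value = float(cp.sum(gpu_arr ** 2))  # compute, then sync back to host
    return value, time.perf_counter() - start

for n in (1_000, 10_000_000):
    _, elapsed = gpu_roundtrip(np.random.rand(n))
    print(f"{n:>12,} elements: {elapsed:.4f}s")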

Automatic Level Selection

Let Epochly choose the optimal level automatically:

import epochly

@epochly.optimize()  # No level specified
def smart_function(data):
    """Epochly auto-selects the best level."""
    return process(data)  # process() stands in for your workload

# Epochly analyzes the workload and chooses:
# - Level 1 for I/O operations
# - Level 2 for numerical loops
# - Level 3 for parallel operations
# - Level 4 for large arrays (if a GPU is available)

How Automatic Selection Works

  1. Workload Analysis: Epochly profiles the function
  2. Pattern Detection: Identifies I/O, loops, or parallelizable operations
  3. Resource Check: Verifies available CPU cores, GPU
  4. Level Assignment: Selects optimal level based on analysis

Example: Auto-Selection

import epochly
import numpy as np

@epochly.optimize()  # Auto-select
def auto_optimized(data):
    """Automatically optimized based on the workload."""
    # Large array input -> Level 4
    # Pure-Python loops -> Level 2
    # I/O-heavy body    -> Level 1
    return np.sum(data ** 2)

# Epochly automatically chooses Level 4 for large arrays
large_data = np.random.rand(10_000_000)
result = auto_optimized(large_data)

Comparing Levels

Benchmark all levels to find the best for your workload:

import epochly
import numpy as np
import time

def benchmark_all_levels(func, *args):
    """Compare optimization levels (Level 4 omitted: needs GPU + Pro)."""
    results = []
    for level in [0, 1, 2, 3]:
        @epochly.optimize(level=level)
        def optimized_func(*args):
            return func(*args)

        # Warmup (triggers JIT compilation where applicable)
        optimized_func(*args)

        # Measure
        times = []
        for _ in range(5):
            start = time.perf_counter()
            optimized_func(*args)
            times.append(time.perf_counter() - start)

        results.append({'level': level, 'time': np.mean(times)})

    # Calculate speedups relative to the Level 0 baseline
    baseline = results[0]['time']
    for r in results:
        r['speedup'] = baseline / r['time']
    return results

# Define a workload
def compute_workload(n):
    arr = np.random.rand(n)
    return np.sum(arr ** 2 + np.sin(arr))

# Benchmark
results = benchmark_all_levels(compute_workload, 5_000_000)

# Print results
print("\nLevel Comparison:")
print(f"{'Level':<8} {'Time (s)':<12} {'Speedup':<10}")
print("-" * 35)
for r in results:
    print(f"{r['level']:<8} {r['time']:<12.4f} {r['speedup']:<10.2f}x")

Workload Analysis

Analyze your workload to get a level recommendation:

import epochly
import inspect
import time
import numpy as np

def analyze_workload(func, sample_data):
    """Analyze a workload and recommend the best level."""
    source = inspect.getsource(func)
    recommendations = []

    # Check for I/O operations
    io_keywords = ['open(', 'read(', 'write(', 'requests.', 'urllib']
    if any(keyword in source for keyword in io_keywords):
        recommendations.append({
            'level': 1,
            'reason': 'I/O operations detected',
            'confidence': 'high',
        })

    # Check for loops
    if any(keyword in source for keyword in ['for ', 'while ']):
        recommendations.append({
            'level': 2,
            'reason': 'Loops detected (JIT will help)',
            'confidence': 'medium',
        })

    # Check for list comprehensions
    if '[' in source and 'for' in source:
        recommendations.append({
            'level': 3,
            'reason': 'List comprehension (parallelizable)',
            'confidence': 'medium',
        })

    # Check data size
    if hasattr(sample_data, '__len__'):
        size = len(sample_data)
        if size > 1_000_000:
            recommendations.append({
                'level': 4,
                'reason': f'Large data size ({size:,} elements)',
                'confidence': 'high' if size > 10_000_000 else 'medium',
            })

    # Benchmark each candidate level to verify
    best_level = 0
    best_time = float('inf')
    for rec in recommendations:
        level = rec['level']
        with epochly.optimize_context(level=level):
            func(sample_data)  # warmup
            start = time.perf_counter()
            func(sample_data)
            elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_time = elapsed
            best_level = level

    return {
        'recommended_level': best_level,
        'recommendations': recommendations,
        'benchmark_time': best_time,
    }

# Example usage
def my_workload(data):
    result = 0
    for x in data:
        result += x ** 2
    return result

sample = np.random.rand(100_000)
analysis = analyze_workload(my_workload, sample)
print(f"Recommended Level: {analysis['recommended_level']}")
print("\nReasons:")
for rec in analysis['recommendations']:
    print(f"  Level {rec['level']}: {rec['reason']} (confidence: {rec['confidence']})")

Level Selection Best Practices

1. Start with Defaults

import epochly

# Start with automatic selection
@epochly.optimize()
def my_function(data):
    return process(data)  # process() stands in for your workload

# Or use Level 2 as a safe default for CPU-bound work
@epochly.optimize(level=2)
def cpu_function(data):
    return compute(data)  # compute() stands in for your workload

2. Benchmark Systematically

import epochly

# Test all applicable levels
levels_to_test = [0, 1, 2, 3]
for level in levels_to_test:
    with epochly.benchmark_context(f"Level {level}"):
        with epochly.optimize_context(level=level):
            result = my_function(data)

# Choose the fastest

3. Consider Overhead

import epochly
import numpy as np

# Small data: overhead may outweigh benefits
small_data = np.random.rand(100)

# Use Level 0 or no optimization for small data
@epochly.optimize(level=0)
def process_small(data):
    return np.sum(data)

# Large data: optimization pays off
large_data = np.random.rand(10_000_000)

# Use Level 3 for large parallel workloads
@epochly.optimize(level=3)
def process_large(data):
    # chunks() is a user-supplied helper that partitions the data
    return [np.sum(chunk) for chunk in chunks(data)]

4. Match Data Size

| Data Size | Recommended Level | Reason |
|-----------|-------------------|--------|
| < 1,000 | Level 0 | Overhead too high |
| 1,000 - 10,000 | Level 2 | JIT helps, low overhead |
| 10,000 - 1,000,000 | Level 2 or 3 | Depends on parallelizability |
| > 1,000,000 | Level 3 or 4 | Multi-core or GPU benefits |
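
These thresholds fold naturally into a small helper. A sketch (the cutoffs mirror the table above; the function itself is not part of the Epochly API):

def recommend_level(n_elements, parallelizable=False, gpu=False):
    """Map data size to a level, following the table above."""
    if n_elements < 1_000:
        return 0  # overhead too high
    if n_elements <= 10_000:
        return 2  # JIT helps, low overhead
    if n_elements <= 1_000_000:
        return 3 if parallelizable else 2  # depends on parallelizability
    return 4 if gpu else 3  # multi-core or GPU benefits

print(recommend_level(50_000, parallelizable=True))  # -> 3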

5. Monitor in Production

import epochly

# Use Level 0 for monitoring without optimization
@epochly.optimize(level=0)
def production_function(data):
    result = process(data)  # process() stands in for your workload
    # Log metrics
    metrics = epochly.get_metrics()
    log_to_monitoring(metrics)  # log_to_monitoring() is your own hook
    return result

Common Mistakes

| Mistake | Problem | Solution |
|---------|---------|----------|
| Using Level 3 for small data | Parallelization overhead exceeds benefits | Use Level 2 for data < 100K elements |
| Using Level 2 for I/O operations | JIT doesn't help waiting time | Use Level 1 for I/O-bound work |
| Using Level 4 without a GPU | Falls back to CPU, adds overhead | Check GPU availability first |
| Using Level 4 for small arrays | CPU-GPU transfer overhead too high | Only use Level 4 for arrays > 1M elements |
| Not warming up the JIT | First run is slow, skews benchmarks | Always run 2-3 warmup iterations |
| Mixing optimization levels | Nested optimizations conflict | Use one level per function |
| Optimizing already-optimized code | NumPy/pandas are already optimized | Focus on pure Python loops |
| Ignoring memory constraints | Level 3 creates copies; Level 4 is limited by GPU RAM | Monitor memory usage |

Example: Avoiding Common Mistakes

import epochly
import numpy as np

# ❌ MISTAKE: Level 3 for small data
@epochly.optimize(level=3)
def bad_small_data():
    data = [i ** 2 for i in range(100)]  # Only 100 items!
    return sum(data)

# ✅ CORRECT: No optimization for small data
def good_small_data():
    data = [i ** 2 for i in range(100)]
    return sum(data)

# ❌ MISTAKE: Level 2 for I/O
@epochly.optimize(level=2)
def bad_io_operation(file_paths):
    return [open(f).read() for f in file_paths]

# ✅ CORRECT: Level 1 for I/O
@epochly.optimize(level=1)
def good_io_operation(file_paths):
    return [open(f).read() for f in file_paths]

# ❌ MISTAKE: Level 4 without checking the GPU
@epochly.optimize(level=4)
def bad_gpu_usage(small_array):
    return np.sum(small_array)

# ✅ CORRECT: Check GPU availability and data size first
@epochly.optimize(level=4)
def good_gpu_usage(data):
    # Only use Level 4 if a GPU is available and the data is large
    if epochly.is_gpu_available() and len(data) > 1_000_000:
        return np.sum(data)
    raise ValueError("GPU not available or data too small")

Quick Reference

Level Selection Flowchart

What type of workload?
│
├── I/O-bound (waiting on files, network, database)
│       → Level 1: Threading
│
└── CPU-bound
    │
    ├── Pure-Python numerical loops?
    │       → Level 2: JIT
    │
    ├── Independent, parallelizable iterations?
    │       → Level 3: Multi-core
    │
    └── Large arrays (1M+ elements) and a GPU?
            → Level 4: GPU