Level Selection Guide

Choose the right enhancement level for your workload to maximize performance gains.

Decision Matrix

Level	Name	Best For	Example Workload	Typical Speedup	Requirements
0	Monitoring	Baseline measurements, production monitoring	Any workload	1x (baseline)	None
1	Threading	I/O-bound operations	File reading, network requests, database queries	2-5x	GIL-releasing I/O operations
2	JIT Compilation	Numerical loops, CPU-bound Python code	Numerical computations, custom algorithms	58–193x	Python 3.9-3.12 (Numba) or 3.13+ (Native JIT)
3	Multi-core Parallelism	CPU-bound parallel workloads	List comprehensions, batch processing, aggregations	8–12x	Multiple CPU cores, partitionable data
4	GPU Acceleration	Massive array operations	Matrix operations, neural networks, image processing	7–70x	NVIDIA GPU, CUDA, CuPy, Pro license

Level 0: Monitoring

Use Level 0 for baseline measurements and production monitoring without optimization overhead.

When to Use Level 0

Establishing baseline performance metrics
Production environments where you want monitoring without optimization
Debugging performance issues
Comparing against optimized versions

Example: Performance Monitoring

import epochly
import numpy as np
@epochly.optimize(level=0)
def baseline_computation(n):
    """Track metrics without optimization"""
    arr = np.random.rand(n)
    result = np.sum(arr ** 2 + np.sin(arr))
    return result
# Run computation
result = baseline_computation(1_000_000)
# Get metrics
metrics = epochly.get_metrics()
print(f"Function calls: {metrics['total_calls']}")
print(f"Total time: {metrics['total_time']:.3f}s")
print(f"Average time: {metrics['avg_time']:.3f}s")

Best Use Cases

Production monitoring dashboards
A/B testing baselines
Performance regression detection
Resource usage tracking

Level 1: Threading

Use Level 1 for I/O-bound operations that spend most of their time waiting on files, networks, or databases.

When to Use Level 1

File I/O operations (reading/writing multiple files)
Network requests (API calls, web scraping)
Database queries (concurrent database operations)
Any operation that releases the GIL

Example: Concurrent URL Fetching

import epochly
import requests
@epochly.optimize(level=1)
def fetch_urls(urls):
    """Fetch multiple URLs concurrently"""
    results = []
    for url in urls:
        response = requests.get(url, timeout=10)  # timeout avoids hanging on an unresponsive server
        results.append(response.text)
    return results
# Fetch multiple URLs in parallel
urls = [
    'https://api.example.com/data1',
    'https://api.example.com/data2',
    'https://api.example.com/data3',
]
data = fetch_urls(urls)
print(f"Fetched {len(data)} URLs")

Example: Concurrent File Processing

import epochly
import json
@epochly.optimize(level=1)
def load_json_files(file_paths):
    """Load multiple JSON files concurrently"""
    data = []
    for path in file_paths:
        with open(path, 'r') as f:
            data.append(json.load(f))
    return data
files = [f'data_{i}.json' for i in range(100)]
all_data = load_json_files(files)

Best Use Cases

Web scraping multiple pages
Batch file uploads/downloads
Concurrent database queries
API request batching

Requirements

Operations must release the Global Interpreter Lock (GIL)
Most I/O operations naturally release the GIL
Network and file operations are ideal candidates
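
To see why GIL-releasing operations matter, the sketch below uses only the standard library (not Epochly) to run a blocking wait on several threads. Here `time.sleep` stands in for a network or file wait, and `fake_io`, `run_sequential`, `run_threaded`, and `timed` are hypothetical helper names. Because sleeping releases the GIL, the threaded run should take roughly as long as one wait rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Run each wait one after another."""
    return [fake_io(d) for d in delays]


def run_threaded(delays):
    """Run the waits on a thread pool, one worker per task."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_io, delays))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


delays = [0.05] * 8
seq_result, seq_time = timed(run_sequential, delays)
thr_result, thr_time = timed(run_threaded, delays)
print(f"Sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

CPU-bound work that holds the GIL would not show this gap, which is why it belongs at Level 2 or 3 instead.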

Level 2: JIT Compilation

Use Level 2 for numerical loops and CPU-bound Python code.

When to Use Level 2

Numerical computations with loops
Custom algorithms without vectorization
Data transformations
Any CPU-bound pure Python code

Example: Numerical Loop Optimization

import epochly
import numpy as np
@epochly.optimize(level=2)
def monte_carlo_pi(n_samples):
    """Estimate Pi using Monte Carlo method with JIT"""
    inside_circle = 0
    
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        
        if x**2 + y**2 <= 1:
            inside_circle += 1
    
    return 4 * inside_circle / n_samples
# JIT compiles on first call, fast on subsequent calls
pi_estimate = monte_carlo_pi(10_000_000)
print(f"Pi estimate: {pi_estimate:.6f}")

Example: Custom Algorithm

import epochly
@epochly.optimize(level=2)
def fibonacci_sequence(n):
    """Generate Fibonacci sequence with JIT"""
    if n <= 1:
        return n
    
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    
    return b
# First call compiles, subsequent calls are fast.
# Numba compiles integers as fixed-width 64-bit values, so results overflow past n = 92
result = fibonacci_sequence(90)

Requirements

Python 3.9-3.12:

Requires Numba JIT compiler
Install: pip install numba

Python 3.13+:

Native JIT compilation built into CPython (experimental, and not enabled by default in standard builds)
No additional dependencies

Best Use Cases

Numerical simulations
Custom statistical functions
Data transformation loops
Algorithm implementations

Limitations

Compilation overhead on first call (warmup needed)
Not effective for already vectorized NumPy operations
Best for pure Python numerical code
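
Because of the first-call compilation cost, it helps to time the first call separately from later ones. The following is a minimal sketch using only the standard library: `time_first_and_steady` and `make_slow_first_call` are hypothetical helpers, and the one-time compilation delay is simulated with `time.sleep` rather than produced by a real JIT.

```python
import time


def time_first_and_steady(func, *args, repeats=5):
    """Time the first call separately from the average of later calls."""
    start = time.perf_counter()
    func(*args)
    first_call = time.perf_counter() - start

    steady = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        steady.append(time.perf_counter() - start)

    return {"first_call": first_call, "steady_state": sum(steady) / len(steady)}


def make_slow_first_call(delay=0.05):
    """Build a function whose first call pays a one-time simulated delay."""
    state = {"compiled": False}

    def slow_first_call(n):
        if not state["compiled"]:
            time.sleep(delay)  # stands in for compilation time
            state["compiled"] = True
        return sum(i * i for i in range(n))

    return slow_first_call


timings = time_first_and_steady(make_slow_first_call(), 1_000)
print(f"First call:   {timings['first_call']:.4f}s")
print(f"Steady state: {timings['steady_state']:.4f}s")
```

If a function is called only once or twice, the first-call cost can dominate, and JIT compilation may not pay off.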

Level 3: Multi-core Parallelism

Use Level 3 for CPU-bound operations that can be split across cores.

When to Use Level 3

List comprehensions with independent iterations
Batch processing
Parallel aggregations
Data partitioning operations

Example: Parallel List Comprehension

import epochly
import numpy as np
@epochly.optimize(level=3)
def parallel_processing(data_list):
    """Process list items in parallel across cores"""
    results = [process_item(item) for item in data_list]
    return results
def process_item(data):
    """CPU-intensive per-item processing"""
    return np.sum(data ** 2 + np.sin(data) * np.cos(data))
# Generate data
data_list = [np.random.rand(100_000) for _ in range(100)]
# Process in parallel across all cores
results = parallel_processing(data_list)
print(f"Processed {len(results)} items")

Example: Parallel Aggregation

import epochly
import pandas as pd
import numpy as np
@epochly.optimize(level=3)
def parallel_groupby(df):
    """Parallel GroupBy aggregation"""
    result = df.groupby('category').agg({
        'value': ['sum', 'mean', 'std'],
        'quantity': ['min', 'max', 'count']
    })
    return result
# Large DataFrame
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B', 'C'], 10_000_000),
    'value': np.random.rand(10_000_000),
    'quantity': np.random.randint(1, 100, 10_000_000)
})
result = parallel_groupby(df)

Requirements

Multiple CPU cores available
Partitionable data that can be split
No shared state between parallel tasks
Independent iterations (no dependencies between loop iterations)
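
These requirements amount to a partition-and-combine pattern: split the data, compute each piece independently, then merge the partial results. The sketch below shows that shape in plain Python, processing the chunks one at a time for clarity. `partition`, `partial_sum_of_squares`, and `combine` are hypothetical names, not Epochly APIs. Each chunk reads only its own data, so a parallel runtime could hand each one to a separate worker.

```python
def partition(data, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Independent per-chunk work: reads only its own chunk."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Merge partial results; valid because addition is associative."""
    return sum(partials)


data = list(range(1_000))
chunks = partition(data, 4)
total = combine(partial_sum_of_squares(c) for c in chunks)
print(total)
```

A running total that each iteration updates in place would break this pattern, because the iterations would then depend on shared state.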

Best Use Cases

Batch data processing
Parallel feature engineering
Independent model training
Parallel aggregations

Limitations

Overhead for small datasets
Not suitable for sequential algorithms
Memory overhead (copies per worker)

Level 4: GPU Acceleration

Use Level 4 for massive array operations that benefit from GPU parallelism.

When to Use Level 4

Large matrix operations (1M+ elements)
Neural network computations
Image/video processing
Scientific simulations with large arrays

Example: Large Array Operations

import epochly
import numpy as np
@epochly.optimize(level=4)
def gpu_matrix_operations(n):
    """Massive matrix operations on GPU"""
    # Create large arrays
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    
    # Operations run on GPU
    C = np.dot(A, B)
    D = np.sin(C) + np.cos(C)
    
    return np.sum(D)
# Process 10,000 x 10,000 matrices on GPU
result = gpu_matrix_operations(10_000)
print(f"Result: {result}")

Example: Batch Image Processing

import epochly
import numpy as np
@epochly.optimize(level=4)
def process_images(images):
    """Process batch of images on GPU"""
    # Normalize
    normalized = images / 255.0
    
    # Apply transformations
    transformed = normalized ** 2 + np.sin(normalized)
    
    # Reduce
    features = np.mean(transformed, axis=(1, 2))
    
    return features
# Batch of 1000 images (256x256 RGB)
images = np.random.randint(0, 256, (1000, 256, 256, 3), dtype=np.uint8)
features = process_images(images)

Requirements

NVIDIA GPU with CUDA support
CUDA Toolkit installed
CuPy library: pip install cupy-cuda12x
Epochly Pro license (Level 4 requires Pro)

Best Use Cases

Deep learning inference
Scientific computing with large arrays
Image/video processing pipelines
Financial modeling with large datasets

Limitations

GPU memory constraints
CPU-GPU transfer overhead for small arrays
Requires Pro license
NVIDIA hardware only
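
GPU memory constraints can be checked with simple arithmetic before running a workload. As a rough sketch, the function below estimates the footprint of the 10,000 x 10,000 matrix example, assuming float64 data (8 bytes per element, the dtype `np.random.rand` returns) and counting only the four named arrays A, B, C, and D. Temporaries created by `np.sin` and `np.cos` would add to this figure. `estimate_array_bytes` is a hypothetical helper.

```python
def estimate_array_bytes(shape, itemsize=8, n_arrays=1):
    """Estimate bytes needed for n_arrays dense arrays of the given shape."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * itemsize * n_arrays


# Four float64 arrays (A, B, C, D) of shape 10,000 x 10,000
total_bytes = estimate_array_bytes((10_000, 10_000), itemsize=8, n_arrays=4)
print(f"{total_bytes / 1e9:.1f} GB")  # 3.2 GB
```

If the estimate approaches the memory available on the GPU, processing the data in smaller batches is one way to stay within the limit.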

Automatic Level Selection

Let Epochly choose the optimal level automatically:

import epochly
@epochly.optimize()  # No level specified
def smart_function(data):
    """Epochly auto-selects the best level"""
    return process(data)  # process is a placeholder for your own function
# Epochly analyzes the workload and chooses:
# - Level 1 for I/O operations
# - Level 2 for numerical loops
# - Level 3 for parallel operations
# - Level 4 for large arrays (if GPU available)

How Automatic Selection Works

Workload Analysis: Epochly profiles the function
Pattern Detection: Identifies I/O, loops, or parallelizable operations
Resource Check: Verifies available CPU cores, GPU
Level Assignment: Selects optimal level based on analysis
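
The resource check step can be approximated with the standard library. This is a sketch of the idea, not Epochly's actual detection logic: it counts CPU cores with `os.cpu_count()`, looks for an importable `cupy` package, and checks whether `nvidia-smi` is on the PATH as a rough signal that an NVIDIA driver is installed. `check_resources` is a hypothetical name.

```python
import importlib.util
import os
import shutil


def check_resources():
    """Report CPU cores and rough indicators of GPU readiness."""
    return {
        "cpu_cores": os.cpu_count() or 1,
        "cupy_installed": importlib.util.find_spec("cupy") is not None,
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


resources = check_resources()
# Both indicators are needed before Level 4 is worth considering
gpu_ready = resources["cupy_installed"] and resources["nvidia_smi_found"]
print(resources)
print(f"GPU candidate: {gpu_ready}")
```

Neither indicator proves that a GPU is usable, so a short test computation on the device is still a sensible final check.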

Example: Auto-Selection

import epochly
import numpy as np
@epochly.optimize()  # Auto-select
def auto_optimized(data):
    """Automatically optimized based on workload"""
    # Large array operations -> Level 4 (if a GPU is available)
    # Numerical loops -> Level 2
    # I/O operations -> Level 1
    return np.sum(data ** 2)
# Epochly may choose Level 4 for this large array if a GPU is available
large_data = np.random.rand(10_000_000)
result = auto_optimized(large_data)

Comparing Levels

Benchmark all levels to find the best for your workload:

import epochly
import numpy as np
import time
def benchmark_all_levels(func, *args):
    """Compare optimization Levels 0-3 (Level 4 is left out because it needs a GPU)"""
    results = []
    
    for level in [0, 1, 2, 3]:
        @epochly.optimize(level=level)
        def optimized_func(*args):
            return func(*args)
        
        # Warmup: run a few times so one-time costs such as JIT compilation are excluded
        for _ in range(3):
            optimized_func(*args)
        
        # Measure
        times = []
        for _ in range(5):
            start = time.perf_counter()
            optimized_func(*args)
            elapsed = time.perf_counter() - start
            times.append(elapsed)
        
        avg_time = np.mean(times)
        results.append({
            'level': level,
            'time': avg_time,
            'speedup': None  # Calculate later
        })
    
    # Calculate speedups
    baseline = results[0]['time']
    for r in results:
        r['speedup'] = baseline / r['time']
    
    return results
# Define workload
def compute_workload(n):
    arr = np.random.rand(n)
    return np.sum(arr ** 2 + np.sin(arr))
# Benchmark
results = benchmark_all_levels(compute_workload, 5_000_000)
# Print results
print("\nLevel Comparison:")
print(f"{'Level':<8} {'Time (s)':<12} {'Speedup':<10}")
print("-" * 35)
for r in results:
    print(f"{r['level']:<8} {r['time']:<12.4f} {r['speedup']:<10.2f}x")

Workload Analysis

Analyze your workload to get a level recommendation:

import epochly
import inspect
import numpy as np
def analyze_workload(func, sample_data):
    """Analyze workload and recommend best level"""
    
    # Check function characteristics
    source = inspect.getsource(func)
    
    recommendations = []
    
    # Check for I/O operations
    io_keywords = ['open(', 'read(', 'write(', 'requests.', 'urllib']
    if any(keyword in source for keyword in io_keywords):
        recommendations.append({
            'level': 1,
            'reason': 'I/O operations detected',
            'confidence': 'high'
        })
    
    # Check for loops
    loop_keywords = ['for ', 'while ']
    if any(keyword in source for keyword in loop_keywords):
        recommendations.append({
            'level': 2,
            'reason': 'Loops detected (JIT will help)',
            'confidence': 'medium'
        })
    
    # Check for list comprehensions (crude check: any '[' together with 'for' matches)
    if '[' in source and 'for' in source:
        recommendations.append({
            'level': 3,
            'reason': 'List comprehension (parallelizable)',
            'confidence': 'medium'
        })
    
    # Check data size
    if hasattr(sample_data, '__len__'):
        size = len(sample_data)
        if size > 1_000_000:
            recommendations.append({
                'level': 4,
                'reason': f'Large data size ({size:,} elements)',
                'confidence': 'high' if size > 10_000_000 else 'medium'
            })
    
    # Benchmark to verify
    best_level = 0
    best_time = float('inf')
    
    for rec in recommendations:
        level = rec['level']
        
        with epochly.optimize_context(level=level):
            # Warmup
            func(sample_data)
            
            # Measure
            import time
            start = time.perf_counter()
            func(sample_data)
            elapsed = time.perf_counter() - start
            
            if elapsed < best_time:
                best_time = elapsed
                best_level = level
    
    return {
        'recommended_level': best_level,
        'recommendations': recommendations,
        'benchmark_time': best_time
    }
# Example usage
def my_workload(data):
    result = 0
    for x in data:
        result += x ** 2
    return result
sample = np.random.rand(100_000)
analysis = analyze_workload(my_workload, sample)
print(f"Recommended Level: {analysis['recommended_level']}")
print("\nReasons:")
for rec in analysis['recommendations']:
    print(f"  Level {rec['level']}: {rec['reason']} (confidence: {rec['confidence']})")

Level Selection Best Practices

1. Start with Defaults

import epochly
# Start with automatic selection (process and compute are placeholders for your own functions)
@epochly.optimize()
def my_function(data):
    return process(data)
# Or use Level 2 as a safe default for CPU-bound work
@epochly.optimize(level=2)
def cpu_function(data):
    return compute(data)

2. Benchmark Systematically

import epochly
# Test all applicable levels (my_function and data are placeholders for your own code)
levels_to_test = [0, 1, 2, 3]
for level in levels_to_test:
    with epochly.benchmark_context(f"Level {level}"):
        with epochly.optimize_context(level=level):
            result = my_function(data)
# Choose the fastest

3. Consider Overhead

import epochly
import numpy as np
# Small data: Overhead may outweigh benefits
small_data = np.random.rand(100)
# Use Level 0 or no optimization for small data
@epochly.optimize(level=0)
def process_small(data):
    return np.sum(data)
# Large data: Optimization pays off
large_data = np.random.rand(10_000_000)
# Use Level 3 for large parallel workloads
@epochly.optimize(level=3)
def process_large(data):
    # np.array_split partitions the array into 8 chunks (the chunk count is arbitrary)
    return [np.sum(chunk) for chunk in np.array_split(data, 8)]

4. Match Data Size

Data Size	Recommended Level	Reason
< 1,000	Level 0	Overhead too high
1,000 - 10,000	Level 2	JIT helps, low overhead
10,000 - 1,000,000	Level 2 or 3	Depends on parallelizability
> 1,000,000	Level 3 or 4	Multi-core or GPU benefits
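
The table can be expressed as a small lookup, shown here as a sketch. `recommend_levels` is a hypothetical helper, and because the table's ranges share their endpoints, the boundary handling (for example, treating exactly 10,000 as part of the Level 2 row) is an assumption.

```python
def recommend_levels(n_elements):
    """Map a data size to the candidate levels from the table."""
    if n_elements < 1_000:
        return [0]        # overhead too high
    if n_elements <= 10_000:
        return [2]        # JIT helps, low overhead
    if n_elements <= 1_000_000:
        return [2, 3]     # depends on parallelizability
    return [3, 4]         # multi-core or GPU benefits


for size in (100, 5_000, 250_000, 10_000_000):
    print(f"{size:>12,} elements -> levels {recommend_levels(size)}")
```

Where two levels are returned, benchmarking both on representative data is the way to decide between them.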

5. Monitor in Production

import epochly
# Use Level 0 for monitoring without optimization
# (process and log_to_monitoring are placeholders for your own functions)
@epochly.optimize(level=0)
def production_function(data):
    result = process(data)
    
    # Log metrics
    metrics = epochly.get_metrics()
    log_to_monitoring(metrics)
    
    return result

Common Mistakes

Mistake	Problem	Solution
Using Level 3 for small data	Parallelization overhead exceeds benefits	Use Level 2 for data < 100K elements
Using Level 2 for I/O operations	JIT doesn't help waiting time	Use Level 1 for I/O-bound work
Using Level 4 without GPU	Falls back to CPU, adds overhead	Check GPU availability first
Using Level 4 for small arrays	CPU-GPU transfer overhead too high	Only use Level 4 for arrays > 1M elements
Not warming up JIT	First run is slow, skews benchmarks	Always run 2-3 warmup iterations
Mixing optimization levels	Nested optimizations conflict	Use one level per function
Optimizing already-optimized code	NumPy/pandas already optimized	Focus on pure Python loops
Ignoring memory constraints	Level 3 creates copies, Level 4 limited by GPU RAM	Monitor memory usage

Example: Avoiding Common Mistakes

import epochly
import numpy as np
# ❌ MISTAKE: Level 3 for small data
@epochly.optimize(level=3)
def bad_small_data():
    data = [i ** 2 for i in range(100)]  # Only 100 items!
    return sum(data)
# ✅ CORRECT: No optimization for small data
def good_small_data():
    data = [i ** 2 for i in range(100)]
    return sum(data)
# ❌ MISTAKE: Level 2 for I/O
@epochly.optimize(level=2)
def bad_io_operation(file_paths):
    return [open(f).read() for f in file_paths]
# ✅ CORRECT: Level 1 for I/O
@epochly.optimize(level=1)
def good_io_operation(file_paths):
    return [open(f).read() for f in file_paths]
# ❌ MISTAKE: Level 4 without checking GPU
@epochly.optimize(level=4)
def bad_gpu_usage(small_array):
    return np.sum(small_array)
# ✅ CORRECT: Check GPU and data size
@epochly.optimize(level=4)
def good_gpu_usage(data):
    # Only use if GPU available and data is large
    if epochly.has_gpu() and len(data) > 1_000_000:
        return np.sum(data)
    else:
        raise ValueError("GPU not available or data too small")

Quick Reference

Level Selection Flowchart

┌─────────────────────────────────────┐
│ What type of workload?              │
└────────────┬────────────────────────┘
             │
    ┌────────┴────────┐
    │                 │
    ▼                 ▼
┌─────────┐      ┌─────────┐
│ I/O     │      │ CPU     │
│ bound?  │      │ bound?  │
└────┬────┘      └────┬────┘
     │                │
     ▼                ▼
 Level 1         ┌─────────┐
 Threading       │ Loops?  │
                 └────┬────┘
                      │
         ┌────────────┴────────────┐
         │                         │
         ▼                         ▼
    ┌─────────┐             ┌──────────┐
    │ Yes     │             │ No       │
    └────┬────┘             └────┬─────┘
         │                       │
         ▼                       ▼
   ┌──────────┐           ┌──────────┐
   │ Level 2  │           │ Parallel?│
   │ JIT      │           └────┬─────┘
   └──────────┘                │
                    ┌───────────┴────────┐
                    │                    │
                    ▼                    ▼
              ┌──────────┐         ┌─────────┐
              │ Yes      │         │ Large   │
              └────┬─────┘         │ arrays? │
                   │               └────┬────┘
                   ▼                    │
             ┌──────────┐               ▼
             │ Level 3  │         ┌──────────┐
             │ Multicore│         │ Level 4  │
             └──────────┘         │ GPU      │
                                  └──────────┘
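
The flowchart can also be read as a chain of checks. The sketch below encodes it under stated assumptions: `select_level` and its boolean parameters are hypothetical, and the final fallback to Level 0 is an assumption, since the flowchart does not show a branch for workloads that match none of the questions.

```python
def select_level(io_bound, has_loops=False, parallelizable=False, large_arrays=False):
    """Walk the flowchart from top to bottom and return a level number."""
    if io_bound:
        return 1  # Threading
    if has_loops:
        return 2  # JIT
    if parallelizable:
        return 3  # Multi-core
    if large_arrays:
        return 4  # GPU
    return 0      # Assumption: no matching branch, so only monitor


print(select_level(io_bound=True))
print(select_level(io_bound=False, has_loops=True))
print(select_level(io_bound=False, parallelizable=True))
print(select_level(io_bound=False, large_arrays=True))
```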
