Enhancement Levels

Epochly uses a progressive enhancement model with five levels. Each level builds upon the previous, with increasing optimization capability.


Level Overview

Level | Name | Value | Description
0 | LEVEL_0_MONITOR | 0 | Lightweight monitoring only
1 | LEVEL_1_THREADING | 1 | Thread pool optimizations
2 | LEVEL_2_JIT | 2 | JIT compilation enabled
3 | LEVEL_3_FULL | 3 | Full parallelism with sub-interpreters
4 | LEVEL_4_GPU | 4 | GPU acceleration

Level 0: Monitor

Purpose

Level 0 collects baseline performance metrics without applying any optimization. This serves as the starting point for all functions.
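
For example, a baseline can be recorded by timing a function while the level is pinned to 0, and the same measurement repeated later at a higher level to quantify the gain. A minimal sketch using only the standard library and epochly.set_level(); the workload function is a placeholder:

import time
import epochly

epochly.set_level(0)  # monitoring only, no optimization applied

def workload():
    # Placeholder CPU-bound task used to establish a baseline
    return sum(i * i for i in range(1_000_000))

start = time.perf_counter()
workload()
baseline_seconds = time.perf_counter() - start
print(f"Level 0 baseline: {baseline_seconds:.3f}s")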

What It Does

  • Collects execution timing
  • Monitors memory usage
  • Identifies workload characteristics
  • Adds negligible runtime overhead

When to Use

  • Diagnosing performance issues
  • Establishing baseline metrics
  • Validating that code works correctly before optimization

Configuration

import epochly
# Set to monitoring only
epochly.set_level(0)

# Or via the environment:
export EPOCHLY_LEVEL=0

Level 1: Threading

Purpose

Level 1 provides thread pool optimization for I/O-bound workloads. It enables concurrent execution of I/O operations.
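
The typical target is a loop of independent I/O operations. A minimal sketch of such a workload, assuming Level 1's thread pool can run the independent reads concurrently; the file paths are placeholders:

import epochly

epochly.set_level(1)  # enable the thread pool for I/O-bound work

def read_files(paths):
    # Independent file reads: each iteration waits on I/O rather than the CPU,
    # so the iterations are candidates for concurrent execution under Level 1.
    contents = []
    for path in paths:
        with open(path, "rb") as f:
            contents.append(f.read())
    return contents

data = read_files(["a.bin", "b.bin", "c.bin"])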

What It Does

  • Creates a thread pool for parallel I/O
  • Enables concurrent file operations
  • Parallelizes network requests
  • Handles database queries concurrently

When Level 1 Helps

Workload | Benefit
Multiple file reads | 2–10x speedup
Multiple API calls | 2–10x speedup
Database batch operations | 2–5x speedup
Mixed I/O and CPU | 1.5–3x speedup

When Level 1 Does Not Help

  • Pure CPU computation (use Level 2/3)
  • Single long-running I/O operation
  • Operations with strict sequential dependencies

Configuration

import epochly
# Set to threading level
epochly.set_level(1)

# Or via the environment:
export EPOCHLY_LEVEL=1
export EPOCHLY_MAX_WORKERS=32

Level 2: JIT Compilation

Purpose

Level 2 applies Just-In-Time compilation to numerical code paths, converting Python loops to native machine code.
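
The code paths that benefit are tight numerical loops. A minimal sketch of such a hot path, assuming Level 2 detects and compiles it once it has run often enough; the function itself is a placeholder:

import epochly

epochly.set_level(2)  # allow JIT compilation of hot numerical paths

def polynomial_sum(n):
    # Pure-Python numerical loop: the pattern Level 2 can compile to machine code
    total = 0.0
    for i in range(n):
        total += 3.0 * i * i + 2.0 * i + 1.0
    return total

# Repeated calls make the loop "hot", making it a compilation candidate
for _ in range(2000):
    polynomial_sum(10_000)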

What It Does

  • Identifies hot code paths (frequently executed)
  • Compiles numerical operations to machine code
  • Caches compiled code for reuse
  • Provides 58–193x speedup for numerical loops

JIT Backends

Backend | Python Versions | Platform | Notes
Numba | 3.9–3.12 | All | Primary backend, installed automatically
Native JIT | 3.13+ | All | Built into Python 3.13
Pyston-Lite | 3.9–3.10 | Linux x86_64 only | Installed automatically where supported

When Level 2 Helps

Code Pattern | Benefit
Simple numerical loops | 50–100x speedup
Nested loops with math | 30–60x speedup
Polynomial/mathematical operations | 100–200x speedup
Iterative algorithms | 50–150x speedup

When Level 2 Does Not Help

  • String operations
  • Dictionary manipulation
  • Object-heavy code
  • Code with many Python C API calls

Configuration

export EPOCHLY_LEVEL=2
export EPOCHLY_JIT_HOT_PATH_THRESHOLD=1000
export EPOCHLY_JIT_BACKEND=auto
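
These variables can also be set from Python, which is convenient in notebooks or test harnesses. The sketch below assumes Epochly reads its EPOCHLY_* environment variables at import time; if it does not, use the shell exports above instead:

import os

# Assumption: environment variables are read when epochly is imported,
# so they must be set before the import statement.
os.environ.setdefault("EPOCHLY_LEVEL", "2")
os.environ.setdefault("EPOCHLY_JIT_HOT_PATH_THRESHOLD", "1000")
os.environ.setdefault("EPOCHLY_JIT_BACKEND", "auto")

import epochly  # noqa: E402  (imported after configuring the environment)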

Level 3: Multicore Parallelism

Purpose

Level 3 enables true multicore execution, bypassing Python's Global Interpreter Lock (GIL).
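
The ideal workload is a set of independent, CPU-bound computations over separate chunks of data. A minimal sketch of such a data-parallel pattern, assuming Level 3 can distribute the per-chunk calls across cores; the chunking and the math are placeholders:

import epochly

epochly.set_level(3)  # enable multicore execution

def score_chunk(chunk):
    # Pure CPU work with no shared state: safe to run on separate cores
    return sum(x * x for x in chunk)

data = list(range(1_000_000))
chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
results = [score_chunk(chunk) for chunk in chunks]  # independent per-chunk calls
total = sum(results)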

What It Does

  • Distributes work across multiple CPU cores
  • Uses sub-interpreters (Python 3.12+) or ProcessPool
  • Provides near-linear scaling with core count
  • Enables shared memory for efficient data transfer

Parallelism Methods

Python Version | Method | Notes
3.12+ | Sub-interpreters | True parallel Python, low overhead
3.9–3.11 | ProcessPool | Process-based, higher overhead
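
As a rough illustration of the rule in the table above (this is not Epochly's actual selection code), the choice reduces to a version check, with process-based execution as the fallback:

import sys

def parallelism_method():
    # Sub-interpreters require Python 3.12+; older interpreters fall back
    # to process-based parallelism with its higher startup and IPC overhead.
    if sys.version_info >= (3, 12):
        return "sub-interpreters"
    return "process pool"

print(parallelism_method())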

When Level 3 Helps

Workload | Benefit
CPU-bound parallel tasks | 2–8x, depending on core count
Data parallel operations | Near-linear scaling
Independent computations | Excellent scaling
Batch processing | Excellent scaling

When Level 3 Does Not Help

  • Single-threaded algorithms
  • Operations with strict dependencies
  • Small workloads (overhead exceeds benefit)
  • Memory-bound operations

Configuration

export EPOCHLY_LEVEL=3
export EPOCHLY_MAX_WORKERS=8
export EPOCHLY_MEMORY_SHARED_SIZE=134217728 # 128MB

Level 4: GPU Acceleration

Purpose

Level 4 offloads suitable workloads to GPU for massive parallelism.
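
Conceptually, GPU offload is a round trip: copy the array to device memory, run the operation there, and copy the result back. Level 4 automates this; the sketch below only illustrates that data flow using CuPy directly, assuming a CUDA-capable GPU and CuPy are installed:

import numpy as np
import cupy as cp

host = np.random.rand(4096, 4096).astype(np.float32)

device = cp.asarray(host)     # transfer the host array to GPU memory
product = device @ device.T   # matrix multiply executes on the GPU
result = cp.asnumpy(product)  # copy the result back to the CPU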

What It Does

  • Detects NVIDIA GPUs via CuPy
  • Automatically transfers data to GPU
  • Executes array operations on GPU
  • Returns results to CPU

Requirements

  • NVIDIA GPU with CUDA support
  • CuPy installed: pip install cupy-cuda12x
  • Appropriate license tier

When Level 4 Helps

Operation | Benefit
Large matrix operations (>4096x4096) | 7–10x speedup
Elementwise ops on large arrays (10M+ elements) | 66–70x speedup
Batched convolutions | 14–19x speedup
Large array reductions (100M+ elements) | 22–36x speedup

When Level 4 Does Not Help

  • Small arrays (<10,000 elements)
  • Scalar operations
  • Operations dominated by CPU-GPU transfer
  • Code requiring frequent CPU-GPU synchronization

Configuration

export EPOCHLY_LEVEL=4
export EPOCHLY_GPU_ENABLED=true
export EPOCHLY_GPU_MEMORY_LIMIT=4096 # MB

Automatic Level Progression

Epochly automatically progresses through levels when all of the following criteria hold (a minimal sketch of this check follows the list):

  1. Stability: No errors at current level
  2. Performance: Measurable improvement (>5% speedup)
  3. Compatibility: Required features available
  4. Resources: Sufficient memory/CPU available
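
The function below is an illustrative sketch with hypothetical names, not Epochly's internal logic; it simply restates the four criteria as a single check:

def should_promote(error_count, speedup, features_available, resources_available):
    # Illustrative promotion check: all four criteria from the list above must hold
    return (
        error_count == 0          # stability at the current level
        and speedup > 0.05        # more than 5% measured improvement
        and features_available    # e.g. sub-interpreters or a GPU present
        and resources_available   # sufficient free memory / CPU
    )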

Manual Level Control

Override automatic progression:

import epochly
# Force specific level
epochly.set_level(3)
# Check current level
current = epochly.get_level()
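
A common pattern is to force a level only around a specific region and then restore the previous one. This sketch assumes set_level() may be called repeatedly at runtime; the batch function is a placeholder:

import epochly

def run_batch():
    # Placeholder for a CPU-bound batch job
    return sum(i * i for i in range(100_000))

previous = epochly.get_level()
epochly.set_level(3)             # force multicore execution for the batch
try:
    run_batch()
finally:
    epochly.set_level(previous)  # restore whatever level was active before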