Enhancement Levels

Epochly uses a progressive enhancement model with five levels. Each level builds upon the previous, with increasing optimization capability.


Level Overview

Level | Name | Value | Description
0 | LEVEL_0_MONITOR | 0 | Lightweight monitoring only
1 | LEVEL_1_THREADING | 1 | Thread pool optimizations
2 | LEVEL_2_JIT | 2 | JIT compilation enabled
3 | LEVEL_3_FULL | 3 | Full parallelism with sub-interpreters
4 | LEVEL_4_GPU | 4 | GPU acceleration

Level 0: Monitor

Purpose

Level 0 collects baseline performance metrics without applying any optimization. This serves as the starting point for all functions.
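
For example, a baseline can be recorded by timing a function while the level is pinned to 0, and the same measurement repeated later at a higher level to quantify the gain. A minimal sketch using only the standard library and epochly.set_level(); the workload function is a placeholder:

import time
import epochly

epochly.set_level(0)  # monitoring only, no optimization applied

def workload():
    # Placeholder CPU-bound task used to establish a baseline
    return sum(i * i for i in range(1_000_000))

start = time.perf_counter()
workload()
baseline_seconds = time.perf_counter() - start
print(f"Level 0 baseline: {baseline_seconds:.3f}s")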

What It Does

  • Collects execution timing
  • Monitors memory usage
  • Identifies workload characteristics
  • Adds negligible runtime overhead

When to Use

  • Diagnosing performance issues
  • Establishing baseline metrics
  • Validating that code works correctly before optimization

Configuration

import epochly
# Set to monitoring only
epochly.set_level(0)

# Or via the environment:
export EPOCHLY_LEVEL=0

Level 1: Threading

Purpose

Level 1 provides thread pool optimization for I/O-bound workloads. It enables concurrent execution of I/O operations.
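
The typical target is a loop of independent I/O operations. A minimal sketch of such a workload, assuming Level 1's thread pool can run the independent reads concurrently; the file paths are placeholders:

import epochly

epochly.set_level(1)  # enable the thread pool for I/O-bound work

def read_files(paths):
    # Independent file reads: each iteration waits on I/O rather than the CPU,
    # so the iterations are candidates for concurrent execution under Level 1.
    contents = []
    for path in paths:
        with open(path, "rb") as f:
            contents.append(f.read())
    return contents

data = read_files(["a.bin", "b.bin", "c.bin"])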

What It Does

  • Creates a thread pool for parallel I/O
  • Enables concurrent file operations
  • Parallelizes network requests
  • Handles database queries concurrently

When Level 1 Helps

Workload | Benefit
Multiple file reads | 2–10x speedup
Multiple API calls | 2–10x speedup
Database batch operations | 2–5x speedup
Mixed I/O and CPU | 1.5–3x speedup

When Level 1 Does Not Help

  • Pure CPU computation (use Level 2/3)
  • Single long-running I/O operation
  • Operations with strict sequential dependencies

Configuration

import epochly
# Set to threading level
epochly.set_level(1)

# Or via the environment:
export EPOCHLY_LEVEL=1
export EPOCHLY_MAX_WORKERS=32

Level 2: JIT Compilation

Purpose

Level 2 applies Just-In-Time compilation to numerical code paths, converting Python loops to native machine code.
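
The code paths that benefit are tight numerical loops. A minimal sketch of such a hot path, assuming Level 2 detects and compiles it once it has run often enough; the function itself is a placeholder:

import epochly

epochly.set_level(2)  # allow JIT compilation of hot numerical paths

def polynomial_sum(n):
    # Pure-Python numerical loop: the pattern Level 2 can compile to machine code
    total = 0.0
    for i in range(n):
        total += 3.0 * i * i + 2.0 * i + 1.0
    return total

# Repeated calls make the loop "hot", making it a compilation candidate
for _ in range(2000):
    polynomial_sum(10_000)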

What It Does

  • Identifies hot code paths (frequently executed)
  • Compiles numerical operations to machine code
  • Caches compiled code for reuse
  • Provides 58–193x speedup for numerical loops

JIT Backends

Backend | Python Versions | Platform | Notes
Numba | 3.9–3.12 | All | Primary backend, installed automatically
Native JIT | 3.13+ | All | Built into Python 3.13
Pyston-Lite | 3.9–3.10 | Linux x86_64 only | Installed automatically where supported

When Level 2 Helps

Code Pattern | Benefit
Simple numerical loops | 50–100x speedup
Nested loops with math | 30–60x speedup
Polynomial/mathematical operations | 100–200x speedup
Iterative algorithms | 50–150x speedup

When Level 2 Does Not Help

  • String operations
  • Dictionary manipulation
  • Object-heavy code
  • Code with many Python C API calls

Configuration

export EPOCHLY_LEVEL=2
export EPOCHLY_JIT_HOT_PATH_THRESHOLD=1000
export EPOCHLY_JIT_BACKEND=auto
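
These variables can also be set from Python, which is convenient in notebooks or test harnesses. The sketch below assumes Epochly reads its EPOCHLY_* environment variables at import time; if it does not, use the shell exports above instead:

import os

# Assumption: environment variables are read when epochly is imported,
# so they must be set before the import statement.
os.environ.setdefault("EPOCHLY_LEVEL", "2")
os.environ.setdefault("EPOCHLY_JIT_HOT_PATH_THRESHOLD", "1000")
os.environ.setdefault("EPOCHLY_JIT_BACKEND", "auto")

import epochly  # noqa: E402  (imported after configuring the environment)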

Level 3: Multicore Parallelism

Purpose

Level 3 enables true multicore execution, bypassing Python's Global Interpreter Lock (GIL).
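
The ideal workload is a set of independent, CPU-bound computations over separate chunks of data. A minimal sketch of such a data-parallel pattern, assuming Level 3 can distribute the per-chunk calls across cores; the chunking and the math are placeholders:

import epochly

epochly.set_level(3)  # enable multicore execution

def score_chunk(chunk):
    # Pure CPU work with no shared state: safe to run on separate cores
    return sum(x * x for x in chunk)

data = list(range(1_000_000))
chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
results = [score_chunk(chunk) for chunk in chunks]  # independent per-chunk calls
total = sum(results)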

What It Does

  • Distributes work across multiple CPU cores
  • Uses sub-interpreters (Python 3.12+) or ProcessPool
  • Provides near-linear scaling with core count
  • Enables shared memory for efficient data transfer

Parallelism Methods

Python Version | Method | Notes
3.12+ | Sub-interpreters | True parallel Python, low overhead
3.9–3.11 | ProcessPool | Process-based, higher overhead
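
As a rough illustration of the rule in the table above (this is not Epochly's actual selection code), the choice reduces to a version check, with process-based execution as the fallback:

import sys

def parallelism_method():
    # Sub-interpreters require Python 3.12+; older interpreters fall back
    # to process-based parallelism with its higher startup and IPC overhead.
    if sys.version_info >= (3, 12):
        return "sub-interpreters"
    return "process pool"

print(parallelism_method())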

When Level 3 Helps

Workload | Benefit
CPU-bound parallel tasks | 2–8x, depending on core count
Data parallel operations | Near-linear scaling
Independent computations | Excellent scaling
Batch processing | Excellent scaling

When Level 3 Does Not Help

  • Single-threaded algorithms
  • Operations with strict dependencies
  • Small workloads (overhead exceeds benefit)
  • Memory-bound operations

Configuration

export EPOCHLY_LEVEL=3
export EPOCHLY_MAX_WORKERS=8
export EPOCHLY_MEMORY_SHARED_SIZE=134217728 # 128MB

Level 4: GPU Acceleration

Purpose

Level 4 offloads suitable workloads to GPU for massive parallelism.
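
Conceptually, GPU offload is a round trip: copy the array to device memory, run the operation there, and copy the result back. Level 4 automates this; the sketch below only illustrates that data flow using CuPy directly, assuming a CUDA-capable GPU and CuPy are installed:

import numpy as np
import cupy as cp

host = np.random.rand(4096, 4096).astype(np.float32)

device = cp.asarray(host)     # transfer the host array to GPU memory
product = device @ device.T   # matrix multiply executes on the GPU
result = cp.asnumpy(product)  # copy the result back to the CPU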

What It Does

  • Detects NVIDIA GPUs via CuPy
  • Automatically transfers data to GPU
  • Executes array operations on GPU
  • Returns results to CPU

Requirements

  • NVIDIA GPU with CUDA support
  • CuPy installed: pip install cupy-cuda12x
  • Appropriate license tier

When Level 4 Helps

Operation | Benefit
Large matrix operations (>4096x4096) | 7–10x speedup
Elementwise ops on large arrays (10M+ elements) | 66–70x speedup
Batched convolutions | 14–19x speedup
Large array reductions (100M+ elements) | 22–36x speedup

When Level 4 Does Not Help

  • Small arrays (<10,000 elements)
  • Scalar operations
  • Operations dominated by CPU-GPU transfer
  • Code requiring frequent CPU-GPU synchronization

Configuration

export EPOCHLY_LEVEL=4
export EPOCHLY_GPU_ENABLED=true
export EPOCHLY_GPU_MEMORY_LIMIT=4096 # MB

Automatic Level Progression

Epochly automatically progresses through levels when all of the following criteria hold (a minimal sketch of this check follows the list):

  1. Stability: No errors at current level
  2. Performance: Measurable improvement (>5% speedup)
  3. Compatibility: Required features available
  4. Resources: Sufficient memory/CPU available
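
The function below is an illustrative sketch with hypothetical names, not Epochly's internal logic; it simply restates the four criteria as a single check:

def should_promote(error_count, speedup, features_available, resources_available):
    # Illustrative promotion check: all four criteria from the list above must hold
    return (
        error_count == 0          # stability at the current level
        and speedup > 0.05        # more than 5% measured improvement
        and features_available    # e.g. sub-interpreters or a GPU present
        and resources_available   # sufficient free memory / CPU
    )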

Manual Level Control

Override automatic progression:

import epochly
# Force specific level
epochly.set_level(3)
# Check current level
current = epochly.get_level()
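
A common pattern is to force a level only around a specific region and then restore the previous one. This sketch assumes set_level() may be called repeatedly at runtime; the batch function is a placeholder:

import epochly

def run_batch():
    # Placeholder for a CPU-bound batch job
    return sum(i * i for i in range(100_000))

previous = epochly.get_level()
epochly.set_level(3)             # force multicore execution for the batch
try:
    run_batch()
finally:
    epochly.set_level(previous)  # restore whatever level was active before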