Most Python web backends are I/O-bound. Request comes in, query the database, format the response, send it back. The CPU is idle most of the time.
But some endpoints are different. Report generation. Image processing. Recommendation scoring. Data aggregation across millions of rows. These are CPU-bound, and they create tail latency that degrades user experience.
Epochly accelerates the CPU-bound endpoints without touching the I/O-bound ones.
Where Web Backend Time Goes
| Endpoint Type | Bottleneck | Epochly Benefit |
|---|---|---|
| CRUD (read/write) | Database I/O | ~1.0x (I/O-bound) |
| Report generation | CPU computation | Level 2 JIT (58-193x on loops) |
| Search/ranking | Scoring algorithms | Level 2 JIT |
| Image processing | Pixel manipulation | Level 4 GPU (up to 70x) |
| Data aggregation | Custom Python loops | Level 2 JIT + Level 3 Parallel |
| Auth/middleware | Network + DB | ~1.0x (I/O-bound) |
Most endpoints won't benefit. The ones that will are the CPU-heavy outliers -- and those are usually the endpoints that create p99 latency spikes.
Accelerating Report Generation
Report endpoints often aggregate large datasets with custom business logic.
```python
import epochly
from fastapi import FastAPI

app = FastAPI()

@epochly.optimize
def generate_sales_report(transactions, start_date, end_date):
    """Compute sales metrics with custom aggregation."""
    daily_totals = {}
    for txn in transactions:
        if start_date <= txn['date'] <= end_date:
            day = txn['date']
            if day not in daily_totals:
                daily_totals[day] = 0.0
            daily_totals[day] += txn['amount'] * txn['margin']

    # Compute rolling averages over a 7-day window
    days = sorted(daily_totals.keys())
    results = []
    for i, day in enumerate(days):
        window = [daily_totals[days[j]] for j in range(max(0, i - 6), i + 1)]
        avg = sum(window) / len(window)
        results.append({'date': day, 'total': daily_totals[day], 'rolling_avg': avg})
    return results

@app.get("/api/reports/sales")
async def sales_report(start: str, end: str):
    transactions = await fetch_transactions(start, end)
    return generate_sales_report(transactions, start, end)
```
The loop iterating over transactions and computing rolling averages is CPU-bound Python. Level 2 JIT compiles the numerical operations: 58-193x speedup on the hot loop.
On our benchmark hardware, a report endpoint processing 100K transactions saw its compute time drop in line with the JIT speedup of the hot loop. Note that the I/O portion (fetching the transactions) is unchanged, so the total response-time improvement is bounded by the fraction of time the endpoint spends in the loop.
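To reproduce this kind of measurement on your own data, a plain `timeit` harness is enough. The sketch below is illustrative, not from the benchmark report: `make_transactions` generates synthetic rows shaped like the report input, and `total_margin` stands in for the hot aggregation loop.

```python
import random
import timeit
from datetime import date, timedelta

def make_transactions(n, days=90):
    """Generate synthetic transactions shaped like the report input."""
    base = date(2025, 1, 1)
    return [
        {
            "date": (base + timedelta(days=random.randrange(days))).isoformat(),
            "amount": random.uniform(1.0, 100.0),
            "margin": random.uniform(0.1, 0.5),
        }
        for _ in range(n)
    ]

def total_margin(transactions):
    """Stand-in for the report's hot loop: a pure-Python aggregation."""
    total = 0.0
    for txn in transactions:
        total += txn["amount"] * txn["margin"]
    return total

txns = make_transactions(10_000)
elapsed = timeit.timeit(lambda: total_margin(txns), number=10)
print(f"10 runs over {len(txns)} transactions: {elapsed:.3f}s")
```

Time the function once plain and once decorated; the ratio on your hardware is the number that matters, not the headline range.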
Accelerating Recommendation Scoring
Recommendation engines often score items against user profiles using custom algorithms.
```python
import epochly
import numpy as np

@epochly.optimize
def score_recommendations(user_embedding, item_embeddings, item_features):
    """Score items for a user with custom ranking logic."""
    scores = np.zeros(len(item_embeddings))
    for i in range(len(item_embeddings)):
        # Dot-product similarity between user and item embeddings
        dot = 0.0
        for j in range(len(user_embedding)):
            dot += user_embedding[j] * item_embeddings[i][j]
        # Feature bonuses
        recency_bonus = item_features[i]['recency_score'] * 0.1
        popularity_bonus = item_features[i]['popularity'] * 0.05
        scores[i] = dot + recency_bonus + popularity_bonus
    return scores
```
Nested numerical loops are JIT targets. For large item catalogs with 10M+ items, GPU acceleration (Level 4) handles the matrix operations: up to 70x.
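The same scoring can also be written as one matrix-vector product, which is the shape a JIT or GPU backend ultimately executes. A minimal NumPy sketch, assuming the recency and popularity features have been pre-extracted into flat arrays (the function name is ours, not an Epochly API):

```python
import numpy as np

def score_recommendations_vectorized(user_embedding, item_embeddings, recency, popularity):
    """Vectorized equivalent of the nested scoring loops."""
    # One matrix-vector product replaces the inner dot-product loop
    dots = item_embeddings @ user_embedding
    # Feature bonuses applied elementwise across all items at once
    return dots + 0.1 * recency + 0.05 * popularity
```

Extracting `recency_score` and `popularity` out of the per-item dicts once, up front, is what makes this form possible: array-of-structs becomes struct-of-arrays.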
Accelerating Image Processing
User-uploaded image processing (thumbnails, filters, analysis) is CPU-bound.
```python
import epochly
import numpy as np

@epochly.optimize
def process_image(image_array):
    """Apply a custom 3x3 smoothing filter."""
    height, width, channels = image_array.shape
    output = np.zeros_like(image_array)  # zeros, so border pixels are defined
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            for c in range(channels):
                # 3x3 convolution kernel: center weighted 4, weights sum to 12
                val = (image_array[y-1, x-1, c] + image_array[y-1, x, c] + image_array[y-1, x+1, c] +
                       image_array[y,   x-1, c] + image_array[y,   x, c] * 4 + image_array[y,   x+1, c] +
                       image_array[y+1, x-1, c] + image_array[y+1, x, c] + image_array[y+1, x+1, c]) / 12
                output[y, x, c] = val
    return output
```
Triple-nested loops over pixel data are the worst case for Python interpretation and the best case for JIT. Level 2 provides 58-193x on this pattern.
For batch image processing, Level 3 parallel execution processes multiple images simultaneously: 8-12x on 16 cores.
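As a rough mental model for that batch parallelism (this is not Epochly's actual API), a process pool over independent images captures the idea. `_brighten` is a hypothetical per-image transform; plain lists keep the sketch picklable across workers:

```python
from concurrent.futures import ProcessPoolExecutor

def _brighten(pixels):
    """Hypothetical per-image transform: scale every pixel value by 1.2."""
    return [[min(255, int(p * 1.2)) for p in row] for row in pixels]

def process_batch(images, workers=4):
    """Process independent images in parallel worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_brighten, images))
```

Because each image is independent, the work splits cleanly; scaling is then limited mainly by serialization cost, which is why large pixel buffers favor shared memory over pickling.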
Integration Pattern
Epochly integrates at the function level, not the framework level. Decorate the CPU-bound functions, leave the I/O-bound handlers alone.
```python
# FastAPI example
from fastapi import FastAPI
import epochly

app = FastAPI()

# I/O-bound -- don't optimize
@app.get("/api/users/{user_id}")
async def get_user(user_id: int):
    return await db.fetch_user(user_id)

# CPU-bound -- optimize
@epochly.optimize
def compute_analytics(data):
    # Heavy computation here
    pass

@app.get("/api/analytics")
async def analytics():
    data = await db.fetch_data()
    return compute_analytics(data)  # This call is accelerated
```
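One caveat worth knowing: calling a synchronous CPU-bound function directly inside an `async def` handler blocks the event loop for the duration of the call. The standard mitigation is `asyncio.to_thread` (or declaring the handler with plain `def`, which FastAPI runs in a threadpool). A minimal sketch, with a trivial stand-in for the heavy function; whether worker threads actually overlap depends on whether the compiled kernel releases the GIL, which is an assumption here:

```python
import asyncio

def compute_analytics(data):
    """Synchronous CPU-bound work (would carry @epochly.optimize)."""
    return sum(x * x for x in data)

async def analytics_handler(data):
    # Offload to a worker thread so the event loop keeps serving
    # other requests while the computation runs.
    return await asyncio.to_thread(compute_analytics, data)

result = asyncio.run(analytics_handler(range(1000)))
```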
What Epochly Does NOT Help in Web Backends
- Database queries: I/O-bound. Use query optimization, indexing, caching. ~1.0x.
- Network calls: External API calls are I/O-bound. ~1.0x.
- Template rendering: Jinja2/Django templates are fast enough. ~1.0x.
- Request routing/middleware: Framework overhead is minimal. ~1.0x.
- Low-traffic endpoints: If an endpoint handles 10 req/s, the optimization might not justify the memory overhead.
Getting Started
```
pip install epochly
```

```python
import epochly

@epochly.optimize
def your_heavy_endpoint_logic(data):
    # Your existing computation code
    pass
```
Profile your endpoints first. Identify which ones have high p99 latency from CPU computation. Apply @epochly.optimize to those specific functions.
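A quick way to confirm an endpoint is CPU-bound rather than I/O-bound is to run its handler body under the standard library's cProfile. The helper below is a generic sketch (`profile_call` and `heavy` are our names, not part of any framework):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile and return (result, stats text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()

def heavy(n):
    """Toy CPU-bound workload for demonstration."""
    return sum(i * i for i in range(n))

result, report = profile_call(heavy, 100_000)
print(report)
```

If the top cumulative entries are your own Python functions, the endpoint is a candidate; if they are socket or database driver calls, it is I/O-bound and acceleration won't help.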
Benchmark conditions: Python 3.12.3, Linux WSL2, 16 cores, NVIDIA Quadro M6000 24GB (CUDA 12.1). January 29, 2026 comprehensive benchmark report.