
CPU-heavy web backends

FastAPI, Django, and Flask workloads where CPU time is the real problem.

Most Python web backends are I/O-bound. Request comes in, query the database, format the response, send it back. The CPU is idle most of the time.

But some endpoints are different. Report generation. Image processing. Recommendation scoring. Data aggregation across millions of rows. These are CPU-bound, and they create tail latency that degrades user experience.

Epochly accelerates the CPU-bound endpoints without touching the I/O-bound ones.


Where Web Backend Time Goes

Endpoint Type       | Bottleneck           | Epochly Benefit
--------------------|----------------------|-------------------------------
CRUD (read/write)   | Database I/O         | ~1.0x (I/O-bound)
Report generation   | CPU computation      | Level 2 JIT (58-193x on loops)
Search/ranking      | Scoring algorithms   | Level 2 JIT
Image processing    | Pixel manipulation   | Level 4 GPU (up to 70x)
Data aggregation    | Custom Python loops  | Level 2 JIT + Level 3 Parallel
Auth/middleware     | Network + DB         | ~1.0x (I/O-bound)

Most endpoints won't benefit. The ones that will are the CPU-heavy outliers -- and those are usually the endpoints that create p99 latency spikes.
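One quick way to tell which camp a handler falls into is to compare CPU time against wall-clock time for a representative call. This sketch uses only the standard library (the helper name and stand-in functions are illustrative): a ratio near 1.0 means CPU-bound and a candidate for optimization; near 0.0 means the time is spent waiting on I/O.

```python
import time

def cpu_fraction(fn, *args, **kwargs):
    """Fraction of a call's wall-clock time spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0

# CPU-bound stand-in: a tight arithmetic loop
def busy():
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

# I/O-bound stand-in: sleeping simulates waiting on a database
def waiting():
    time.sleep(0.05)
```

Run each candidate under realistic inputs; only functions with a high CPU fraction are worth decorating.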


Accelerating Report Generation

Report endpoints often aggregate large datasets with custom business logic.

import epochly
from fastapi import FastAPI

app = FastAPI()

@epochly.optimize
def generate_sales_report(transactions, start_date, end_date):
    """Compute sales metrics with custom aggregation."""
    daily_totals = {}
    for txn in transactions:
        if start_date <= txn['date'] <= end_date:
            day = txn['date']
            if day not in daily_totals:
                daily_totals[day] = 0.0
            daily_totals[day] += txn['amount'] * txn['margin']
    # Compute rolling averages over a 7-day window
    days = sorted(daily_totals.keys())
    results = []
    for i, day in enumerate(days):
        window = [daily_totals[days[j]] for j in range(max(0, i - 6), i + 1)]
        avg = sum(window) / len(window)
        results.append({'date': day, 'total': daily_totals[day], 'rolling_avg': avg})
    return results

@app.get("/api/reports/sales")
async def sales_report(start: str, end: str):
    transactions = await fetch_transactions(start, end)
    return generate_sales_report(transactions, start, end)

The loop iterating over transactions and computing rolling averages is CPU-bound Python. Level 2 JIT compiles the numerical operations: 58-193x speedup on the hot loop.

On our benchmark hardware, a report endpoint processing 100K transactions saw its compute time drop in proportion to the JIT speedup; the I/O portion (fetching the transactions) is unchanged, so end-to-end response time improves by however much of it was computation.
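As a sanity check of the aggregation logic, here is the same loop structure run on a tiny hypothetical dataset (decorator omitted; plain Python, behavior identical):

```python
def generate_sales_report(transactions, start_date, end_date):
    # Same aggregation as above, shown without the decorator
    daily_totals = {}
    for txn in transactions:
        if start_date <= txn['date'] <= end_date:
            day = txn['date']
            daily_totals[day] = daily_totals.get(day, 0.0) + txn['amount'] * txn['margin']
    days = sorted(daily_totals)
    results = []
    for i, day in enumerate(days):
        window = [daily_totals[days[j]] for j in range(max(0, i - 6), i + 1)]
        results.append({'date': day, 'total': daily_totals[day],
                        'rolling_avg': sum(window) / len(window)})
    return results

sample = [
    {'date': '2026-01-01', 'amount': 100.0, 'margin': 0.2},
    {'date': '2026-01-01', 'amount': 50.0, 'margin': 0.2},
    {'date': '2026-01-02', 'amount': 200.0, 'margin': 0.1},
]
report = generate_sales_report(sample, '2026-01-01', '2026-01-02')
# Day 1 total: 100*0.2 + 50*0.2 = 30.0
# Day 2 total: 200*0.1 = 20.0; rolling_avg = (30 + 20) / 2 = 25.0
```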


Accelerating Recommendation Scoring

Recommendation engines often score items against user profiles using custom algorithms.

import epochly
import numpy as np

@epochly.optimize
def score_recommendations(user_embedding, item_embeddings, item_features):
    """Score items for a user with custom ranking logic."""
    scores = np.zeros(len(item_embeddings))
    for i in range(len(item_embeddings)):
        # Dot-product similarity (cosine, assuming normalized embeddings)
        dot = 0.0
        for j in range(len(user_embedding)):
            dot += user_embedding[j] * item_embeddings[i][j]
        # Feature bonuses
        recency_bonus = item_features[i]['recency_score'] * 0.1
        popularity_bonus = item_features[i]['popularity'] * 0.05
        scores[i] = dot + recency_bonus + popularity_bonus
    return scores

Nested numerical loops are JIT targets. For large item catalogs with 10M+ items, GPU acceleration (Level 4) handles the matrix operations: up to 70x.
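For reference, the nested loops above collapse into a single matrix-vector product once the feature bonuses are pulled out into arrays. This is a sketch (function name and array-based signature are ours, not Epochly's API), and this array form is the kind of operation that maps onto GPU execution:

```python
import numpy as np

def score_recommendations_vectorized(user_embedding, item_embeddings,
                                     recency, popularity):
    """Same scores as the loop version, expressed as array operations."""
    # One matrix-vector product computes all dot products at once
    dots = item_embeddings @ user_embedding
    # Bonus terms broadcast elementwise across all items
    return dots + 0.1 * recency + 0.05 * popularity
```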


Accelerating Image Processing

User-uploaded image processing (thumbnails, filters, analysis) is CPU-bound.

import epochly
import numpy as np

@epochly.optimize
def process_image(image_array):
    """Apply a custom 3x3 smoothing filter."""
    height, width, channels = image_array.shape
    output = image_array.copy()  # border pixels pass through unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            for c in range(channels):
                # 3x3 convolution kernel (weights sum to 12)
                val = (
                    image_array[y-1, x-1, c] + image_array[y-1, x, c] + image_array[y-1, x+1, c] +
                    image_array[y, x-1, c] + image_array[y, x, c] * 4 + image_array[y, x+1, c] +
                    image_array[y+1, x-1, c] + image_array[y+1, x, c] + image_array[y+1, x+1, c]
                ) / 12
                output[y, x, c] = val
    return output

Triple-nested loops over pixel data are the worst case for Python interpretation and the best case for JIT. Level 2 provides 58-193x on this pattern.

For batch image processing, Level 3 parallel execution processes multiple images simultaneously: 8-12x on 16 cores.
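For comparison, the same 3x3 kernel can also be written with NumPy array slicing, which removes the Python-level loops entirely. A sketch (function name is ours; borders left unchanged): each shifted view has shape (H-2, W-2, C) and the nine views sum elementwise.

```python
import numpy as np

def process_image_sliced(image_array):
    """Sliced-array equivalent of the per-pixel 3x3 kernel above."""
    img = image_array.astype(np.float64)
    out = img.copy()  # border pixels pass through unchanged
    out[1:-1, 1:-1] = (
        img[:-2, :-2]  + img[:-2, 1:-1]      + img[:-2, 2:] +
        img[1:-1, :-2] + 4 * img[1:-1, 1:-1] + img[1:-1, 2:] +
        img[2:, :-2]   + img[2:, 1:-1]       + img[2:, 2:]
    ) / 12
    return out
```

When a loop has a clean vectorized form like this, NumPy alone may be fast enough; the JIT path matters most when the per-pixel logic is too irregular to vectorize.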


Integration Pattern

Epochly integrates at the function level, not the framework level. Decorate the CPU-bound functions, leave the I/O-bound handlers alone.

# FastAPI example
from fastapi import FastAPI
import epochly

app = FastAPI()

# I/O-bound -- don't optimize
@app.get("/api/users/{user_id}")
async def get_user(user_id: int):
    return await db.fetch_user(user_id)

# CPU-bound -- optimize
@epochly.optimize
def compute_analytics(data):
    # Heavy computation here
    pass

@app.get("/api/analytics")
async def analytics():
    data = await db.fetch_data()
    return compute_analytics(data)  # This call is accelerated
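One caveat with the pattern above: calling a CPU-bound function directly inside an async handler blocks the event loop for the duration of the call, stalling every other request. A sketch of keeping the loop responsive with the standard library (the analytics function here is a stand-in, not Epochly code):

```python
import asyncio

def compute_analytics(data):
    # Stand-in for the heavy computation
    return sum(x * x for x in data)

async def analytics(data):
    # Run the CPU-bound call in a worker thread so the event
    # loop keeps serving other requests while it executes
    return await asyncio.to_thread(compute_analytics, data)

print(asyncio.run(analytics([1, 2, 3])))  # 14
```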

What Epochly Does NOT Help in Web Backends

  • Database queries: I/O-bound. Use query optimization, indexing, caching. ~1.0x.
  • Network calls: External API calls are I/O-bound. ~1.0x.
  • Template rendering: Jinja2/Django templates are fast enough. ~1.0x.
  • Request routing/middleware: Framework overhead is minimal. ~1.0x.
  • Low-traffic endpoints: If an endpoint handles 10 req/s, the optimization might not justify the memory overhead.

Getting Started

pip install epochly

import epochly

@epochly.optimize
def your_heavy_endpoint_logic(data):
    # Your existing computation code
    pass

Profile your endpoints first. Identify which ones have high p99 latency from CPU computation. Apply @epochly.optimize to those specific functions.
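A minimal way to run that check with the standard library's cProfile (helper and function names here are illustrative): profile one representative call and look at which functions dominate cumulative CPU time.

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Profile a single call and return the top CPU consumers."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return buf.getvalue()

def heavy(n):
    # Stand-in for CPU-bound endpoint logic
    return sum(i * i for i in range(n))

print(profile_call(heavy, 100_000))
```

Functions that top this listing for your slowest endpoints are the candidates for the decorator.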


Benchmark conditions: Python 3.12.3, Linux WSL2, 16 cores, NVIDIA Quadro M6000 24GB (CUDA 12.1). January 29, 2026 comprehensive benchmark report.

Tags: python, web, fastapi, django, flask, backend, api