Most Python web backends are I/O-bound. Request comes in, query the database, format the response, send it back. The CPU is idle most of the time.
But some endpoints are different. Report generation. Image processing. Recommendation scoring. Data aggregation across millions of rows. These are CPU-bound, and they create tail latency that degrades user experience.
Epochly accelerates the CPU-bound endpoints without touching the I/O-bound ones.
Where Web Backend Time Goes
| Endpoint Type | Bottleneck | Epochly Benefit |
|---|---|---|
| CRUD (read/write) | Database I/O | ~1.0x (I/O-bound) |
| Report generation | CPU computation | Level 2 JIT (58-193x on loops) |
| Search/ranking | Scoring algorithms | Level 2 JIT |
| Image processing | Pixel manipulation | Level 4 GPU (up to 70x) |
| Data aggregation | Custom Python loops | Level 2 JIT + Level 3 Parallel |
| Auth/middleware | Network + DB | ~1.0x (I/O-bound) |
Most endpoints won't benefit. The ones that will are the CPU-heavy outliers -- and those are usually the endpoints that create p99 latency spikes.
Accelerating Report Generation
Report endpoints often aggregate large datasets with custom business logic.
```python
import epochly
from fastapi import FastAPI

app = FastAPI()

@epochly.optimize
def generate_sales_report(transactions, start_date, end_date):
    """Compute sales metrics with custom aggregation."""
    daily_totals = {}
    for txn in transactions:
        if start_date <= txn['date'] <= end_date:
            day = txn['date']
            if day not in daily_totals:
                daily_totals[day] = 0.0
            daily_totals[day] += txn['amount'] * txn['margin']

    # Compute rolling averages over a 7-day window
    days = sorted(daily_totals.keys())
    results = []
    for i, day in enumerate(days):
        window = [daily_totals[days[j]] for j in range(max(0, i - 6), i + 1)]
        avg = sum(window) / len(window)
        results.append({'date': day, 'total': daily_totals[day], 'rolling_avg': avg})
    return results

@app.get("/api/reports/sales")
async def sales_report(start: str, end: str):
    transactions = await fetch_transactions(start, end)
    return generate_sales_report(transactions, start, end)
```
The loop iterating over transactions and computing rolling averages is CPU-bound Python. Level 2 JIT compiles the numerical operations: 58-193x speedup on the hot loop.
On our benchmark hardware, a report endpoint processing 100K transactions saw its compute time drop in line with the JIT speedup of the hot loop. Note that the I/O portion (fetching the transactions) is unchanged, so the total response-time improvement is bounded by the fraction of time the endpoint spends in the loop.
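To reproduce this kind of measurement on your own data, a plain `timeit` harness is enough. The sketch below is illustrative, not from the benchmark report: `make_transactions` generates synthetic rows shaped like the report input, and `total_margin` stands in for the hot aggregation loop.

```python
import random
import timeit
from datetime import date, timedelta

def make_transactions(n, days=90):
    """Generate synthetic transactions shaped like the report input."""
    base = date(2025, 1, 1)
    return [
        {
            "date": (base + timedelta(days=random.randrange(days))).isoformat(),
            "amount": random.uniform(1.0, 100.0),
            "margin": random.uniform(0.1, 0.5),
        }
        for _ in range(n)
    ]

def total_margin(transactions):
    """Stand-in for the report's hot loop: a pure-Python aggregation."""
    total = 0.0
    for txn in transactions:
        total += txn["amount"] * txn["margin"]
    return total

txns = make_transactions(10_000)
elapsed = timeit.timeit(lambda: total_margin(txns), number=10)
print(f"10 runs over {len(txns)} transactions: {elapsed:.3f}s")
```

Time the function once plain and once decorated; the ratio on your hardware is the number that matters, not the headline range.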
Accelerating Recommendation Scoring
Recommendation engines often score items against user profiles using custom algorithms.
```python
import epochly
import numpy as np

@epochly.optimize
def score_recommendations(user_embedding, item_embeddings, item_features):
    """Score items for a user with custom ranking logic."""
    scores = np.zeros(len(item_embeddings))
    for i in range(len(item_embeddings)):
        # Dot-product similarity between user and item embeddings
        dot = 0.0
        for j in range(len(user_embedding)):
            dot += user_embedding[j] * item_embeddings[i][j]
        # Feature bonuses
        recency_bonus = item_features[i]['recency_score'] * 0.1
        popularity_bonus = item_features[i]['popularity'] * 0.05
        scores[i] = dot + recency_bonus + popularity_bonus
    return scores
```
Nested numerical loops are JIT targets. For large item catalogs with 10M+ items, GPU acceleration (Level 4) handles the matrix operations: up to 70x.
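The same scoring can also be written as one matrix-vector product, which is the shape a JIT or GPU backend ultimately executes. A minimal NumPy sketch, assuming the recency and popularity features have been pre-extracted into flat arrays (the function name is ours, not an Epochly API):

```python
import numpy as np

def score_recommendations_vectorized(user_embedding, item_embeddings, recency, popularity):
    """Vectorized equivalent of the nested scoring loops."""
    # One matrix-vector product replaces the inner dot-product loop
    dots = item_embeddings @ user_embedding
    # Feature bonuses applied elementwise across all items at once
    return dots + 0.1 * recency + 0.05 * popularity
```

Extracting `recency_score` and `popularity` out of the per-item dicts once, up front, is what makes this form possible: array-of-structs becomes struct-of-arrays.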
Accelerating Image Processing
User-uploaded image processing (thumbnails, filters, analysis) is CPU-bound.
```python
import epochly
import numpy as np

@epochly.optimize
def process_image(image_array):
    """Apply a custom 3x3 smoothing filter."""
    height, width, channels = image_array.shape
    output = np.zeros_like(image_array)  # zeros, so border pixels are defined
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            for c in range(channels):
                # 3x3 convolution kernel: center weighted 4, weights sum to 12
                val = (image_array[y-1, x-1, c] + image_array[y-1, x, c] + image_array[y-1, x+1, c] +
                       image_array[y,   x-1, c] + image_array[y,   x, c] * 4 + image_array[y,   x+1, c] +
                       image_array[y+1, x-1, c] + image_array[y+1, x, c] + image_array[y+1, x+1, c]) / 12
                output[y, x, c] = val
    return output
```
Triple-nested loops over pixel data are the worst case for Python interpretation and the best case for JIT. Level 2 provides 58-193x on this pattern.
For batch image processing, Level 3 parallel execution processes multiple images simultaneously: 8-12x on 16 cores.
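As a rough mental model for that batch parallelism (this is not Epochly's actual API), a process pool over independent images captures the idea. `_brighten` is a hypothetical per-image transform; plain lists keep the sketch picklable across workers:

```python
from concurrent.futures import ProcessPoolExecutor

def _brighten(pixels):
    """Hypothetical per-image transform: scale every pixel value by 1.2."""
    return [[min(255, int(p * 1.2)) for p in row] for row in pixels]

def process_batch(images, workers=4):
    """Process independent images in parallel worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_brighten, images))
```

Because each image is independent, the work splits cleanly; scaling is then limited mainly by serialization cost, which is why large pixel buffers favor shared memory over pickling.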
Integration Pattern
Epochly integrates at the function level, not the framework level. Decorate the CPU-bound functions, leave the I/O-bound handlers alone.
```python
# FastAPI example
from fastapi import FastAPI
import epochly

app = FastAPI()

# I/O-bound -- don't optimize
@app.get("/api/users/{user_id}")
async def get_user(user_id: int):
    return await db.fetch_user(user_id)

# CPU-bound -- optimize
@epochly.optimize
def compute_analytics(data):
    # Heavy computation here
    pass

@app.get("/api/analytics")
async def analytics():
    data = await db.fetch_data()
    return compute_analytics(data)  # This call is accelerated
```
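One caveat worth knowing: calling a synchronous CPU-bound function directly inside an `async def` handler blocks the event loop for the duration of the call. The standard mitigation is `asyncio.to_thread` (or declaring the handler with plain `def`, which FastAPI runs in a threadpool). A minimal sketch, with a trivial stand-in for the heavy function; whether worker threads actually overlap depends on whether the compiled kernel releases the GIL, which is an assumption here:

```python
import asyncio

def compute_analytics(data):
    """Synchronous CPU-bound work (would carry @epochly.optimize)."""
    return sum(x * x for x in data)

async def analytics_handler(data):
    # Offload to a worker thread so the event loop keeps serving
    # other requests while the computation runs.
    return await asyncio.to_thread(compute_analytics, data)

result = asyncio.run(analytics_handler(range(1000)))
```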
What Epochly Does NOT Help in Web Backends
- Database queries: I/O-bound. Use query optimization, indexing, caching. ~1.0x.
- Network calls: External API calls are I/O-bound. ~1.0x.
- Template rendering: Jinja2/Django templates are fast enough. ~1.0x.
- Request routing/middleware: Framework overhead is minimal. ~1.0x.
- Low-traffic endpoints: If an endpoint handles 10 req/s, the optimization might not justify the memory overhead.
Getting Started
```
pip install epochly
```

```python
import epochly

@epochly.optimize
def your_heavy_endpoint_logic(data):
    # Your existing computation code
    pass
```
Profile your endpoints first. Identify which ones have high p99 latency from CPU computation. Apply @epochly.optimize to those specific functions.
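A quick way to confirm an endpoint is CPU-bound rather than I/O-bound is to run its handler body under the standard library's cProfile. The helper below is a generic sketch (`profile_call` and `heavy` are our names, not part of any framework):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile and return (result, stats text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()

def heavy(n):
    """Toy CPU-bound workload for demonstration."""
    return sum(i * i for i in range(n))

result, report = profile_call(heavy, 100_000)
print(report)
```

If the top cumulative entries are your own Python functions, the endpoint is a candidate; if they are socket or database driver calls, it is I/O-bound and acceleration won't help.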
Benchmark conditions: Python 3.12.3, Linux WSL2, 16 cores, NVIDIA Quadro M6000 24GB (CUDA 12.1). January 29, 2026 comprehensive benchmark report.