Rate limiting in Bun.js: in-memory, Redis, sliding window, and production API edge cases
A practical deep dive into rate limiting middleware in Bun.js: fixed window, sliding window, token bucket, Redis, distributed limits, 429 Retry-After, abuse protection, Hono/Elysia integrations, best practices, and bad practices.

Bun.js Middleware Production Guide 2026
A series about production middleware in Bun.js: overview, security, performance, observability, rate limiting, body parsing, WebSocket/SSE, and request pipeline testing.
A naive limit like “100 requests per IP per minute” looks fine until the first office behind NAT, the first mobile carrier, the first webhook retry storm, or the first enterprise tenant with hundreds of users behind one egress IP.
Rate limiting has to answer more than “how many requests?”. It must answer: “who exactly is being limited?”, “which route?”, “which tenant?”, “which credential?”, “what happens when Redis is down?”, “is there a Retry-After?”, and “are we breaking a legitimate burst?”.
Bun makes the HTTP layer fast, but rate limiting always depends on key design, storage atomicity, and failure policy. That is what we break down here.
429 without Retry-After makes life harder for good clients.
This is the third chapter in the Bun middleware series. After the overview and auth, the next logical step is abuse protection: rate limiting is often what stands between your API and an expensive wave of unnecessary requests.
A rate limiter should be a small state machine, not a random check inside the handler. Once you split it into steps, most mistakes become visible before the code exists.
Define the identity key
This can be user id, API key id, tenant id, route group, IP, or a combination. For an authenticated API key, apiKeyId + routeGroup is usually better than plain IP.
Choose the policy
Different routes need different limits: login, search, export, webhook, public read, admin mutation. One global limit is almost always either too weak or too aggressive.
Atomically update the counter
In one process, that can be a Map. In a distributed API, that is Redis or another shared store. With Redis, increment + expiry or sliding-window update must be atomic.
Return a useful 429
The client should receive a stable JSON error, Retry-After, and preferably rate-limit headers. Otherwise, good clients do not know when to retry.
Log without secrets
Log limiter key hash/prefix, route group, tenant, decision, remaining, and reset time. Do not log the full API key or Authorization header.
Summary
If you do not have a clear answer for each of these steps, the limiter is not production-ready yet.
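The first two steps above (identity key and policy) can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the route classes, paths, limits, and the names `POLICIES`, `routeGroupFor`, and `limiterInput` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical route classes and limits; tune them per API.
type RouteGroup = "login" | "search" | "export" | "webhook" | "public_read";
type Policy = { limit: number; windowMs: number };

const POLICIES: Record<RouteGroup, Policy> = {
  login: { limit: 5, windowMs: 60_000 },
  search: { limit: 60, windowMs: 60_000 },
  export: { limit: 3, windowMs: 3_600_000 },
  webhook: { limit: 600, windowMs: 60_000 },
  public_read: { limit: 300, windowMs: 60_000 },
};

// Maps a request to a route class. The paths are illustrative assumptions.
function routeGroupFor(method: string, pathname: string): RouteGroup {
  if (method === "POST" && pathname === "/auth/login") return "login";
  if (pathname.startsWith("/webhooks/")) return "webhook";
  if (pathname.startsWith("/exports")) return "export";
  if (pathname.startsWith("/search")) return "search";
  return "public_read";
}

// Combines identity and route class into one limiter key plus its policy.
function limiterInput(method: string, pathname: string, identity: string) {
  const group = routeGroupFor(method, pathname);
  return { key: `${group}:${identity}`, policy: POLICIES[group] };
}
```

The counter update, the 429 response, and logging then operate on the returned key and policy, so each step can be reviewed separately.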
The algorithm defines not only accuracy, but also UX. Two clients may have the same number of requests per minute, but one creates a burst at the window boundary while the other is evenly distributed.
| Algorithm | How it works | When it fits | Weak point |
|---|---|---|---|
| Fixed window | Counter for a fixed window, for example 100 requests per minute | Simple internal endpoints, low-risk APIs, cheap baseline | Boundary burst: a client can send close to twice the limit in a short span that straddles the boundary between two windows |
| Sliding window log | Stores request timestamps and counts only those inside the moving window | Critical APIs, login, checkout, expensive operations | More storage and cleanup work per request |
| Sliding window counter | Approximates a moving window through the current and previous bucket window | A balance of accuracy and cost for high-traffic APIs | Less accurate than the log variant and needs careful reset math |
| Token bucket | The client has a bucket of tokens that refills over time | APIs where a short burst is fine but average rate must be controlled | Capacity and refill rate must be chosen correctly |
Fixed window is simple, sliding window is more accurate, and token bucket handles legitimate bursts better.
Summary
Start with fixed window for a simple baseline, but public expensive routes usually benefit from sliding window or token bucket.
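To make the "careful reset math" of the sliding window counter concrete, here is a minimal in-memory sketch. It assumes windows aligned to multiples of `windowMs` and weights the previous bucket by the share of it that still overlaps the moving window. The names `slidingWindowAllow` and `slidingState` are hypothetical.

```typescript
type Buckets = { windowStart: number; current: number; previous: number };
const slidingState = new Map<string, Buckets>();

function slidingWindowAllow(
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now(),
): boolean {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  let b = slidingState.get(key);
  if (!b) {
    b = { windowStart, current: 0, previous: 0 };
    slidingState.set(key, b);
  } else if (b.windowStart !== windowStart) {
    // The old bucket counts as "previous" only if it is the directly
    // preceding window; anything older no longer overlaps the moving window.
    b.previous = windowStart - b.windowStart === windowMs ? b.current : 0;
    b.current = 0;
    b.windowStart = windowStart;
  }
  // Fraction of the current window that has elapsed, between 0 and 1.
  const elapsed = (now - windowStart) / windowMs;
  // Weighted estimate of requests inside the last windowMs milliseconds.
  const estimated = b.previous * (1 - elapsed) + b.current;
  if (estimated >= limit) return false;
  b.current += 1;
  return true;
}
```

With a limit of 10 per second, a client that used its full allowance late in one window gets only part of a new allowance early in the next one, instead of a fresh 10 at the boundary.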
For a single Bun process, you can write a simple in-memory fixed-window limiter. It is useful for local dev, internal tools, single-instance deployments, or as a fallback, but it does not synchronize across instances.
A minimal example:
type LimitEntry = { count: number; resetAt: number };

const limits = new Map<string, LimitEntry>();
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 120;

// Prefer an authenticated identity; fall back to the client IP.
// Both headers are assumed to be set by trusted upstream middleware or a proxy,
// never accepted as-is from an untrusted client.
function rateLimitKey(req: Request) {
  const apiKeyId = req.headers.get("x-api-key-id");
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "unknown";
  return apiKeyId ? `api:${apiKeyId}` : `ip:${ip}`;
}

function checkLimit(key: string, now = Date.now()) {
  const current = limits.get(key);
  // No entry yet, or the previous window has ended: start a new window.
  if (!current || current.resetAt <= now) {
    limits.set(key, { count: 1, resetAt: now + WINDOW_MS });
    return { allowed: true, remaining: MAX_REQUESTS - 1, resetAt: now + WINDOW_MS };
  }
  // Window is still open and the allowance is used up.
  if (current.count >= MAX_REQUESTS) {
    return { allowed: false, remaining: 0, resetAt: current.resetAt };
  }
  current.count += 1;
  return { allowed: true, remaining: MAX_REQUESTS - current.count, resetAt: current.resetAt };
}

Bun.serve({
  async fetch(req) {
    const decision = checkLimit(rateLimitKey(req));
    if (!decision.allowed) {
      // Clamp to at least 1 second so a window that ends right now never yields Retry-After: 0.
      const retryAfter = Math.max(1, Math.ceil((decision.resetAt - Date.now()) / 1000));
      return Response.json(
        { error: "rate_limited", retryAfter },
        { status: 429, headers: { "Retry-After": String(retryAfter) } },
      );
    }
    return Response.json({ ok: true, remaining: decision.remaining });
  },
});

This is a fixed-window baseline. It does not actively clean old keys, does not work across multiple processes, does not have tenant-specific policies, and does not protect against boundary bursts. But it shows the right shape: key, counter, decision, 429, Retry-After.
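The missing cleanup of old keys can be handled with a periodic sweep. The sketch below is one possible approach, not part of the example above: `sweepExpired` and `startSweeper` are hypothetical helper names, and the 60-second interval is an arbitrary assumption.

```typescript
type LimitEntry = { count: number; resetAt: number };

// Deletes entries whose window has already ended and returns how many were removed.
function sweepExpired(store: Map<string, LimitEntry>, now: number = Date.now()): number {
  let removed = 0;
  // Deleting the entry currently being visited is safe while iterating a Map.
  store.forEach((entry, key) => {
    if (entry.resetAt <= now) {
      store.delete(key);
      removed += 1;
    }
  });
  return removed;
}

// Runs the sweep on an interval and returns a function that stops it.
function startSweeper(store: Map<string, LimitEntry>, intervalMs = 60_000) {
  const timer = setInterval(() => sweepExpired(store), intervalMs);
  return () => clearInterval(timer);
}
```

Without a sweep, every key ever seen stays in the Map, so a scan over many distinct IPs or API key ids can grow memory without bound.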
Where it fits
An in-memory limiter is fine for one process or as a cheap local guard. A production API with autoscaling needs a shared store.
Once a Bun API runs on multiple instances, in-memory counters stop being a global limit. One client can distribute requests across instances and get a multiplier on its allowance. Redis solves this as a shared counter store.
The critical detail: counter update must be atomic. For fixed window, this is often INCR + EXPIRE, but expiry must be guaranteed to be set correctly on the first increment. For sliding window log, sorted sets and cleanup of old timestamps are common. For token bucket, a Lua script or another atomic mechanism is often needed so refill and consume happen in one operation. Redis rate limiting patterns usually rely on atomic counters or Lua. [5][6]
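One way to make the fixed-window INCR + EXPIRE pair atomic is a short Lua script, since Redis runs a script as a single operation. The sketch below is an illustration under assumptions: `ScriptRunner` and `evalScript` are a hypothetical minimal interface, not the API of any particular Redis client, so adapt the call to the eval method of the client you use.

```typescript
// Increment and expiry happen inside one script, so no other command
// can run between them.
const FIXED_WINDOW_LUA = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
local ttl = redis.call('PTTL', KEYS[1])
if ttl < 0 then
  -- Safety net: a key without expiry would otherwise block the client forever.
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
  ttl = tonumber(ARGV[1])
end
return {count, ttl}
`;

// Hypothetical minimal client shape (an assumption for this sketch).
interface ScriptRunner {
  evalScript(script: string, keys: string[], args: string[]): Promise<unknown>;
}

async function redisFixedWindow(
  redis: ScriptRunner,
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now(),
) {
  const reply = await redis.evalScript(FIXED_WINDOW_LUA, [key], [String(windowMs)]);
  const [count, ttlMs] = reply as [number, number];
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: now + ttlMs, // local estimate derived from the remaining TTL
  };
}
```

In this sketch, rejected requests still increment the counter, which keeps the script simple. The window does not extend, because the expiry is set only on the first increment.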
Redis also adds a failure mode: what happens when it is unavailable. For public expensive routes, fail closed or degraded limiting is often safer. For a critical internal control plane, the policy may differ. But any fail-open mode needs alerts, because otherwise the rate limiter disappears exactly when it is needed most.
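A failure policy per route class can be sketched as a wrapper around the shared-store check. Everything here is an assumption for illustration: the route class names, the `FAIL_MODE` table, the 50 ms timeout, and the `checkWithFailPolicy` signature are hypothetical, and `onStoreFailure` stands in for whatever alerting hook exists.

```typescript
type FailMode = "open" | "closed" | "local";
type PolicyDecision = { allowed: boolean; degraded: boolean };

// Hypothetical failure modes per route class.
const FAIL_MODE: Record<string, FailMode> = {
  public_expensive: "closed",
  public_read: "local",
  internal: "open",
};

async function checkWithFailPolicy(
  routeClass: string,
  sharedCheck: () => Promise<boolean>, // e.g. the Redis-backed limiter
  localCheck: () => boolean, // stricter in-memory fallback
  onStoreFailure: (routeClass: string, error: unknown) => void,
  timeoutMs = 50,
): Promise<PolicyDecision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("limiter store timeout")), timeoutMs);
  });
  try {
    // A slow store is treated like a failed store (latency budget).
    const allowed = await Promise.race([sharedCheck(), timeout]);
    return { allowed, degraded: false };
  } catch (error) {
    onStoreFailure(routeClass, error); // every degraded decision is reported
    const mode = FAIL_MODE[routeClass] ?? "closed"; // unknown classes fail closed
    if (mode === "open") return { allowed: true, degraded: true };
    if (mode === "local") return { allowed: localCheck(), degraded: true };
    return { allowed: false, degraded: true };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The `degraded` flag lets the caller log or label responses that were decided without the shared store, so a fail-open period stays visible.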
For multiple Bun instances, the limit must live in a shared store. Otherwise, each instance gives the client a separate allowance.
Practical baseline
A distributed limiter needs a shared store, atomic update, key design, TTL cleanup, latency budget, and fail policy.
Ready-made middleware or a plugin is useful when the task is typical. But it does not decide key strategy, tenant policy, or distributed storage for you.
The Hono ecosystem has rate limiter middleware with configurable window, limit, key generator, and store options. This is good for a quick baseline in a Hono app. [1]
The Elysia ecosystem has a rate-limit plugin for Bun-first Elysia apps. It fits naturally into the Elysia lifecycle/plugins model. [2]
OWASP API Security Top 10 identifies unrestricted resource consumption as a separate risk. Rate limiting should protect CPU, memory, storage, network, and downstream resources. [3]
Summary
A ready-made limiter reduces boilerplate. Production quality depends on keys, storage, policies, observability, and edge cases.
The rate limit key is the main decision. If the key is wrong, the algorithm will not save you. IP-only limits often punish normal users behind NAT and fail to catch authenticated abuse.
For public anonymous routes, IP can be a starting key. For authenticated APIs, it is better to limit by subject id, API key id, tenant id, or a combination like tenantId + routeGroup. For login flows, you may need IP limit, account/email limit, and device fingerprint policy at the same time.
For multi-tenant SaaS, a user-only limit can be too soft because one tenant with many users can overload a resource. A tenant-only limit can be too strict because one noisy user blocks the whole company. Often you need a hierarchy: per-user, per-tenant, per-route, and global emergency limit.
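The hierarchy described above can be sketched as a list of scopes that are all checked for one request. The field names, key formats, and limits below are hypothetical, and the `counts` map stands in for whatever store holds the counters.

```typescript
type RequestContext = {
  userId?: string;
  tenantId?: string;
  ip: string;
  routeGroup: string;
};
type Scope = { key: string; limit: number };

// Builds scopes from most specific to broadest. Limits are illustrative only.
function limitScopes(ctx: RequestContext): Scope[] {
  const scopes: Scope[] = [];
  if (ctx.userId) {
    scopes.push({ key: `user:${ctx.userId}:${ctx.routeGroup}`, limit: 60 });
  }
  if (ctx.tenantId) {
    scopes.push({ key: `tenant:${ctx.tenantId}:${ctx.routeGroup}`, limit: 1_000 });
  }
  if (!ctx.userId && !ctx.tenantId) {
    // Anonymous traffic: IP is the only available identity.
    scopes.push({ key: `ip:${ctx.ip}:${ctx.routeGroup}`, limit: 30 });
  }
  // Global emergency limit for the whole route group.
  scopes.push({ key: `global:${ctx.routeGroup}`, limit: 10_000 });
  return scopes;
}

// Returns the first scope whose count has reached its limit, or null if all pass.
function firstExceeded(scopes: Scope[], counts: Map<string, number>): Scope | null {
  for (const scope of scopes) {
    if ((counts.get(scope.key) ?? 0) >= scope.limit) return scope;
  }
  return null;
}
```

Reporting which scope was exceeded helps both the 429 body and the logs: "tenant budget exhausted" calls for a different response than "this user is too fast". With a shared store, the check and increment across scopes should still happen atomically.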
HTTP 429 Too Many Requests means the user has sent too many requests in a given time period. MDN notes that the response may include Retry-After, which tells the client how long to wait before retrying. [7]
In a production API, 429 without Retry-After forces good clients to guess. They either retry too quickly or use exponential backoff where they could simply wait until reset. This hurts UX and increases unnecessary load.
Beyond Retry-After, many APIs add rate-limit headers such as RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, or custom X-RateLimit-*. The important part is a stable contract and documentation for clients.
429 should be useful: the client should know when to retry instead of guessing backoff.
Summary
A rate limiter should be understandable for the client. 429 without a retry contract creates repeated traffic and poor integration.
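A retry contract like the one described above can be sketched as a small builder that returns status, headers, and body as plain data. The names `LimitDecision` and `tooManyRequests` are hypothetical. This sketch expresses `RateLimit-Reset` as seconds until reset; some APIs use a Unix timestamp instead, so document whichever form you choose.

```typescript
type LimitDecision = { limit: number; remaining: number; resetAt: number };

function tooManyRequests(decision: LimitDecision, now: number = Date.now()) {
  // Retry-After in whole seconds, rounded up and never below 1.
  const retryAfter = Math.max(1, Math.ceil((decision.resetAt - now) / 1000));
  return {
    status: 429,
    headers: {
      "Retry-After": String(retryAfter),
      "RateLimit-Limit": String(decision.limit),
      "RateLimit-Remaining": String(Math.max(0, decision.remaining)),
      "RateLimit-Reset": String(retryAfter), // seconds until reset in this sketch
    },
    // Stable machine-readable error code plus the same retry hint.
    body: { error: "rate_limited", retryAfter },
  };
}
```

In a Bun handler the result could be returned as `Response.json(r.body, { status: r.status, headers: r.headers })`.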
Rate limiting has many unpleasant details that are invisible on the happy path. These are the ones that most often appear after launch.
NAT and corporate networks
An IP-only limit can block dozens of normal users behind one egress IP. For authenticated APIs, key by subject/API key/tenant instead.
Webhook retry storms
A partner may honestly retry failed webhooks and hit the limiter. Webhooks need separate policies, idempotency, and a retry-aware contract.
Clock skew
If the limiter is spread across systems, reset time and sliding windows must be calculated consistently. Redis server time or a centralized store is often more reliable than local clocks.
Burst after deploy
After downtime or deploy, clients may synchronously retry requests. Token bucket or queued backoff may be better than a hard fixed window.
Admin and internal routes
Do not give internal tools unlimited access by default. They often run the heaviest exports and batch operations.
Redis failure
Fail-open without alerts makes protection invisible. Fail-closed without degradation can take the product down. Policy should differ by route class.
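For the burst-after-deploy case in the list above, a token bucket absorbs a short burst up to its capacity and then holds clients to the refill rate. Below is a minimal in-memory sketch. `takeToken` and its parameters are hypothetical names, and a distributed version would need the refill and consume steps to run atomically in the shared store.

```typescript
type Bucket = { tokens: number; updatedAt: number };
const buckets = new Map<string, Bucket>();

function takeToken(
  key: string,
  capacity: number,
  refillPerSec: number,
  now: number = Date.now(),
) {
  // A new client starts with a full bucket.
  const bucket = buckets.get(key) ?? { tokens: capacity, updatedAt: now };
  // Refill for the elapsed time, capped at capacity. Negative elapsed time
  // (clock moving backwards) is treated as zero.
  const elapsedSec = Math.max(0, now - bucket.updatedAt) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSec * refillPerSec);
  bucket.updatedAt = now;
  buckets.set(key, bucket);

  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return { allowed: true, retryAfterMs: 0 };
  }
  // Time until one full token is available again.
  const retryAfterMs = Math.ceil(((1 - bucket.tokens) / refillPerSec) * 1000);
  return { allowed: false, retryAfterMs };
}
```

Capacity sets how large a legitimate burst may be, and the refill rate sets the sustained average, which are the two numbers the algorithm table says must be chosen correctly.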
These mistakes are not unique to Bun, but in Bun APIs they are often hidden behind a fast runtime and a simple middleware wrapper.
One global limit for all routes.
IP-only limit for authenticated APIs.
In-memory limiter in multi-instance production.
Redis INCR without correct TTL or atomicity.
No Retry-After in the 429 response.
Limits do not account for tenant, API key, or route cost.
Full API key or Authorization header enters logs as the limiter key.
Fail-open during Redis outage without alerting.
Rate limiter runs after body parsing for expensive payload routes.
No tests for boundary burst, reset, Redis failure, and concurrent requests.
Review rule
The rate limiter should run early, have the right key, an atomic counter, a clear 429, and an observable decision.
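The boundary-burst test from the list above is easy to write once the limiter takes an injectable clock. The sketch below uses a compact fixed-window counter that exists only for this demonstration (`makeFixedWindow` and `allowedAt` are hypothetical names) and measures how many requests pass within roughly one second around a window reset.

```typescript
// Compact fixed-window counter whose window starts at the first request.
function makeFixedWindow(limit: number, windowMs: number) {
  let count = 0;
  let resetAt = 0;
  return (now: number): boolean => {
    if (now >= resetAt) {
      count = 0;
      resetAt = now + windowMs;
    }
    if (count >= limit) return false;
    count += 1;
    return true;
  };
}

// Counts how many of `attempts` calls made at time `now` are allowed.
function allowedAt(check: (now: number) => boolean, now: number, attempts: number): number {
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    if (check(now)) allowed += 1;
  }
  return allowed;
}

const check = makeFixedWindow(100, 60_000);
check(0); // one request opens the window [0, 60_000)
const nearEnd = allowedAt(check, 59_000, 150); // uses the rest of the first window
const afterReset = allowedAt(check, 60_000, 150); // a full new allowance one second later
const burst = nearEnd + afterReset; // 199 accepted within about one second
```

A test that asserts on `burst` documents the known weakness of the fixed window and fails loudly if someone later swaps the algorithm without updating the limits.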
Before launching rate limiting in staging or production, go through this list. It helps find problems before customers do.
Route classes are defined
Public, auth, login, webhook, export, admin, and internal routes have different policies.
Key strategy is not IP-only
For authenticated routes, use subject/API key/tenant/route group, and keep IP as an additional signal.
Distributed store exists for multi-instance
If the Bun API has multiple instances, counters live in Redis or another shared store.
Operations are atomic
Increment, expiry, sliding window cleanup, or token consume happen without race conditions.
429 has a retry contract
The response contains a stable JSON error and Retry-After; rate-limit headers are documented.
Limiter runs before expensive operations
Rate check happens before body parsing, DB calls, remote calls, and heavy transforms when the route allows it.
Redis failure policy is defined
For each route class, fail-open or fail-closed is known, and alerting exists.
Observability exists
Decision, route group, limiter key hash, remaining, reset time, storage latency, and Redis failures are logged.
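The observability item above can be sketched as a structured log event that carries a keyed fingerprint of the limiter key instead of the key itself. This is an illustration under assumptions: the field names, the 12-character truncation, and the `secret` parameter (for example loaded from configuration) are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Short, stable fingerprint of the limiter key. HMAC with a server-side secret
// makes it harder to brute-force low-entropy keys such as IPs from the logs.
function keyFingerprint(rawKey: string, secret: string): string {
  return createHmac("sha256", secret).update(rawKey).digest("hex").slice(0, 12);
}

type LimiterDecision = { allowed: boolean; remaining: number; resetAt: number };

// Builds the log payload; the raw key never becomes part of it.
function limiterLogEvent(
  rawKey: string,
  routeGroup: string,
  decision: LimiterDecision,
  storeLatencyMs: number,
  secret: string,
) {
  return {
    event: "rate_limit_decision",
    keyHash: keyFingerprint(rawKey, secret),
    routeGroup,
    decision: decision.allowed ? "allow" : "reject",
    remaining: decision.remaining,
    resetAt: new Date(decision.resetAt).toISOString(),
    storeLatencyMs,
  };
}
```

The same key always maps to the same fingerprint, so one client can still be traced across log lines without exposing its credential.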
Bun gives you a fast HTTP runtime, but rate limiting is not a runtime feature you can add with one line and forget. It is a security and reliability policy that must know who it limits, for which route, with which storage, which algorithm, and which retry contract.
For one process, an in-memory fixed window can be a reasonable baseline. For production with multiple instances, you need Redis or another shared store. For user-facing APIs, fixed window is often too rough; sliding window or token bucket gives a better UX.
Most importantly: do not punish real users with a bad key. IP-only limits, one global limit, and 429 without Retry-After usually create more problems than they solve.
An in-memory limiter is enough only for one process, local dev, internal tools, or a simple baseline. If the API has multiple instances, each instance keeps its own counter, so the effective limit is multiplied by the number of instances and is not global protection.
Fixed window is the simplest, but it has boundary bursts. Sliding window is more accurate for critical routes, but more expensive. Token bucket allows short legitimate bursts while controlling average rate.
An IP-only limit can block normal users behind NAT, a corporate proxy, or a mobile carrier, while still working poorly for authenticated abuse. For APIs, it is better to limit by subject, API key, tenant, and route group.
A good 429 response is a stable JSON error with HTTP `429`, the `Retry-After` header, and preferably rate-limit headers such as remaining/reset. This helps good clients retry correctly. [7]
Redis is not strictly necessary for one process. For production with multiple instances or serverless/concurrent deployment, Redis or another shared store is practically required so counters are shared.
For a Redis outage, define the fail policy in advance. For expensive public routes, fail closed or a degraded strict local limit is often safer. For some internal/control-plane routes, fail-open may be acceptable, but only with alerting and audit.
These sources confirm ready-made middleware/plugin options, the security rationale for resource limiting, Redis rate limiting patterns, and HTTP semantics for 429.
PAS7 Studio can help design rate limiting for Bun, Hono, or Elysia: route classes, Redis store, sliding window or token bucket, API-key/tenant budgets, abuse monitoring, and a correct 429 contract.
This is especially useful for SaaS, public APIs, webhook endpoints, AI/automation products, and migrations from Express/Fastify where old limits do not account for tenants, API keys, or horizontal scaling.