PAS7 Studio

Rate limiting in Bun.js: in-memory, Redis, sliding window, and production API edge cases

A practical deep dive into rate limiting middleware in Bun.js: fixed window, sliding window, token bucket, Redis, distributed limits, 429 Retry-After, abuse protection, Hono/Elysia integrations, best practices, and bad practices.

14 May 2026 · 15 min read · Technology
Best for: backend engineers, full-stack developers, tech leads, and teams building public APIs, SaaS, or webhook endpoints on Bun.

A naive limit like “100 requests per IP per minute” looks fine until the first office behind NAT, the first mobile carrier, the first webhook retry storm, or the first enterprise tenant with hundreds of users behind one egress IP.

Rate limiting has to answer more than “how many requests?”. It must answer: “who exactly is being limited?”, “which route?”, “which tenant?”, “which credential?”, “what happens when Redis is down?”, “is there a Retry-After?”, and “are we breaking a legitimate burst?”.

Bun makes the HTTP layer fast, but rate limiting always depends on key design, storage atomicity, and failure policy. That is what we break down here.

An in-memory limiter is suitable only for a single process or a local baseline.
Redis is needed when the API has multiple instances or horizontal scaling.
Sliding window and token bucket solve different problems.
429 without Retry-After makes life harder for good clients.

This is the third chapter in the Bun middleware series. After the overview and auth, the next logical step is abuse protection: rate limiting is often what stands between your API and an expensive wave of unnecessary requests.

The mental model of rate limiting middleware: identify key, choose policy, check counter, allow or reject.
Fixed window, sliding window, and token bucket: where each algorithm works well and where it creates edge cases. [5][6]
A native Bun in-memory limiter example for a single process.
Redis-backed limiter for distributed APIs and why atomicity matters. [5][6]
Hono/Elysia options: when to use ready-made middleware/plugin. [1][2]
HTTP 429 Too Many Requests and Retry-After: what the client should receive. [7]
Bad practices: IP-only limits, global limits for all routes, fail-open without alerts, plaintext API key in limiter key.

A rate limiter should be a small state machine, not a random check inside the handler. Once you split it into steps, most mistakes become visible before the code exists.

01

Define the identity key

This can be user id, API key id, tenant id, route group, IP, or a combination. For an authenticated API key, apiKeyId + routeGroup is usually better than plain IP.

02

Choose the policy

Different routes need different limits: login, search, export, webhook, public read, admin mutation. One global limit is almost always either too weak or too aggressive.

03

Atomically update the counter

In one process, that can be a Map. In a distributed API, that is Redis or another shared store. With Redis, increment + expiry or sliding-window update must be atomic.

04

Return a useful 429

The client should receive a stable JSON error, Retry-After, and preferably rate-limit headers. Otherwise, good clients do not know when to retry.

05

Log without secrets

Log limiter key hash/prefix, route group, tenant, decision, remaining, and reset time. Do not log the full API key or Authorization header.

Summary

If you do not have a clear answer for each of these steps, the limiter is not production-ready yet.
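The first three steps can be sketched as one small function that returns a decision, which steps four and five then consume. This is a minimal sketch, not a definitive implementation: `RequestContext`, `Policy`, `CounterStore`, `MemoryStore`, and `decide` are hypothetical names, and the limits are placeholder numbers.

```typescript
// Hypothetical shapes for illustration; none of these names come from Bun or a library.
type RequestContext = { apiKeyId?: string; ip: string; routeGroup: string };
type Policy = { limit: number; windowMs: number };
type Decision = { allowed: boolean; remaining: number; resetAt: number };

// Step 3 sits behind an interface so a Map and a shared store are interchangeable.
interface CounterStore {
  hit(key: string, policy: Policy, now: number): Decision;
}

// Step 1: identity key. Prefer the credential over the IP when one exists.
function identityKey(ctx: RequestContext): string {
  const who = ctx.apiKeyId ? `key:${ctx.apiKeyId}` : `ip:${ctx.ip}`;
  return `${who}:${ctx.routeGroup}`;
}

// Step 2: policy per route group, with a default for everything else.
const policies: Record<string, Policy> = {
  login: { limit: 5, windowMs: 60_000 },
  search: { limit: 60, windowMs: 60_000 },
};
const defaultPolicy: Policy = { limit: 120, windowMs: 60_000 };

function policyFor(routeGroup: string): Policy {
  return policies[routeGroup] ?? defaultPolicy;
}

// Step 3, single-process version: a fixed-window counter in a Map.
class MemoryStore implements CounterStore {
  private entries = new Map<string, { count: number; resetAt: number }>();

  hit(key: string, policy: Policy, now: number): Decision {
    let entry = this.entries.get(key);
    if (!entry || entry.resetAt <= now) {
      entry = { count: 0, resetAt: now + policy.windowMs };
      this.entries.set(key, entry);
    }
    const allowed = entry.count < policy.limit;
    if (allowed) entry.count += 1;
    return { allowed, remaining: policy.limit - entry.count, resetAt: entry.resetAt };
  }
}

// Steps 1 to 3 combined. The 429 response and the log line are built from the Decision.
function decide(store: CounterStore, ctx: RequestContext, now: number): Decision {
  return store.hit(identityKey(ctx), policyFor(ctx.routeGroup), now);
}
```

Keeping the store behind an interface is a design choice for this sketch: it lets the same `decide` call run against a Map in tests and against a shared store in production.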

The algorithm defines not only accuracy, but also UX. Two clients may have the same number of requests per minute, but one creates a burst at the window boundary while the other is evenly distributed.

| Algorithm | How it works | When it fits | Weak point |
| --- | --- | --- | --- |
| Fixed window | Counter for a fixed window, for example 100 requests per minute | Simple internal endpoints, low-risk APIs, cheap baseline | Boundary burst: a client can send close to twice the limit in a short span that straddles a window boundary |
| Sliding window log | Stores request timestamps and counts only those inside the moving window | Critical APIs, login, checkout, expensive operations | More storage and cleanup work per request |
| Sliding window counter | Approximates a moving window through the current and previous bucket window | A balance of accuracy and cost for high-traffic APIs | Less accurate than the log variant and needs careful reset math |
| Token bucket | The client has a bucket of tokens that refills over time | APIs where a short burst is fine but average rate must be controlled | Capacity and refill rate must be chosen correctly |

Fixed window is simple, sliding window is more accurate, and token bucket handles legitimate bursts better.
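To make the sliding window counter row concrete, here is a minimal in-memory sketch. It assumes the common approximation where the previous bucket is weighted by how much of it still overlaps the moving window; `slidingWindowAllow` and `slidingState` are hypothetical names.

```typescript
// Sliding window counter: weight the previous fixed bucket by the share of it
// that still overlaps the moving window, then add the current bucket.
type Buckets = { windowStart: number; current: number; previous: number };

const slidingState = new Map<string, Buckets>();

function slidingWindowAllow(
  key: string,
  limit: number,
  windowMs: number,
  now: number,
): boolean {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  let b = slidingState.get(key);

  if (!b) {
    b = { windowStart, current: 0, previous: 0 };
    slidingState.set(key, b);
  } else if (b.windowStart !== windowStart) {
    // Rolled into a new bucket. Keep the old count only if it belongs to the
    // immediately preceding bucket; anything older has fully expired.
    const isAdjacent = windowStart - b.windowStart === windowMs;
    b.previous = isAdjacent ? b.current : 0;
    b.current = 0;
    b.windowStart = windowStart;
  }

  const elapsedFraction = (now - windowStart) / windowMs;
  const estimated = b.previous * (1 - elapsedFraction) + b.current;

  if (estimated >= limit) return false;
  b.current += 1;
  return true;
}
```

With a limit of 10 per second, a client that used all 10 requests late in one window gets only 5 more halfway through the next one, because half of the previous bucket still counts against it. A fixed window would have granted a fresh 10.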


Summary

Start with fixed window for a simple baseline, but public expensive routes usually benefit from sliding window or token bucket.
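For comparison, a token bucket can be sketched in a few lines with lazy refill. This is an in-memory illustration under stated assumptions: `tryConsume`, `BucketConfig`, and `tokenBuckets` are hypothetical names, and a new client is assumed to start with a full bucket.

```typescript
type Bucket = { tokens: number; updatedAt: number };
type BucketConfig = { capacity: number; refillPerSecond: number };

const tokenBuckets = new Map<string, Bucket>();

function tryConsume(key: string, config: BucketConfig, now: number, cost = 1): boolean {
  // A new client starts with a full bucket, so an initial burst is allowed.
  const bucket = tokenBuckets.get(key) ?? { tokens: config.capacity, updatedAt: now };

  // Refill lazily from the elapsed time, capped at capacity.
  const elapsedSeconds = Math.max(0, now - bucket.updatedAt) / 1000;
  bucket.tokens = Math.min(
    config.capacity,
    bucket.tokens + elapsedSeconds * config.refillPerSecond,
  );
  // Never move the timestamp backwards if the clock jumps.
  bucket.updatedAt = Math.max(bucket.updatedAt, now);
  tokenBuckets.set(key, bucket);

  if (bucket.tokens < cost) return false;
  bucket.tokens -= cost;
  return true;
}
```

Capacity controls how large a burst can be, and the refill rate controls the sustained average. The optional `cost` parameter is one way to charge expensive routes more than cheap ones from the same bucket.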

For a single Bun process, you can write a simple in-memory fixed-window limiter. It is useful for local dev, internal tools, single-instance deployments, or as a fallback, but it does not synchronize across instances.

A minimal example:

TS
type LimitEntry = { count: number; resetAt: number };
const limits = new Map<string, LimitEntry>();

const WINDOW_MS = 60_000;
const MAX_REQUESTS = 120;

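function rateLimitKey(req: Request) {
  // Assumes an upstream auth layer sets x-api-key-id; do not trust it straight from the client.
  const apiKeyId = req.headers.get("x-api-key-id");
  // X-Forwarded-For is only reliable behind a trusted proxy that overwrites it.
  // `||` also covers an empty header value, which `??` would let through as "".
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() || "unknown";
  return apiKeyId ? `api:${apiKeyId}` : `ip:${ip}`;
}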

function checkLimit(key: string, now = Date.now()) {
  const current = limits.get(key);

  if (!current || current.resetAt <= now) {
    limits.set(key, { count: 1, resetAt: now + WINDOW_MS });
    return { allowed: true, remaining: MAX_REQUESTS - 1, resetAt: now + WINDOW_MS };
  }

  if (current.count >= MAX_REQUESTS) {
    return { allowed: false, remaining: 0, resetAt: current.resetAt };
  }

  current.count += 1;
  return { allowed: true, remaining: MAX_REQUESTS - current.count, resetAt: current.resetAt };
}

Bun.serve({
  async fetch(req) {
    const decision = checkLimit(rateLimitKey(req));

    if (!decision.allowed) {
      // Clamp to at least 1 second: the window may end between the check and this line.
      const retryAfter = Math.max(1, Math.ceil((decision.resetAt - Date.now()) / 1000));
      return Response.json(
        { error: "rate_limited", retryAfter },
        { status: 429, headers: { "Retry-After": String(retryAfter) } },
      );
    }

    return Response.json({ ok: true, remaining: decision.remaining });
  },
});

This is a fixed-window baseline. It does not actively clean old keys, does not work across multiple processes, does not have tenant-specific policies, and does not protect against boundary bursts. But it shows the right shape: key, counter, decision, 429, Retry-After.
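The missing cleanup is the easiest gap to close. A minimal sketch, assuming the same `{ count, resetAt }` entry shape as the example above; `sweepExpired` and `startSweeper` are hypothetical helpers, and the 60-second interval is an arbitrary choice.

```typescript
type LimitEntry = { count: number; resetAt: number };

// Removes entries whose window has already ended. Returns how many were dropped.
function sweepExpired(limits: Map<string, LimitEntry>, now: number): number {
  let removed = 0;
  // Deleting the current entry while iterating a Map is safe in JavaScript.
  for (const [key, entry] of limits) {
    if (entry.resetAt <= now) {
      limits.delete(key);
      removed += 1;
    }
  }
  return removed;
}

// Runs the sweep periodically and returns a function that stops it.
function startSweeper(limits: Map<string, LimitEntry>, intervalMs = 60_000): () => void {
  const timer = setInterval(() => sweepExpired(limits, Date.now()), intervalMs);
  return () => clearInterval(timer);
}
```

Without a sweep, every key that was ever seen stays in the Map, so memory grows with the number of distinct clients rather than with current traffic.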

Where it fits

An in-memory limiter is fine for one process or as a cheap local guard. A production API with autoscaling needs a shared store.

Once a Bun API runs on multiple instances, in-memory counters stop being a global limit. Each instance counts only the requests it sees, so a client whose traffic is spread across N instances can get up to N times the intended allowance. Redis solves this by acting as a shared counter store.

The critical detail: the counter update must be atomic. For fixed window, this is often INCR + EXPIRE, but the expiry must be set in the same atomic step as the first increment. If the two commands are sent separately and the process dies between them, the key never expires and the client stays blocked. For sliding window log, sorted sets with cleanup of old timestamps are common. For token bucket, a Lua script or another atomic mechanism is often needed so refill and consume happen in one operation. Redis rate limiting patterns usually rely on atomic counters or Lua. [5][6]
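One way to get the fixed-window case atomic is a short Lua script, since Redis runs a script as a single unit. The sketch below is an illustration, not a reference implementation: `RunScript` is a hypothetical adapter that you would implement with your Redis client's EVAL call, and `redisFixedWindow` is a made-up name.

```typescript
// Runs atomically inside Redis: increment, make sure the key has a TTL,
// and return both the count and the remaining TTL in one round trip.
const FIXED_WINDOW_LUA = `
local count = redis.call('INCR', KEYS[1])
local ttl = redis.call('PTTL', KEYS[1])
if ttl < 0 then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
  ttl = tonumber(ARGV[1])
end
return { count, ttl }
`;

// Hypothetical adapter: wrap your Redis client so it runs EVAL with these
// keys and args and resolves to the script's reply.
type RunScript = (script: string, keys: string[], args: string[]) => Promise<unknown>;

type RedisDecision = { allowed: boolean; remaining: number; retryAfterMs: number };

async function redisFixedWindow(
  runScript: RunScript,
  key: string,
  limit: number,
  windowMs: number,
): Promise<RedisDecision> {
  const reply = await runScript(FIXED_WINDOW_LUA, [key], [String(windowMs)]);
  if (!Array.isArray(reply) || reply.length !== 2) {
    throw new Error("unexpected reply from rate limit script");
  }
  const count = Number(reply[0]);
  const ttlMs = Number(reply[1]);
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    // Defensive fallback in case the reported TTL is not positive.
    retryAfterMs: ttlMs > 0 ? ttlMs : windowMs,
  };
}
```

Checking `PTTL` instead of `count == 1` also repairs keys that somehow ended up without an expiry. Note that in this variant rejected requests still increment the counter, which is harmless for a fixed window because the key expires with the window.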

Redis also adds a failure mode: what happens when it is unavailable. For public expensive routes, fail closed or degraded limiting is often safer. For a critical internal control plane, the policy may differ. But any fail-open mode needs alerts, because otherwise the rate limiter disappears exactly when it is needed most.
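The failure policy can live in one wrapper around the store call. A sketch under stated assumptions: the route class names, `failModeByRouteClass`, and `checkWithFailPolicy` are hypothetical, and `alert` stands in for whatever alerting hook you use.

```typescript
type FailMode = "open" | "closed";
type CheckResult = { allowed: boolean; degraded: boolean };

// Hypothetical per-route-class policy table.
const failModeByRouteClass: Record<string, FailMode> = {
  "public-expensive": "closed",
  "internal-control": "open",
};

async function checkWithFailPolicy(
  routeClass: string,
  check: () => Promise<boolean>,
  alert: (message: string) => void,
): Promise<CheckResult> {
  try {
    return { allowed: await check(), degraded: false };
  } catch (error) {
    // Unknown route classes default to fail-closed in this sketch.
    const mode: FailMode = failModeByRouteClass[routeClass] ?? "closed";
    // Always alert: a silent fail-open means the limiter has quietly disappeared.
    alert(`rate limit store unavailable, route=${routeClass}, mode=fail-${mode}: ${String(error)}`);
    return { allowed: mode === "open", degraded: true };
  }
}
```

The `degraded` flag lets the caller switch to a stricter local limit instead of a plain allow or deny, which is the degraded limiting option mentioned above.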

For multiple Bun instances, the limit must live in a shared store. Otherwise, each instance gives the client a separate allowance.


Practical baseline

A distributed limiter needs a shared store, atomic update, key design, TTL cleanup, latency budget, and fail policy.

Ready-made middleware or a plugin is useful when the task is typical. But it does not decide key strategy, tenant policy, or distributed storage for you.

The Hono ecosystem has rate limiter middleware with configurable window, limit, key generator, and store options. This is good for a quick baseline in a Hono app. [1]
Hono rate limiter middleware
The Elysia ecosystem has a rate-limit plugin for Bun-first Elysia apps. It fits naturally into the Elysia lifecycle/plugins model. [2]
Elysia rate-limit plugin
OWASP API Security Top 10 identifies unrestricted resource consumption as a separate risk. Rate limiting should protect CPU, memory, storage, network, and downstream resources. [3]
OWASP API Security

Summary

A ready-made limiter reduces boilerplate. Production quality depends on keys, storage, policies, observability, and edge cases.

The rate limit key is the main decision. If the key is wrong, the algorithm will not save you. IP-only limits often punish normal users behind NAT and fail to catch authenticated abuse.

For public anonymous routes, IP can be a starting key. For authenticated APIs, it is better to limit by subject id, API key id, tenant id, or a combination like tenantId + routeGroup. For login flows, you may need IP limit, account/email limit, and device fingerprint policy at the same time.

For multi-tenant SaaS, a user-only limit can be too soft because one tenant with many users can overload a resource. A tenant-only limit can be too strict because one noisy user blocks the whole company. Often you need a hierarchy: per-user, per-tenant, per-route, and global emergency limit.

Public read: IP + route group.
Login: IP + account/email + device/risk signal.
API key: key id + route group + tenant.
SaaS tenant: tenant budget + per-user budget.
Expensive exports/search: a separate narrow route limit.
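A hierarchy like this can be expressed as a list of budgets that must all pass. The sketch below is illustrative only: `LimitContext`, `Budget`, and `budgetsFor` are hypothetical names, and the numbers are placeholders rather than recommendations.

```typescript
type LimitContext = {
  routeGroup: string;
  ip: string;
  userId?: string;
  tenantId?: string;
  apiKeyId?: string; // the key's identifier, never the secret itself
};

type Budget = { key: string; limit: number; windowMs: number };

// Returns every budget the request must fit into. Limits are placeholder numbers.
function budgetsFor(ctx: LimitContext): Budget[] {
  const minute = 60_000;
  const budgets: Budget[] = [];

  // Most specific identity first: API key, then user, then IP as a last resort.
  if (ctx.apiKeyId) {
    budgets.push({ key: `key:${ctx.apiKeyId}:${ctx.routeGroup}`, limit: 600, windowMs: minute });
  } else if (ctx.userId) {
    budgets.push({ key: `user:${ctx.userId}:${ctx.routeGroup}`, limit: 120, windowMs: minute });
  } else {
    budgets.push({ key: `ip:${ctx.ip}:${ctx.routeGroup}`, limit: 60, windowMs: minute });
  }

  // A shared tenant budget stops many users of one tenant from adding up.
  if (ctx.tenantId) {
    budgets.push({ key: `tenant:${ctx.tenantId}:${ctx.routeGroup}`, limit: 3_000, windowMs: minute });
  }
  return budgets;
}
```

The request is allowed only if every returned budget allows it, so one noisy user hits the per-user budget long before the tenant budget is exhausted.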

HTTP 429 Too Many Requests means the user has sent too many requests in a given time period. MDN notes that the response may include Retry-After, which tells the client how long to wait before retrying. [7]

In a production API, 429 without Retry-After forces good clients to guess. They either retry too quickly or use exponential backoff where they could simply wait until reset. This hurts UX and increases unnecessary load.

Beyond Retry-After, many APIs add rate-limit headers such as RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset, or custom X-RateLimit-*. The important part is a stable contract and documentation for clients.
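A small builder keeps that contract in one place. This sketch uses only the standard `Response` constructor; `tooManyRequests` is a hypothetical helper, and it assumes `RateLimit-Reset` carries seconds until reset, which is one convention among several, so document whichever you choose.

```typescript
type Exceeded = { limit: number; resetAt: number };

function tooManyRequests(info: Exceeded, now: number = Date.now()): Response {
  // Retry-After is in whole seconds; never send 0 or a negative value.
  const retryAfter = Math.max(1, Math.ceil((info.resetAt - now) / 1000));

  return new Response(JSON.stringify({ error: "rate_limited", retryAfter }), {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(retryAfter),
      "RateLimit-Limit": String(info.limit),
      "RateLimit-Remaining": "0",
      // Assumption: seconds until the window resets, not a Unix timestamp.
      "RateLimit-Reset": String(retryAfter),
    },
  });
}
```

Returning the same JSON shape and header set from every limited route means client libraries can implement the wait once.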

429 should be useful: the client should know when to retry instead of guessing backoff.


Summary

A rate limiter should be understandable for the client. 429 without a retry contract creates repeated traffic and poor integration.

Rate limiting has many unpleasant details that are invisible on the happy path. These are the ones that most often appear after launch.

NAT and corporate networks

An IP-only limit can block dozens of normal users behind one egress IP. For authenticated APIs, key by subject/API key/tenant instead.

Webhook retry storms

A partner may honestly retry failed webhooks and hit the limiter. Webhooks need separate policies, idempotency, and a retry-aware contract.

Clock skew

If the limiter is spread across systems, reset time and sliding windows must be calculated consistently. Redis server time or a centralized store is often more reliable than local clocks.

Burst after deploy

After downtime or deploy, clients may synchronously retry requests. Token bucket or queued backoff may be better than a hard fixed window.

Admin and internal routes

Do not give internal tools unlimited access by default. They often run the heaviest exports and batch operations.

Redis failure

Fail-open without alerts makes protection invisible. Fail-closed without degradation can take the product down. Policy should differ by route class.

These mistakes are not unique to Bun, but in Bun APIs they are often hidden behind a fast runtime and a simple middleware wrapper.

One global limit for all routes.

IP-only limit for authenticated APIs.

In-memory limiter in multi-instance production.

Redis INCR without correct TTL or atomicity.

No Retry-After in the 429 response.

Limits do not account for tenant, API key, or route cost.

Full API key or Authorization header enters logs as the limiter key.

Fail-open during Redis outage without alerting.

Rate limiter runs after body parsing for expensive payload routes.

No tests for boundary burst, reset, Redis failure, and concurrent requests.

Review rule

The rate limiter should run early, have the right key, an atomic counter, a clear 429, and an observable decision.
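The boundary burst is a good first test to write because it needs no network and no real clock. The sketch below re-implements a tiny fixed-window counter with an injected `now` so the timing is deterministic; `makeFixedWindow` and `countAllowed` are hypothetical helpers.

```typescript
// Minimal fixed-window counter with an injectable clock, for testing only.
function makeFixedWindow(limit: number, windowMs: number) {
  const counters = new Map<string, { count: number; resetAt: number }>();
  return (key: string, now: number): boolean => {
    const entry = counters.get(key);
    if (!entry || entry.resetAt <= now) {
      counters.set(key, { count: 1, resetAt: now + windowMs });
      return true;
    }
    if (entry.count >= limit) return false;
    entry.count += 1;
    return true;
  };
}

// Fires `attempts` requests at the same instant and counts how many pass.
function countAllowed(
  allowRequest: (key: string, now: number) => boolean,
  at: number,
  attempts: number,
): number {
  let allowed = 0;
  for (let i = 0; i < attempts; i++) if (allowRequest("client", at)) allowed++;
  return allowed;
}

const allowRequest = makeFixedWindow(100, 60_000);
const openingHit = countAllowed(allowRequest, 0, 1); // 1: opens the window at t=0
const endOfWindow = countAllowed(allowRequest, 59_900, 150); // 99: the rest of the first window
const startOfNext = countAllowed(allowRequest, 60_000, 150); // 100: a fresh window

// 199 requests pass within 100 ms even though the limit is 100 per minute.
const passedAroundBoundary = endOfWindow + startOfNext;
```

The same pattern, with a fake store that throws or a batch of concurrent calls, covers the Redis failure and concurrency cases from the list above.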

Before launching rate limiting in staging or production, go through this list. It helps find problems before customers do.

Route classes are defined

Public, auth, login, webhook, export, admin, and internal routes have different policies.

Key strategy is not IP-only

For authenticated routes, use subject/API key/tenant/route group, and keep IP as an additional signal.

Distributed store exists for multi-instance

If the Bun API has multiple instances, counters live in Redis or another shared store.

Operations are atomic

Increment, expiry, sliding window cleanup, or token consume happen without race conditions.

429 has a retry contract

The response contains a stable JSON error and Retry-After; rate-limit headers are documented.

Limiter runs before expensive operations

Rate check happens before body parsing, DB calls, remote calls, and heavy transforms when the route allows it.

Redis failure policy is defined

For each route class, fail-open or fail-closed is known, and alerting exists.

Observability exists

Decision, route group, limiter key hash, remaining, reset time, storage latency, and Redis failures are logged.
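For the observability item, a log line can carry a short fingerprint of the limiter key instead of the key itself. A minimal sketch using `createHash` from `node:crypto`, which Bun supports; `fingerprint`, `limiterLogLine`, and the field names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Short, stable fingerprint so log lines can be correlated without the raw key.
function fingerprint(limiterKey: string): string {
  return createHash("sha256").update(limiterKey).digest("hex").slice(0, 16);
}

type LimiterDecision = { allowed: boolean; remaining: number; resetAt: number };

// Builds one structured log line per limiter decision.
function limiterLogLine(
  limiterKey: string,
  routeGroup: string,
  decision: LimiterDecision,
  storeLatencyMs: number,
): string {
  return JSON.stringify({
    event: "rate_limit_decision",
    keyHash: fingerprint(limiterKey),
    routeGroup,
    allowed: decision.allowed,
    remaining: decision.remaining,
    resetAt: new Date(decision.resetAt).toISOString(),
    storeLatencyMs,
  });
}
```

A truncated hash is meant for correlation across log lines. It is not a reason to put a secret into the limiter key in the first place: low-entropy inputs such as IPs can still be guessed from their hash.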

Bun gives you a fast HTTP runtime, but rate limiting is not a runtime feature you can add with one line and forget. It is a security and reliability policy that must know who it limits, for which route, with which storage, which algorithm, and which retry contract.

For one process, an in-memory fixed window can be a reasonable baseline. For production with multiple instances, you need Redis or another shared store. For user-facing APIs, fixed window is often too rough; sliding window or token bucket gives a better UX.

Most importantly: do not punish real users with a bad key. IP-only limits, one global limit, and 429 without Retry-After usually create more problems than they solve.

Is an in-memory rate limiter enough in Bun.js?

Only for one process, local dev, internal tools, or a simple baseline. If the API has multiple instances, an in-memory limit is multiplied by the number of instances and is not global protection.

Which is better: fixed window, sliding window, or token bucket?

Fixed window is the simplest, but it has boundary bursts. Sliding window is more accurate for critical routes, but more expensive. Token bucket allows short legitimate bursts while controlling average rate.

Why is IP-only rate limiting bad?

An IP-only limit can block normal users behind NAT, a corporate proxy, or a mobile carrier, while still working poorly for authenticated abuse. For APIs, it is better to limit by subject, API key, tenant, and route group.

What should a Bun API return when the limit is exceeded?

A stable JSON error with HTTP `429`, the `Retry-After` header, and preferably rate-limit headers such as remaining/reset. This helps good clients retry correctly. [7]

Do I need Redis for rate limiting?

Not necessarily for one process. For production with multiple instances or serverless/concurrent deployment, Redis or another shared store is practically required so counters are shared.

What should happen if Redis is unavailable?

Define the fail policy in advance. For expensive public routes, fail closed or a degraded strict local limit is often safer. For some internal/control-plane routes, fail-open may be acceptable, but only with alerting and audit.

These sources confirm ready-made middleware/plugin options, the security rationale for resource limiting, Redis rate limiting patterns, and HTTP semantics for 429.

Reviewed: 14 May 2026
Applies to: Bun 1.3.x, Bun.serve routes, Hono 4.x, Elysia 1.x, Redis-backed APIs
Tested with: Bun.serve middleware composition, Redis counters, Hono rate limiter middleware, Elysia rate-limit plugin, HTTP 429 Retry-After response

