Your Rate Limit Is a Guess: The Case for Baselines Before Hardening

April 17, 2026 | 9 min read | DDactic Research

Look at the rate limits on your production WAF. Odds are high the numbers are round. 100 requests per second. 1,000 requests per minute. 10 requests per IP per 10 seconds. Round numbers are a tell. They mean the rate limit was not derived from traffic data. It was picked.

A picked rate limit is a guess. A guess can be too tight (legitimate users get 429s during peak) or too loose (attackers operate quietly under the threshold). Both failure modes cost money, and both are invisible until an incident proves which direction you guessed wrong.

The fix is not a better guess. The fix is a baseline.

Round numbers: the fingerprint of a rate limit set without a traffic baseline.

The Two Failure Modes of a Guessed Limit

A rate limit sits on a knife edge. Set it too low, and the block rule fires on your own customers during traffic spikes. Set it too high, and the attack profile that matters most (the slow, sustained burn) stays under the ceiling indefinitely.

Too Low: the Friendly-Fire Failure

The signal for this failure is quiet. It rarely triggers a pager. What you see instead is a slow creep in 429 responses during payroll runs, Black Friday traffic, or after a successful marketing campaign. Users retry. Some give up. Conversion drops. The WAF team never hears about it because the WAF did exactly what it was told to do.

The hidden cost of a low guess

429 responses during traffic peaks rarely make it to the security team. They show up as support tickets ("checkout was down"), cart abandonment metrics, or retention churn. By the time you correlate those back to a rate limit rule, months have passed.

Too High: the Silent-Attack Failure

The opposite failure is louder in hindsight. A limit of "1,000 requests per minute per IP" sounds strict. Against a script kiddie running a single-IP flood, it is. Against any distributed attacker, it is a ceiling the attack will never touch. The attacker probes your limit, finds it at 16 RPS per IP, and operates at 14 RPS across a pool of 200 residential IPs. That is 2,800 requests per second, sustained, below every per-IP threshold you have.

You will not see this in a dashboard. The aggregate graph looks like a long, flat, boring traffic increase. The rule never fires. The first symptom is origin CPU saturation hours later.

What a Baseline Actually Is

A baseline is not a single number. "We serve 200 RPS" is not a baseline. It is an average, and averages hide every interesting pattern in your traffic. A real baseline answers five separate questions:

1. Baseline by Path

What is the normal request rate to /api/login versus /api/products versus /static/logo.svg?

A single global rate limit makes login endpoints (where credential stuffing is the real risk) look identical to static assets (where high volume is normal). The login path may see 3 RPS at p99. The static path may see 3,000 RPS at p99. A global limit in between is wrong for both.
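
A minimal sketch of the per-path measurement, assuming your access logs land in a pandas DataFrame with `timestamp` and `path` columns (placeholder names; substitute whatever your CDN actually exports):

```python
import pandas as pd

# Endpoint groups from this article; extend to match your own routes.
GROUPS = ("/api/login", "/api/search", "/api/checkout", "/static/")

def endpoint_group(path: str) -> str:
    for prefix in GROUPS:
        if path.startswith(prefix):
            return prefix
    return "/other"

def per_group_rps_p99(logs: pd.DataFrame) -> pd.Series:
    """p99 of the per-second request rate for each endpoint group.

    Assumes columns: 'timestamp' (datetime64) and 'path' (str).
    """
    counts = (
        logs.assign(group=logs["path"].map(endpoint_group))
            .set_index("timestamp")
            .groupby("group")
            .resample("1s")
            .size()
    )
    return counts.groupby(level="group").quantile(0.99)
```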

2. Baseline by Time

What does traffic look like at 03:00 versus 14:00 on a Tuesday? What about a Sunday?

Most production services have a daily and weekly rhythm. On a typical B2B product, 3 AM traffic is a twentieth of 2 PM traffic. A rate limit set against the afternoon peak will look generous at night, which is precisely when automated attacks prefer to run.
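
The same logs answer the time question. A sketch, again assuming a `timestamp` column:

```python
import pandas as pd

def hour_of_week_profile(logs: pd.DataFrame) -> pd.Series:
    """Median requests per hour for each (weekday, hour) slot.

    Compare slot (1, 3), Tuesday 03:00, against slot (1, 14),
    Tuesday 14:00, to see the rhythm a flat limit ignores.
    """
    hourly = logs.set_index("timestamp").resample("1h").size()
    return hourly.groupby([hourly.index.dayofweek, hourly.index.hour]).median()
```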

3. Baseline by Geography

Where do your legitimate users live? And where does your raw traffic come from, before you filter it down to known users?

If you sell to Israeli enterprises and 12% of your login attempts come from Vietnam, that 12% is not a baseline. It is a filter criterion. The baseline for "expected geographies" and the baseline for "suspicious geographies" should be separate numbers.
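
A sketch of keeping those baselines separate, assuming a `country` column from your CDN's geo-IP enrichment; the expected-geo set is a placeholder:

```python
import pandas as pd

EXPECTED_GEOS = {"IL"}  # placeholder: where your real customers live

def geo_split_baseline(logins: pd.DataFrame) -> pd.DataFrame:
    """Separate per-minute login baselines for expected vs. other geos.

    Assumes columns: 'timestamp' and 'country' (ISO code).
    """
    per_minute = (
        logins.assign(expected=logins["country"].isin(EXPECTED_GEOS))
              .set_index("timestamp")
              .groupby("expected")
              .resample("1min")
              .size()
    )
    # One row per geo class, columns for the p50 and p99 of the band.
    return per_minute.groupby(level="expected").quantile([0.5, 0.99]).unstack()
```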

4. Baseline by Method

How many POSTs per minute does your login endpoint see from real users? How many GETs?

A healthy login flow sees roughly one POST per successful attempt, preceded by a page GET. A credential stuffing campaign sees hundreds of POSTs with no GETs. Method distribution is one of the cheapest anomaly signals available, and you only unlock it once you have a baseline to compare against.
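
The signal is cheap in code, too. A sketch assuming `ip`, `method`, and `path` columns in the same log DataFrame:

```python
import pandas as pd

def post_get_ratio(logs: pd.DataFrame) -> pd.Series:
    """POST:GET ratio per client IP on the login path.

    Real browsers GET the login page before POSTing credentials;
    stuffing campaigns mostly just POST.
    """
    login = logs[logs["path"] == "/api/login"]
    methods = login.groupby("ip")["method"].value_counts().unstack(fill_value=0)
    # High values mean many POSTs with few or no GETs: a stuffing signature.
    return methods.get("POST", 0) / (methods.get("GET", 0) + 1)
```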

5. Baseline by Authentication State

Does the request carry a session cookie or a valid API token? Or is it unauthenticated?

Authenticated traffic has a human or a known integration behind it. Unauthenticated traffic to protected endpoints is either exploration, scraping, or an attack in progress. A 429 on authenticated traffic is a customer problem. A 429 on unauthenticated traffic at the same endpoint is working as intended.
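
Classifying auth state is usually a header check. A sketch with hypothetical cookie and header names; substitute whatever your stack actually sets:

```python
def auth_state(headers: dict[str, str]) -> str:
    """Classify a request as 'authenticated' or 'anonymous'.

    Assumes lowercase header keys. 'session_id' is a placeholder
    cookie name. A request with a session cookie or bearer token gets
    the customer-grade limit; everything else gets the aggressive one.
    """
    if "session_id=" in headers.get("cookie", ""):
        return "authenticated"
    if headers.get("authorization", "").startswith("Bearer "):
        return "authenticated"
    return "anonymous"
```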

How to Build the Baseline

You do not need a new tool for this. Every CDN, WAF, and reverse proxy already emits the data. What you need is to query it on purpose, answer the five questions, and write the answers down.

The practical recipe (a code sketch follows the list):

  1. Pick a 30-day window that includes at least one peak event (a marketing push, a billing cycle, a payday). A week is too short. A quarter is too much to reason about.
  2. Aggregate by endpoint group, not individual URL. /api/login, /api/search, /api/checkout, /static/* are the right buckets. You are looking for categories with distinct traffic shapes.
  3. Calculate p50, p95, p99, and peak for each endpoint group. Record the per-minute and per-hour variants. Do not report averages. Averages will lie about tails, and tails are what rate limits live on.
  4. Segment by geography and authentication state. The same endpoint will have dramatically different baselines for authenticated Israeli traffic versus unauthenticated Vietnamese traffic. Both baselines are legitimate information.
  5. Write down the normal range, not the number. A baseline is a band, not a point. "Login POSTs are 2-18 per minute per authenticated user at p99, 0-3 per unauthenticated IP at p99." That is usable. "Login POSTs are 5 per minute average" is not.
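
Putting the recipe together: a sketch that computes the band per endpoint group and auth state, reusing the hypothetical `endpoint_group` helper from the path example. Column names are assumptions about your log schema:

```python
import pandas as pd

def baseline_bands(logs: pd.DataFrame) -> pd.DataFrame:
    """Per-minute baseline bands over the 30-day window.

    Assumes columns: 'timestamp', 'path', 'auth_state', with known
    bots already filtered out (see the next section for why).
    """
    per_minute = (
        logs.assign(group=logs["path"].map(endpoint_group))  # helper from the path sketch
            .set_index("timestamp")
            .groupby(["group", "auth_state"])
            .resample("1min")
            .size()
    )
    # One row per (group, auth_state): the band you write down.
    return per_minute.groupby(level=["group", "auth_state"]).agg(
        p50=lambda s: s.quantile(0.50),
        p95=lambda s: s.quantile(0.95),
        p99=lambda s: s.quantile(0.99),
        peak="max",
    )
```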

Why not use an ML anomaly detector instead?

You can, and mature programs do. But an anomaly detector that never had a human-curated baseline to validate against is a model training on uncurated data, which means it will quietly learn to treat ongoing low-grade attacks as normal. Build the baseline first. Let it inform the detector, not the other way around.

Common Mistakes That Poison the Baseline

Measuring the average instead of the p99

Averages flatten the traffic shape that matters. A service that averages 100 RPS and peaks at 800 RPS has nothing in common with a service that averages 100 RPS and peaks at 110 RPS. The first needs a rate limit near 1,000 to avoid friendly fire. The second can safely cap at 200.
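
A ten-line demonstration of the gap, with two synthetic services that share an average and nothing else:

```python
import numpy as np

rng = np.random.default_rng(0)

# Spiky: mostly ~90 RPS with 2% of samples near 800. Steady: ~100 RPS flat.
spiky = np.concatenate([rng.normal(90, 10, 9_800), rng.normal(800, 50, 200)])
steady = rng.normal(100, 5, 10_000)

for name, rps in [("spiky", spiky), ("steady", steady)]:
    print(name, round(rps.mean()), round(np.percentile(rps, 99)))
# spiky:  mean ~104 but p99 ~800, so it needs a limit near 1,000
# steady: mean ~100 and p99 ~112, so it can safely cap near 200
```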

Including bots in the legitimate baseline

If your measurement window includes ongoing scraping, uptime monitors hitting every endpoint, or a bot fleet probing your login, you baked the attack into the number you were supposed to defend against. Filter known-bot user agents and datacenter ASNs before computing the baseline, or the rate limit you derive will protect the attacker.
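
A sketch of the pre-filter. The user-agent patterns are illustrative, and the ASN set is a placeholder (the examples happen to be Amazon and Google ranges); curate lists for your own traffic:

```python
import re
import pandas as pd

BOT_UA = re.compile(r"bot|crawler|spider|monitor|pingdom", re.IGNORECASE)
DATACENTER_ASNS = {16509, 14618, 15169}  # placeholder datacenter ASNs

def strip_known_bots(logs: pd.DataFrame) -> pd.DataFrame:
    """Drop known-bot traffic before computing any baseline.

    Assumes columns: 'user_agent' (str) and 'asn' (int, from your
    CDN's enrichment). Whatever you fail to strip here gets baked
    into "normal".
    """
    is_bot = (
        logs["user_agent"].fillna("").str.contains(BOT_UA)
        | logs["asn"].isin(DATACENTER_ASNS)
    )
    return logs[~is_bot]
```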

Using staging traffic as a proxy for production

Staging traffic shape is nothing like production. Staging sees QA runs, load tests, and internal engineers clicking around. Rate limits derived from staging will be either absurdly high (load test peaks) or absurdly low (no real user distribution). Staging is where you validate your rule logic. Production is the only place to source your thresholds.

Treating the baseline as a one-time exercise

Traffic shape changes. New features, new integrations, new customer cohorts all move the numbers. A baseline computed in Q1 is already decaying by Q3. Re-run the measurement once a quarter, or tie it to major release events.

What Changes Once You Have a Real Baseline

Three things become possible that were not before:

  1. Rate limits stop being guesses: each one is a defensible multiple of a measured p99, per endpoint and per auth state.
  2. Anomaly alerts can fire on deviation from the baseline band instead of absolute thresholds, which is what catches the low-and-slow distributed attack that never touches a per-IP ceiling.
  3. Friendly fire becomes diagnosable: a 429 spike on traffic that sits inside the baseline band means the rule is wrong, not the users.

Each of those is a defense capability your infrastructure already supports but your team cannot use until the baseline exists.

The Sequence That Works

Hardening in the wrong order is worse than not hardening. The sequence that consistently holds up under attack is:

  1. Measure the five-dimension baseline.
  2. Write down the per-endpoint normal ranges, with p95, p99, and peak.
  3. Set rate limits at a defensible multiple of p99 (typically 2-3x), per endpoint, per auth state (sketched after this list).
  4. Enable anomaly alerts against the baseline (not against absolute thresholds).
  5. Re-measure quarterly and after any significant release.
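
For step 3, a sketch of turning recorded p99 bands into limits. The endpoints and p99 values are placeholders, and the output is rendered as nginx `limit_req_zone` directives as one possible target; a real authenticated zone would key on a session or user variable rather than the client address:

```python
# Recorded per-minute p99 baselines per (endpoint, auth state); placeholders.
P99_PER_MINUTE = {
    ("/api/login", "anonymous"): 3,
    ("/api/login", "authenticated"): 18,
    ("/api/search", "authenticated"): 240,
}

MULTIPLIER = 3  # defensible headroom above p99, per step 3

for (endpoint, auth), p99 in P99_PER_MINUTE.items():
    rate = p99 * MULTIPLIER
    zone = f"{endpoint.strip('/').replace('/', '_')}_{auth}"
    print(f"limit_req_zone $binary_remote_addr zone={zone}:10m rate={rate}r/m;")
```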

Every step relies on the one above it. Skip step 1 and the remaining four are all guesses wearing the uniform of hardening.

See your own attack surface before you tune it

DDactic maps every asset, fingerprints the CDN and WAF per endpoint, and tells you which paths are exposed before rate limits ever become the conversation.

Run a Free Scan →
Rate Limiting · Traffic Baselines · WAF Tuning · DDoS Protection · Application Security · Anomaly Detection