The Rate Limit Loophole: How CDNs Miscount

You configured a rate limit of 100 requests per 10 seconds. You tested it. It works. But an attacker with traffic distributed across 10 geographic regions can push 1,000 requests per 10 seconds through your CDN without triggering that rule.

This is not a misconfiguration. It is not a bug. It is an architectural consequence of how CDNs count requests, and it affects nearly every major vendor in the market.

We spent several weeks researching the rate limit counting architectures of 10 CDN and DDoS protection vendors, reviewing their documentation, engineering blogs, and published patents. The findings reveal a gap between what "rate limit: 100 req/10s" implies and what it actually enforces in a distributed edge network.

N x threshold

Effective rate limit when attackers distribute traffic across N edge locations

The Core Problem: Where Are Requests Counted?

Rate limiting seems simple. Count the requests from a given source within a time window. If the count exceeds the threshold, block. But in a globally distributed CDN with hundreds of edge locations, a fundamental question emerges: where does the counting happen?

There are three possible architectures:

Per-PoP (per data center): Each edge location maintains its own independent counters. Fastest to enforce, but counters are not shared across locations.
Per-server: Each individual server within each PoP has its own counter. Even more granular, even less accurate globally.
Global (centralized): A single logical counter aggregated across all edge locations. Accurate, but introduces latency and propagation delays.

The choice between these models creates a fundamental tradeoff. Distributed counters are fast but approximate. Centralized counters are accurate but slow. And the difference between the two determines whether an attacker can trivially bypass your configured rate limit by distributing their traffic geographically.

The Attacker's Math

If a CDN uses per-PoP counting and operates 300 edge locations, the theoretical maximum effective rate limit is 300 times your configured threshold. In practice, attackers only need to hit 5-10 distinct PoPs to multiply their allowed throughput significantly.

Vendor-by-Vendor Findings

We examined the counting architecture of 10 vendors. The results range from well-documented transparency to near-total silence on the topic.

Cloudflare - Per-PoP

Counting Model: Independent counters per data center Per-PoP

Each Cloudflare data center maintains independent rate limit counters using Twemproxy and memcached with consistent hashing. Counters are shared across servers within a single DC but are not synchronized across DCs. The exception: geographically clustered DCs in the same metro area share counters.

Cloudflare uses a sliding window algorithm, storing two numbers per counter (current and previous period count) for smooth rate estimation. Their DDoS mitigation stack has a separate path: dosd provides per-server autonomous detection (handling 98.6% of L3/L4 attacks), while gatebot provides centralized global analysis for sophisticated attacks.

To their credit, Cloudflare has published detailed engineering blog posts explaining this architecture. The per-PoP limitation is documented, not hidden.

Akamai - Per-Edge with Sync Delay

Counting Model: Edge servers sync with 1-3 second delay Per-Edge + Sync

Akamai edge servers share counter values across the network, but with a 1-3 second synchronization delay. They use a rolling 5-second window per IP address. For steady traffic, the median counting error is under 10%. For sudden traffic changes (exactly the pattern in a DDoS attack), the error can reach 20% or more.

In late 2024, Akamai introduced "Aggregated Rate Limiting," specifically designed to counter geographic distribution attacks. This newer capability counts across a broader request distribution scope, partially addressing the per-edge gap. However, it remains a separate feature that must be explicitly enabled.

Akamai's Prolexic scrubbing centers (36+ locations, 20+ Tbps capacity) operate on a separate path via BGP diversion and have their own counting model.

Imperva - Per-PoP (Likely)

Counting Model: Per-PoP, not documented Per-PoP (Undocumented)

Every Imperva PoP runs the complete service stack: DDoS scrubbing, WAF, bot protection, caching, and load balancing. Rate limiting happens at the scrubbing layer within each PoP. Counter synchronization across PoPs is not documented anywhere in Imperva's public materials. Given the architecture (full stack per PoP) and the absence of any mention of cross-PoP counter sharing, per-PoP counting is the most likely behavior.

Total network capacity: 13 Tbps.

AWS WAF - Global, but Delayed

Counting Model: Centralized global aggregation Global + Delay

AWS WAF rate-based rules aggregate counts globally across all CloudFront edge locations. This makes AWS one of the few vendors with explicitly centralized counting in a CDN-inline deployment.

The catch: propagation delays of up to several minutes (typical under 30 seconds). AWS uses weighted estimation rather than exact counting, and their own documentation states this is "not intended for precise rate limiting." The minimum threshold is 100 requests per evaluation window, which is configurable from 10 seconds to 600 seconds.

AWS is transparent about these limitations. Their documentation explicitly warns that rate-based rules provide approximate enforcement, not exact counting.

Azure Front Door - Per-Server

Counting Model: Per individual server Per-Server (Worst)

Azure Front Door has the most granular (and therefore least accurate) counting model of any vendor we examined. Rate limit counters are maintained per individual Front Door server, not per PoP and not globally. Microsoft explicitly documents this:

"It's possible that requests from the same client might arrive at a different Azure Front Door server that hasn't refreshed the rate limit counters yet."

Microsoft acknowledges that low thresholds (under 200 requests per minute) are unreliable and recommends using larger time windows (5 minutes instead of 1 minute) to reduce the impact of distributed counting. With 192 edge PoPs worldwide and multiple servers per PoP, the multiplication factor is substantial.

GCP Cloud Armor - Per-Region

Counting Model: Independent counters per Google Cloud region Per-Region

Cloud Armor enforces rate limits independently in each Google Cloud region. If a service runs in 2 regions, the effective limit is 2x the configured value. Additionally, each backend service within a region gets its own full threshold.

Google's documentation states plainly: "enforced rate limits are approximate and might not be strictly accurate." Cloud Armor supports aggregation keys including IP, region, HTTP header, XFF, cookie, path, and JA3/JA4 fingerprints, but the per-region boundary applies regardless of the key chosen.

Radware - Deployment-Dependent

Counting Model: Varies by deployment mode Centralized (on-prem)

Radware is the most nuanced case. Their on-premises DefensePro appliance operates as a single device with a single counter, providing centralized, accurate rate limiting with ASIC-based line-rate inspection. No per-PoP problem exists because there is only one counting point.

Their Cloud WAF product, however, uses per-PoP counting similar to other cloud WAFs and is less documented. In hybrid deployments (DefenseFlow + DefensePro), the on-prem device has full traffic visibility while the cloud component (DefensePipe) provides overflow capacity via BGP diversion.

Fastly - Hybrid Local + Global

Counting Model: Local and global counting mechanisms Hybrid

Fastly explicitly states they use "both local and global counting mechanisms." For on-prem deployments, the local agent maintains counters while the cloud engine aggregates them every 30 seconds. For edge WAF deployments, Fastly's documentation warns that you may need to configure 2x your intended threshold to account for distributed counting across cache nodes.

This is a rare case of a vendor directly acknowledging the multiplication factor in their configuration guidance.

The Complete Comparison

Here is every vendor side by side. The "Effective Limit" column shows what happens when an attacker distributes traffic across N edge locations.

Vendor	Counting Model	Effective Limit (N locations)	Documented?
Cloudflare	Per-PoP	~N x threshold	Yes (engineering blog)
Akamai	Per-edge, 1-3s sync	~N x threshold (mitigated with aggregated RL)	Yes
Imperva	Per-PoP (likely)	~N x threshold	No
AWS WAF	Global (centralized)	= threshold (delayed minutes)	Yes
Azure Front Door	Per-server	~N x threshold	Yes (Microsoft docs)
GCP Cloud Armor	Per-region	= regions x threshold	Yes
Radware (on-prem)	Centralized	= threshold	N/A
Radware (cloud)	Per-PoP	~N x threshold	No
Fastly Edge WAF	Hybrid	~2x threshold	Yes
Arbor/Netscout	Centralized	= threshold	Yes

The Architectural Tradeoff

This is not a case of vendors being negligent. It is a genuine engineering tradeoff with no perfect solution. The CDN industry has converged on three models, each with distinct strengths and weaknesses:

Model	Rate Limit Accuracy	Latency Added	Activation Speed
CDN-inline (Cloudflare, Akamai edge, Azure FD)	Approximate (per-PoP)	+1-4ms	Always-on
Scrubbing center (Arbor, Lumen)	Accurate (centralized)	+22-80ms	30-180s BGP delay
Hybrid (Radware, Akamai + Prolexic)	Accurate on-prem + approximate cloud	Minimal normally	Always-on + overflow

CDN-inline models give you always-on protection with minimal latency, but they pay for it with approximate counting. Scrubbing center models give you centralized, accurate counting, but activation takes 30-180 seconds (during which traffic flows unprotected) and adds latency even during normal operation. Hybrid models try to capture the benefits of both, at the cost of deployment complexity.

Only AWS WAF attempts global centralized counting in a CDN-inline model, and the price is propagation delays of up to several minutes. That delay window is itself an exploitable gap.

Why Not Just Sync Counters Globally?

Real-time global counter synchronization across hundreds of edge locations would require sub-millisecond consensus, which is bounded by the speed of light. A counter update from Tokyo to London takes at least 40ms in network round-trip time alone. At 100,000+ requests per second, the coordination overhead would exceed the cost of the requests themselves. This is the same fundamental constraint that makes distributed databases hard. Academic research (Raghavan et al., ACM SIGCOMM 2007) formally proved that exact distributed rate limiting under network partitions is impossible without violating either accuracy or latency guarantees.

What This Means for Defenders

If your DDoS protection relies on CDN-layer rate limiting as the primary defense, you should understand the following:

1. Your Configured Threshold Is an Upper Bound Per Location, Not a Global Cap

When you set "100 requests per 10 seconds," most CDNs enforce that independently at each edge location. An attacker routing traffic through a botnet spread across 10 cities effectively gets a 1,000 req/10s allowance. This is the expected behavior for geographically distributed botnets, which is exactly the attack profile you are trying to defend against.

2. Testing From a Single Location Gives False Confidence

If you test your rate limit from a single IP in a single region and it triggers correctly, you have validated exactly one edge case (literally). A proper test requires multi-region traffic generation. Our API DDoS research covers why API endpoints are particularly affected by this gap.

3. Layer Your Defenses

CDN rate limiting should be one layer, not the only layer. Combine it with:

Origin-level rate limiting: Your application server sees all traffic regardless of which PoP it came through. Origin-level counters are inherently centralized.
Behavioral analysis: Detect anomalous patterns across the full request stream, not just volume thresholds.
WAF rules: Block known attack signatures before they reach rate limit evaluation. See our WAF configuration analysis for common gaps.
IP reputation and geofencing: Reduce the attack surface before counting even begins.

4. Understand Your Vendor's Specific Model

The difference between per-PoP (Cloudflare), per-server (Azure), and per-region (GCP) is significant. Per-server counting means the multiplication factor is not N PoPs, but N servers across all PoPs, a much larger number. Ask your vendor explicitly: "Are your rate limit counters synchronized globally, or enforced per edge location?"

5. Consider the Propagation Delay Window

Even vendors with global counting (AWS WAF) have a vulnerability window during propagation. Several minutes of unthrottled traffic during a volumetric attack can be enough to overwhelm origin infrastructure. This delay is not a flaw in implementation. It is a physical constraint of distributed systems.

The CDN Bypass Multiplier

This rate limit gap compounds with another common problem: discoverable origin IPs. If an attacker can find your origin IP and bypass the CDN entirely, rate limits become irrelevant. The combination of distributed counting and origin exposure creates a defense gap wider than either issue alone.

Vendor Transparency Matters

One of the most striking findings from this research is the variance in vendor transparency. Cloudflare, AWS, Azure, and Fastly publish detailed documentation about their counting models, including explicit warnings about accuracy limitations. Imperva and Radware's cloud WAF provide little to no public documentation on counter synchronization.

Fastly deserves particular credit for directly telling customers they may need to double their threshold to account for distributed counting. That is the kind of practical honesty that helps defenders make informed decisions.

Transparency does not fix the architectural limitation, but it allows security teams to compensate for it. When a vendor says "rate limits are approximate," teams can plan accordingly. When a vendor says nothing, teams assume exact enforcement and build fragile defenses.

Academic Context

This is not a new problem in computer science. Distributed rate limiting has been studied for nearly two decades:

Raghavan et al., ACM SIGCOMM 2007 (Best Student Paper): "Cloud Control with Distributed Rate Limiting" formally defined the tradeoffs between accuracy, communication overhead, and latency in distributed counting systems.
IEEE, 2025: "Distributed Rate Limiting Under Decentralized Cloud Networks" revisited the problem in the context of modern CDN architectures.
HackTricks: Documents per-PoP rate limit evasion as a known penetration testing technique, confirming that the offensive community has been aware of this gap for years.

The gap between academic understanding and operational practice is significant. Security teams configuring rate limits in CDN dashboards are rarely aware that the threshold they enter will be enforced approximately, not exactly.

Implications for DDoS Resilience

Our DDoS resilience research consistently finds that organizations overestimate the protection provided by their CDN configuration. Rate limit counting is one of the least-understood contributors to that overconfidence.

The practical impact depends on your threat model:

Credential stuffing: Attackers already distribute attempts across many IPs. Per-PoP counting makes IP-based rate limits even less effective against distributed credential attacks.
API abuse: Automated scraping and enumeration attacks that stay just under the per-PoP threshold can run indefinitely without triggering rate limits, even though the aggregate volume far exceeds the configured cap.
Application-layer DDoS: Low-and-slow attacks sending expensive requests from multiple regions can exhaust backend resources while every individual PoP sees sub-threshold traffic.

How Does Your Rate Limiting Actually Perform?

DDactic's free infrastructure scan identifies rate limit gaps, CDN bypass vectors, and WAF misconfigurations before attackers do. We test from multiple regions to reveal the real effective threshold, not just the configured one.

Get a Free Scan

Rate Limiting CDN Architecture DDoS Protection Cloudflare Akamai AWS WAF Azure Front Door GCP Cloud Armor Distributed Systems Edge Computing Security Research