Your WAF inspects HTTP. gRPC speaks protobuf over HTTP/2. The WAF sees frames, not payloads. That gap is an attack surface, and it is growing every quarter as more organizations adopt gRPC for their microservices, mobile backends, and internal APIs.
We built a tool called grpc_stress.go with five distinct attack modes targeting gRPC services. Every one of them passes through standard WAF configurations without triggering a single rule. This post explains why that happens, what the five modes do, and what you can do about it.
Why gRPC Is Different
gRPC is Google's open-source RPC framework, now the default communication layer for Kubernetes services, mobile app backends, and an increasing number of public-facing APIs. If you work with microservices, you almost certainly have gRPC in your stack.
At the protocol level, gRPC works like this:
- Transport: HTTP/2 carries gRPC calls as standard HTTP/2 frames. The content type is application/grpc.
- Serialization: Request and response bodies are encoded using Protocol Buffers (protobuf), a binary serialization format. Not JSON. Not XML. Binary.
- Streaming: gRPC natively supports four communication patterns: unary (single request, single response), server streaming, client streaming, and bidirectional streaming. A single HTTP/2 connection can multiplex hundreds of concurrent streams.
- Metadata: gRPC uses HTTP/2 headers for metadata, including deadlines, authentication tokens, and custom key-value pairs.
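A concrete consequence of the binary framing: on the wire, each gRPC message inside the HTTP/2 DATA frames is a length-prefixed record, defined by the gRPC-over-HTTP/2 spec as a 1-byte compression flag, a 4-byte big-endian message length, and then the raw protobuf bytes. A minimal standard-library sketch (the payload bytes are arbitrary):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameGRPCMessage wraps a serialized protobuf payload in the gRPC
// length-prefixed message format: 1-byte compressed flag, 4-byte
// big-endian length, then the payload itself.
func frameGRPCMessage(payload []byte, compressed bool) []byte {
	buf := make([]byte, 5+len(payload))
	if compressed {
		buf[0] = 1
	}
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(payload)))
	copy(buf[5:], payload)
	return buf
}

func main() {
	msg := frameGRPCMessage([]byte{0x08, 0x96, 0x01}, false) // 3-byte protobuf payload
	fmt.Printf("%x\n", msg)                                  // 0000000003089601
}
```

Everything after those five bytes is opaque binary unless the reader knows the protobuf schema, which is the root of the WAF inspection gap discussed below.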
This architecture creates a fundamental problem for WAFs. Traditional WAFs were designed to inspect HTTP/1.1 requests with text-based bodies (form data, JSON, XML). They parse URL paths, query parameters, cookies, and request bodies to detect SQL injection, XSS, command injection, and other attack patterns.
gRPC breaks every one of those assumptions.
The Inspection Gap
A WAF sitting in front of a gRPC service sees HTTP/2 frames with a content type of application/grpc and a binary payload it cannot parse. It can inspect the HTTP/2 headers and the frame structure, but it has no visibility into the protobuf-encoded message content. This is not a misconfiguration. It is a protocol-level limitation.
The Five Attack Modes
We built grpc_stress.go to systematically test each dimension of the gRPC attack surface. Each mode exploits a different aspect of the protocol, and each is invisible to HTTP-layer inspection.
Mode 1: Stream Flooding
Exploit: HTTP/2 Stream Multiplexing
gRPC's bidirectional streaming means a single TCP connection can carry hundreds of concurrent streams. Each stream is an independent gRPC call that the server must allocate resources to handle: goroutines, memory buffers, processing threads.
In stream flood mode, the tool opens a single HTTP/2 connection and rapidly creates streams, sending requests on each one. The server sees hundreds of concurrent operations arrive on what looks like a single connection. From the WAF's perspective, this is one TCP connection with multiplexed HTTP/2 frames. No volume anomaly is visible at the connection level.
A common value for HTTP/2's SETTINGS_MAX_CONCURRENT_STREAMS is 100 streams per connection (RFC 9113 recommends servers allow no fewer). But the attacker can open multiple connections, each carrying 100 streams. Ten connections produce 1,000 concurrent server-side operations, all arriving through what appears to be minimal network activity.
Mode 2: Large Message Attack
Exploit: gRPC Default Max Message Size (4 MB)
By default, gRPC allows messages up to 4 MB. This is a generous limit, and many services never change it. The large message attack sends protobuf-encoded messages near the maximum size on every request.
Each message must be fully received and deserialized by the server before it can determine whether the content is valid. The protobuf deserialization step itself consumes CPU and memory. A message containing deeply nested structures or large repeated fields forces the server to allocate proportional memory during parsing.
At 4 MB per message, even a modest request rate of 100 requests per second pushes 400 MB/s of data that the server must buffer, deserialize, and process. This is not a bandwidth attack against the network pipe. It is a resource exhaustion attack against the application layer.
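The arithmetic checks out, using the figures from the paragraph above:

```go
package main

import "fmt"

func main() {
	const msgBytes = 4 << 20 // 4 MB, near gRPC's default max message size
	const reqPerSec = 100    // a modest request rate
	mbPerSec := msgBytes * reqPerSec / (1 << 20)
	fmt.Println(mbPerSec, "MB/s") // 400 MB/s
}
```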
Mode 3: Deadline Abuse
Exploit: gRPC Deadline Propagation
gRPC has a built-in deadline mechanism. Clients set a deadline (timeout) for each RPC call, and the server is expected to abort processing when the deadline expires. This is a useful feature for normal operations. It becomes an attack vector when deliberately misused.
In deadline abuse mode, the tool sends requests with extremely short deadlines, on the order of 1-10 milliseconds. The server receives the request, begins processing (allocating resources, starting database queries, calling downstream services), and then the deadline expires. The server must now clean up the partially-completed work. But the resources were still consumed during the processing window.
If the server calls downstream gRPC services with the propagated deadline, those services also begin and abort their processing. A single deadline-abused request can trigger cascading partial work across multiple microservices in the chain.
Mode 4: Metadata Flooding
Exploit: gRPC Metadata (HTTP/2 Headers)
gRPC metadata is transmitted as HTTP/2 headers. Unlike HTTP/1.1 headers, HTTP/2 uses HPACK compression for header encoding, which means the server must decompress and process each header entry. gRPC services frequently read and process custom metadata for authentication, tracing, routing, and context propagation.
In metadata flood mode, the tool attaches a large number of metadata entries with large values to each request. The server must decompress, parse, and store these entries for the duration of the request. If the application code iterates over metadata (common in interceptors/middleware), the processing cost scales linearly with the metadata size.
HTTP/2 does define a SETTINGS_MAX_HEADER_LIST_SIZE parameter, but the spec leaves its initial value unlimited; implementations commonly advertise limits around 16 KB, and many servers never configure an explicit limit. Sending 16 KB of metadata per request on a high-frequency stream can overwhelm the server's header-processing pipeline.
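To make the scale concrete, this standard-library sketch builds filler metadata entries until a 16 KB header budget is full. The key names and 90-byte values are illustrative, not taken from the tool's actual output:

```go
package main

import (
	"fmt"
	"strings"
)

// buildFloodMetadata generates filler metadata entries until their
// combined key+value size reaches sizeLimit bytes, mimicking how a
// metadata-flood request fills its header budget.
func buildFloodMetadata(sizeLimit int) (map[string]string, int) {
	md := make(map[string]string)
	total := 0
	for i := 0; total < sizeLimit; i++ {
		k := fmt.Sprintf("x-flood-%d", i)
		v := strings.Repeat("a", 90)
		md[k] = v
		total += len(k) + len(v)
	}
	return md, total
}

func main() {
	md, total := buildFloodMetadata(16 * 1024) // fill a 16 KB budget
	fmt.Println(len(md), "entries,", total, "bytes")
}
```

Roughly 160 such entries fit in the budget, and every one of them must be HPACK-decompressed, parsed, and held for the life of the request.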
Mode 5: Connection Flooding
Exploit: HTTP/2 Connection State
Every gRPC connection is a persistent HTTP/2 connection that carries server-side state: TLS session data, HPACK dynamic table, flow control windows, and stream tracking structures. Unlike HTTP/1.1 connections, which are relatively lightweight, HTTP/2 connections are stateful and resource-intensive to maintain.
In connection flood mode, the tool rapidly establishes new HTTP/2 connections, completes the TLS handshake and HTTP/2 SETTINGS exchange, and then either holds each connection idle or sends minimal traffic on it. The server must maintain full connection state for every one of them.
The TLS handshake alone is expensive. Each new connection requires an ECDHE key exchange, certificate verification, and session establishment. A server handling 10,000 concurrent idle HTTP/2 connections consumes significant memory for connection state alone, even before any gRPC calls are made.
Why WAFs Cannot Help
To understand why these five attack modes bypass WAFs, consider what a WAF actually inspects and what it does not.
| Layer | WAF Visibility | gRPC Attack Surface |
|---|---|---|
| TCP/IP (L3-L4) | Full visibility | Connection flooding targets this layer |
| TLS | Terminates or passes through | TLS handshake cost exploited by connection flooding |
| HTTP/2 frames | Frame-level inspection | Stream multiplexing, SETTINGS abuse |
| HTTP/2 headers | Header inspection (HPACK-decoded) | Metadata flooding targets this layer |
| gRPC framing | Limited or none | Message size, deadline values |
| Protobuf payload | None | Large message content, malformed structures |
| gRPC semantics | None | Streaming patterns, deadline propagation, service method abuse |
The critical gap is in the bottom three rows. WAFs operating at the HTTP layer can see the HTTP/2 frames and headers, but they cannot parse the protobuf payload, understand the gRPC framing, or interpret gRPC-specific semantics like deadlines and streaming behavior.
The Protobuf Problem
To inspect a protobuf payload, a WAF would need the .proto schema definition for the service. Without it, protobuf is an opaque binary format. Even with the schema, the WAF would need to deserialize every message to inspect its contents, adding significant latency. No major WAF vendor currently offers protobuf-aware inspection as a standard feature.
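To see how little structure the wire format exposes, here is the canonical example from the protobuf encoding documentation, in standard-library Go: a message with field 1 set to 150 encodes to just three bytes, a tag byte (field number plus wire type) followed by a varint. Nothing in those bytes names the field or constrains its meaning:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVarintField encodes a single protobuf varint field:
// tag byte = (fieldNumber << 3) | wireType, with wire type 0
// for varints, followed by the varint-encoded value.
func encodeVarintField(fieldNumber int, value uint64) []byte {
	buf := []byte{byte(fieldNumber<<3 | 0)} // wire type 0 = varint
	return binary.AppendUvarint(buf, value)
}

func main() {
	b := encodeVarintField(1, 150)
	fmt.Printf("%x\n", b) // 089601
}
```

Under a different schema the same three bytes could be a user ID, a price in cents, or a length that drives a server-side allocation. Without the .proto file, a WAF cannot tell the difference.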
Vendor-Specific Gaps
We tested how four of the most widely deployed WAF products handle gRPC traffic. The results are consistent.
Cloudflare: no gRPC payload inspection
Cloudflare supports gRPC passthrough (enabled per-domain in the Network tab). When enabled, gRPC traffic is proxied through Cloudflare's network with HTTP/2 support. Cloudflare's WAF rules can inspect HTTP/2 headers and apply rate limiting at the connection level. However, Cloudflare cannot parse protobuf payloads, enforce gRPC message size limits, or detect deadline abuse patterns. Stream multiplexing is handled at the HTTP/2 layer without gRPC-specific awareness.
AWS WAF: HTTP-layer rules only
AWS WAF inspects HTTP request components: URI, headers, query strings, and body content. For gRPC traffic arriving through an Application Load Balancer or API Gateway, AWS WAF can inspect the HTTP/2 headers but treats the protobuf body as opaque binary. Rate-based rules count requests but cannot distinguish between unary calls and streaming messages within a single HTTP/2 connection. AWS does not offer gRPC-specific rule groups.
Imperva: partial HTTP/2 awareness
Imperva's WAF supports HTTP/2 and can apply security policies to HTTP/2 traffic. Their DDoS protection can detect volumetric patterns at the connection and request level. However, protobuf payload inspection is not documented in their public materials. Imperva's bot detection operates at the HTTP layer and does not account for gRPC client behaviors. Metadata flooding may be partially mitigated by header size limits, but gRPC-specific patterns like deadline abuse are not addressed.
Azure Front Door: no gRPC support
Azure Front Door does not natively support gRPC passthrough. gRPC traffic must be routed through Azure Application Gateway or directly to the backend, bypassing Front Door's WAF entirely. When gRPC does pass through Azure's WAF (via Application Gateway), inspection is limited to HTTP/2 frame-level analysis without protobuf awareness. This makes Azure the most exposed of the four vendors for gRPC workloads.
Who Is Exposed?
gRPC adoption has accelerated rapidly over the past five years. The protocol is no longer limited to Google's internal services. It is a standard component of modern infrastructure.
Organizations running gRPC in production typically fall into these categories:
- Kubernetes-native companies: gRPC is the default inter-service communication protocol in many Kubernetes deployments. Service meshes like Istio, Linkerd, and Envoy use gRPC for both data plane and control plane communication.
- Mobile app backends: gRPC's efficient binary serialization and bidirectional streaming make it popular for mobile APIs where bandwidth and battery life matter. If your mobile app communicates with a backend, there is a meaningful chance it uses gRPC.
- Fintech and trading platforms: Low-latency requirements drive gRPC adoption in financial services. Order execution, market data streaming, and inter-service calls in trading systems frequently use gRPC.
- IoT and telemetry: Devices streaming sensor data to cloud backends increasingly use gRPC for its efficient serialization and streaming capabilities.
- Public APIs: Google Cloud APIs, Buf Connect, and a growing number of third-party services offer gRPC endpoints alongside REST. Some offer gRPC exclusively.
The pattern is clear. Any organization that has adopted microservices in the past five years likely has gRPC in its stack. Many of these organizations protect their HTTP/REST endpoints with WAFs while leaving gRPC services either unprotected or protected by WAFs that cannot inspect gRPC traffic meaningfully.
The Internal Service Problem
Many gRPC services are internal, communicating between microservices within a cluster. But "internal" does not mean "unreachable." Misconfigured Kubernetes ingress controllers, overly permissive network policies, and service mesh misconfigurations can expose internal gRPC endpoints. Once an attacker reaches an internal gRPC service, all five attack modes apply with no WAF in the path at all.
What You Can Do
Since WAFs cannot solve this problem at the protocol level, defense must come from the gRPC layer itself. Here are the concrete steps that address each attack mode.
1. Enforce Message Size Limits
The default 4 MB max message size is almost certainly too large for your service. Set explicit limits based on your actual payload sizes.
```go
// Go (grpc-go)
server := grpc.NewServer(
	grpc.MaxRecvMsgSize(1*1024*1024), // 1 MB max receive
	grpc.MaxSendMsgSize(1*1024*1024), // 1 MB max send
)
```

```java
// Java (grpc-java)
ServerBuilder.forPort(port)
    .maxInboundMessageSize(1 * 1024 * 1024)
    .build();
```
If your largest legitimate message is 50 KB, set the limit to 100 KB. There is no reason to accept 4 MB messages if your service never produces or consumes them.
2. Set and Enforce Deadlines
Reject requests with unreasonably short deadlines before processing begins. Implement server-side deadline validation in an interceptor.
```go
// Go interceptor example
import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// deadlineInterceptor rejects unary RPCs whose remaining deadline is
// too short to do useful work, before any processing starts.
func deadlineInterceptor(
	ctx context.Context,
	req interface{},
	info *grpc.UnaryServerInfo,
	handler grpc.UnaryHandler,
) (interface{}, error) {
	if deadline, ok := ctx.Deadline(); ok {
		if remaining := time.Until(deadline); remaining < 50*time.Millisecond {
			return nil, status.Errorf(
				codes.InvalidArgument,
				"deadline too short: %v", remaining,
			)
		}
	}
	return handler(ctx, req)
}
```
This stops deadline abuse at the door. If a request arrives with a 5ms deadline, the server rejects it immediately without starting any processing.
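The interceptor only takes effect once it is registered on the server. A configuration fragment (grpc-go, assuming the deadlineInterceptor above is in scope):

```go
server := grpc.NewServer(
	grpc.ChainUnaryInterceptor(deadlineInterceptor),
)
```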
3. Limit Concurrent Streams
Control the HTTP/2 MAX_CONCURRENT_STREAMS setting to prevent stream flooding.
```go
// Go
server := grpc.NewServer(
	grpc.MaxConcurrentStreams(50), // Limit concurrent streams per connection
)
```
The default of 100 concurrent streams per connection is generous. For most services, 20-50 concurrent streams per connection is sufficient. Reducing this limit forces attackers to open more connections to achieve the same level of concurrency, making the attack more visible to network-level monitoring.
4. Implement gRPC-Aware Rate Limiting
Rate limit at the gRPC method level, not just the connection level. Different methods have different cost profiles. A ListOrders call is more expensive than a GetStatus call and should have a lower rate limit.
```go
// Per-method rate limiting in an interceptor, keyed by full method
// name (rate.Limit is from golang.org/x/time/rate)
var methodLimits = map[string]rate.Limit{
	"/api.OrderService/ListOrders": 10,  // 10 req/s
	"/api.OrderService/GetOrder":   100, // 100 req/s
	"/api.OrderService/GetStatus":  200, // 200 req/s
}
```
This is more effective than connection-level rate limiting because it accounts for the actual cost of each operation. A stream flood sending cheap requests is very different from a stream flood sending expensive queries.
5. Control Connection Counts
Limit the number of concurrent connections from a single source and implement connection-level keepalive enforcement.
```go
// Go - keepalive enforcement
server := grpc.NewServer(
	grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
		MinTime:             10 * time.Second, // Minimum allowed client ping interval
		PermitWithoutStream: false,            // Reject keepalive pings on idle connections
	}),
	grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionIdle:     5 * time.Minute,  // Close idle connections
		MaxConnectionAge:      30 * time.Minute, // Force reconnection periodically
		MaxConnectionAgeGrace: 5 * time.Second,  // Grace period for in-flight RPCs
	}),
)
```
Forcing periodic reconnection limits the damage from connection flooding and prevents individual connections from accumulating excessive state.
6. Limit Metadata Size
Set explicit limits on the size and count of metadata entries.
```go
// Go
server := grpc.NewServer(
	grpc.MaxHeaderListSize(8192), // 8 KB max header/metadata size
)
```
The default of 16 KB is more than most services need. Reducing this limit directly mitigates metadata flooding attacks.
Service Mesh as a Defense Layer
If you run Envoy, Istio, or Linkerd as a service mesh sidecar, you have an additional enforcement point. These proxies understand gRPC and can enforce rate limits, connection limits, and circuit breakers at the gRPC level. Envoy in particular supports per-method rate limiting, max concurrent streams, and connection-level policies. Using your service mesh as a gRPC-aware security layer is often more effective than trying to make your WAF understand gRPC.
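As one example, a sketch of the relevant knobs on Envoy's HTTP connection manager. Field names follow Envoy's v3 API; exact placement depends on your listener configuration, and the 50-stream and 8 KB values are illustrative:

```yaml
# Envoy HttpConnectionManager fragment: cap streams per connection
# and bound request header size at the sidecar.
http2_protocol_options:
  max_concurrent_streams: 50
max_request_headers_kb: 8
```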
The Complete Defense Matrix
Here is every attack mode mapped to the specific defense that mitigates it.
| Attack Mode | Primary Defense | WAF Effective? | Difficulty to Implement |
|---|---|---|---|
| Stream Flooding | MaxConcurrentStreams + per-method rate limiting | No | Low |
| Large Message | MaxRecvMsgSize per service | No | Low |
| Deadline Abuse | Minimum deadline interceptor | No | Medium |
| Metadata Flooding | MaxHeaderListSize + metadata validation | Partial | Low |
| Connection Flooding | Keepalive enforcement + connection limits | Partial | Low |
Notice the pattern. For three of five attack modes, WAFs provide no protection at all. For the remaining two, WAFs can partially help because the attack surface is at the HTTP/2 layer (headers and connections), where WAFs do have some visibility. But even for those two, gRPC-level defenses are more precise and effective.
The Bigger Picture
gRPC's DDoS blind spot is a symptom of a broader problem. Security tooling consistently lags protocol adoption. When the industry moved from HTTP/1.1 to HTTP/2, WAFs took years to add full HTTP/2 support. When REST APIs became the primary attack surface, WAFs were still optimized for HTML form submissions.
Now the industry is moving from REST to gRPC, from text-based to binary protocols, from request-response to streaming communication patterns. Security tooling has not caught up. The gap between what WAFs can inspect and what modern services actually speak is widening, not narrowing.
This is not a criticism of WAF vendors. Parsing protobuf at line rate without the schema definition is a genuinely hard problem. But it means that organizations adopting gRPC must take ownership of application-layer security themselves, rather than relying on infrastructure-level tools that were designed for a different protocol era.
The most dangerous security gap is the one between the protocol your application speaks and the protocol your security tools understand.
Is Your gRPC Infrastructure Exposed?
DDactic's free infrastructure scan identifies exposed gRPC endpoints, missing rate limits, default message size configurations, and WAF blind spots. We test what your WAF cannot see.
Get a Free Scan