Beyond Subdomain Discovery: The 7-Stage Attack Surface Pipeline

Finding every subdomain is the easy part. Knowing what to do with them is where most scanners stop and most security teams get stuck.

In a previous post, we broke down how DDactic queries 13 intelligence sources to discover subdomains and validates them with AI. That process, stage 1 of our pipeline, typically produces 40 to 200 verified assets per organization.

But a list of subdomains is not an attack surface assessment. It is a phone book. You still need to know what is running on each host, how it is protected, whether credentials have leaked, what vulnerabilities exist, and which assets an attacker would target first.

This post covers stages 2 through 7: the six stages that transform a list of domains into a prioritized, actionable test plan.

Pipeline Stages

30+

Ports Scanned

1000+

Technology Signatures

10K+

Vulnerability Templates

The Full Pipeline at a Glance

Each stage consumes the output of the previous one. Nothing runs in isolation, and nothing is wasted. Here is the complete flow:

Asset Discovery

Subdomain enumeration across 25+ sources, AI validation, SLD expansion. Covered in the previous post.

Port Scanning

Tier-aware socket scanning with CDN filtering. Reveals services behind every discovered host.

L7 Reconnaissance

HTTP fingerprinting, technology detection (1000+ signatures), WAF identification, protocol probing.

Breach Database Integration

Credential exposure lookups via HIBP and multiple breach intelligence feeds. Email harvesting.

Active Reconnaissance

Sensitive path probing, deep crawling, vulnerability template matching, cloud storage discovery.

AI-Powered Analysis

Automated asset classification, priority scoring, protection gap identification.

Test Plan Generation

Maps all findings to specific attack vectors, techniques, and a prioritized testing schedule.

The entire pipeline runs in 5 to 15 minutes for a single company, depending on the size of the attack surface. Results stream to the dashboard in real time as each stage completes.

2 Port Scanning

Stage 1 gives us a list of hostnames. Stage 2 answers a different question: what services are actually running on these hosts?

A subdomain that only serves HTTPS on port 443 presents a very different risk profile than one running SSH on port 22, a database admin panel on port 8080, and an unprotected API on port 3000. You cannot assess the attack surface without knowing what ports are open.

Tier-Aware Scanning

Not every scan needs the same depth. We use a tiered approach based on the customer's plan and the nature of the target:

Tier	Ports Scanned	What It Catches
Basic	5 ports (80, 443, 8080, 8443, 22)	Web services and SSH
Standard	13 ports (+ 21, 25, 53, 110, 3306, 5432, 3389, 6379)	Databases, mail, FTP, RDP, Redis
Full	30+ ports (+ SIP, DNS, custom app ports, high-range services)	VoIP, game servers, IoT, custom services

CDN Filtering

Here is a subtlety that most port scanners miss entirely. When a hostname resolves to a CDN IP address (Cloudflare, Akamai, Fastly), scanning that IP's ports tells you about the CDN, not about the target. Port 80 and 443 are open because the CDN is listening, not because the origin server has those ports exposed.

Our scanner detects CDN-proxied hosts and filters them from port scan results. This eliminates false positives and avoids wasting time scanning infrastructure that belongs to a third party. The CDN layer gets its own analysis in stage 3.

Why This Matters for DDoS

Open ports that bypass CDN protection are direct paths to the origin server. An exposed database port or an API running on a non-standard port often has no DDoS mitigation at all. These are the assets that go down first.

What Open Ports Reveal

Port scan results feed directly into the next stages. Finding port 3306 (MySQL) or 5432 (PostgreSQL) open on a public IP means the database is internet-facing, likely without WAF protection. Port 6379 (Redis) with no authentication is a critical finding. Port 22 (SSH) tells us there is direct server access that could be targeted with brute-force or used as a DDoS vector against the authentication layer.

The port scan does not just enumerate services. It builds the topology map that every subsequent stage depends on.

3 L7 Reconnaissance

Knowing that port 443 is open tells you very little. Stage 3 probes the application layer to answer: what software is running, how is it configured, and what protection sits in front of it?

HTTP Fingerprinting

For every HTTP-serving asset, the pipeline collects:

Response headers: Server type, framework identifiers, caching behavior, security headers (or lack thereof)
TLS certificate details: Issuer, validity, SANs (Subject Alternative Names that often reveal additional hostnames)
Response characteristics: Status codes, redirect chains, response sizes, timing
Login detection: Whether the page contains authentication forms, OAuth flows, or API key input fields

Technology Detection

The scanner matches responses against over 1,000 technology signatures to identify:

Web frameworks (React, Angular, Next.js, Django, Rails, Spring)
CMS platforms (WordPress, Drupal, Joomla, Contentful)
Cloud hosting platforms (AWS, Azure, GCP, Vercel, Netlify)
Load balancers and reverse proxies (Nginx, HAProxy, Envoy, Traefik)
Analytics, tracking, and third-party scripts

Technology identification is not academic. A WordPress site with known plugin vulnerabilities is a different risk than a static site on Vercel. A Spring Boot API behind Nginx without rate limiting is a different target than one behind a managed API gateway.

WAF and CDN Identification

This is where L7 recon becomes directly relevant to DDoS resilience. For every asset, we determine:

Which WAF vendor (if any) fronts the asset
Which CDN provider serves the content
Whether the WAF is in detection-only mode or actively blocking
Whether the origin IP is discoverable despite CDN protection (see our CDN bypass post)

The Configuration Gap

Having a WAF is not the same as having a properly configured WAF. We frequently find organizations with enterprise-grade WAF subscriptions where rate limiting is disabled, bot management is in log-only mode, or DDoS protection thresholds are set so high they never trigger. Stage 3 detects these configuration gaps. For a deeper look at this problem, read our WAF configuration analysis.

Multi-Protocol Probing

L7 reconnaissance is not limited to HTTP. The pipeline also probes:

DNS: Recursive resolution behavior, zone transfer attempts, DNSSEC validation
SMTP: Mail server configuration, open relay detection, SPF/DKIM/DMARC records
SIP: VoIP infrastructure exposure (increasingly common in enterprise environments)
Direct-to-Router (D2R): Probing for network devices accessible from the public internet

Each protocol has its own DDoS attack vectors. A DNS server vulnerable to amplification, a mail server without rate limiting, or an exposed SIP gateway can each be leveraged for service disruption. The pipeline identifies these per-protocol risks rather than treating every asset as "just a web server."

4 Application Recon

Stages 1-3 scan the perimeter from the outside. Stage 4 goes inside the application to surface attack vectors that passive scanning cannot see.

DDactic's app-labs system intercepts traffic from web, desktop, and mobile clients to map the full API surface as a real user experiences it -- not just the endpoints visible from an unauthenticated external probe.

What Application Recon Finds

Mobile-only endpoints: API routes called exclusively by the mobile app, not documented, not reachable from a browser -- but reachable by an attacker who has reverse-engineered the app
WebSocket connections that bypass WAF rate limiting because they establish a persistent session rather than per-request flows
Authenticated API flows the public-facing CDN never inspects -- endpoints that only appear after login, where bot detection and rate limiting are often absent
Desktop client gRPC streams not reflected in any public schema, exposing services that have no WAF coverage
Sensitive path discovery: configuration files, backup files, admin interfaces, debug endpoints, and cloud storage buckets reachable without authentication

Vulnerability Template Matching

The pipeline runs over 10,000 vulnerability detection templates against each asset. These templates perform passive detection, identifying known vulnerabilities by their response signatures without sending exploit payloads. This includes CVEs in detected software versions, misconfigured security headers, information disclosure through error pages, default credentials on management interfaces, and exposed API endpoints with missing authentication.

Controlled and Scoped

Application Recon never sends exploit payloads, never attempts to modify data, and never exceeds the scope defined by the domain ownership verification in the customer's account. It maps attack surface through traffic observation and response analysis, not through exploitation.

5 Breach DB & OSINT

This stage often surprises people. Why does a DDoS resilience platform check breach databases?

Because credential exposure is attack surface. And it is the part of the attack surface that firewalls, CDNs, and WAFs cannot see.

What We Check

The pipeline queries multiple breach intelligence sources: Have I Been Pwned (HIBP), DeHashed, LeakCheck, and LeakIX. For each target organization, we:

Harvest email addresses associated with the organization's domains through passive sources (search engines, public directories, certificate transparency logs)
Check each address against breach databases to determine whether credentials have been exposed
Correlate breached accounts with login endpoints and admin panels discovered in stages 1-4 to identify which services those credentials could access

Why Breach Data Matters for DDoS

Consider this scenario: an organization has invested heavily in Cloudflare Enterprise for their public website, AWS Shield Advanced for their API, and a managed scrubbing service for their network layer. Their perimeter looks solid.

But 340 employee email addresses appeared in a data breach two years ago. Some of those employees still use the same passwords. Now an attacker can:

Authenticate to the VPN portal and access internal services that have zero DDoS protection
Log in to admin panels discovered in stage 1 that sit behind the CDN but have no rate limiting for authenticated users
Access API endpoints with valid tokens, bypassing bot detection and rate limiting that only applies to unauthenticated traffic
Reach internal dashboards (Grafana, Jenkins, Kibana) that are internet-facing but rely on authentication as their only defense

Shadow IT Discovery

Breach data also reveals services that the security team may not know exist. When employee credentials appear in breach dumps associated with third-party SaaS tools, development platforms, or personal projects hosted on company domains, it often surfaces shadow IT that was never included in the organization's asset inventory.

The Correlation Step

Raw breach counts are not useful on their own. The value comes from correlating breach data with the assets discovered in stages 1-4. If we found an exposed VPN portal in stage 1 and 200 breached employee credentials in stage 5, those findings together represent a much higher risk than either one alone.

This correlation happens automatically. By the time the pipeline reaches stage 6 (AI analysis), it has both the infrastructure topology and the credential exposure data needed to assess combined risk.

6 AI-Powered Analysis

By stage 6, the pipeline has accumulated a substantial dataset: hundreds of subdomains, port scan results, technology fingerprints, WAF detection data, breach exposure counts, vulnerability findings, and crawl data. A human analyst could spend hours reviewing this. The AI analysis stage processes it in under 30 seconds.

Asset Classification

The first AI task is classification. Every discovered asset gets labeled with its role in the organization's infrastructure:

Classification	Examples	DDoS Relevance
Customer-facing portal	my.company.com, app.company.com	High, direct revenue impact
API endpoint	api.company.com, gateway.company.com	Critical, often bypasses CDN cache
Internal tool	jenkins.company.com, grafana.company.com	Medium, operational disruption
Marketing site	www.company.com, blog.company.com	Lower, usually CDN-cached
Parked/defensive registration	company-typo.com, companyx.com	None, filtered from results

This classification step is what separates a useful assessment from a noisy one. Without it, a security team receives a flat list of 150 domains and has to manually determine which ones matter. With it, they immediately see that 8 are customer portals, 12 are API endpoints, 6 are internal tools exposed to the internet, and 40 are parked domains that can be safely ignored.

Filtering Parked Domains and Defensive Registrations

Organizations often register dozens of domain variants (typosquatting protection, brand defense, future projects) that resolve to parking pages or redirect to the main site. Including these in a security assessment adds noise without adding value.

The AI identifies parked domains by combining signals: hosting on known parking services, identical redirect destinations, absence of unique content, WHOIS registration patterns consistent with defensive registration. These domains are flagged and filtered so the assessment focuses on assets that actually carry risk.

Priority Scoring

Each asset receives a priority score based on multiple factors:

Business impact: Customer portals and revenue-generating APIs rank higher than internal tools
Protection level: Assets without WAF or CDN protection rank higher than those behind enterprise-grade defenses
Exposure indicators: Open non-standard ports, missing security headers, known vulnerabilities
Credential risk: Assets where breached credentials could provide authenticated access
Technology risk: Outdated software versions, known-vulnerable frameworks

The output is a ranked list. Not "here are 150 things to worry about," but "here are the 12 assets that should keep you up at night, in order."

Protection Gap Identification

The AI cross-references what it knows about the organization's defensive posture with what a complete defense should look like. Common gaps it identifies:

API endpoints served directly from origin servers while marketing sites are CDN-protected
Rate limiting configured for the main domain but missing on subdomains
WAF rules in detection-only mode (logging but not blocking)
DDoS protection thresholds set above realistic attack volumes
Authentication endpoints without bot detection or challenge mechanisms

AI Cost

The entire AI analysis stage, including asset classification, priority scoring, and gap identification, costs under $0.02 per scan. It uses fast inference models optimized for structured data analysis, not large language models generating creative text. Speed matters as much as accuracy: the analysis adds less than 30 seconds to the pipeline runtime.

7 Test Plan Generation

The final stage transforms all findings into a concrete test plan. This is the output that security teams actually act on.

From Findings to Attack Vectors

The test plan generator maps each finding to specific attack techniques. This is not a generic list of "things that could go wrong." It is a tailored mapping based on what the pipeline actually observed:

Finding: API endpoint at api.company.com:443
         - No CDN protection detected
         - Rate limiting: none observed
         - Technology: Node.js/Express
         - Authentication: API key (header)

Mapped Test Vectors:
  1. HTTP flood (direct to origin, no CDN absorption)
  2. Slowloris (Node.js single-threaded event loop)
  3. API abuse (expensive query patterns)
  4. Authentication endpoint stress
     (brute-force rate with no limiting)

Priority: CRITICAL
Reason: Revenue-generating API with no L7 protection

Attack Vector Selection

DDactic maintains a matrix of over 200 distinct attack vectors across multiple protocol layers and architecture types. The test plan generator does not select from this matrix randomly. It uses the data from all previous stages to determine which vectors are relevant:

Protocol layer: HTTP/1.1, HTTP/2, HTTP/3, DNS, SIP, SMTP, raw TCP/UDP
Architecture type: Direct-to-origin, CDN-proxied, load-balanced, API gateway, serverless
Defense posture: Unprotected, WAF-only, CDN+WAF, full scrubbing
Technology stack: Framework-specific vectors (e.g., Spring4Shell for Spring Boot, ReDoS for regex-heavy parsers)

Prioritized Test Schedule

The test plan is not just a list of vectors. It is a prioritized schedule that tells the security team (or DDactic's automated testing infrastructure) what to test first and why:

Critical assets without protection - API endpoints and customer portals serving traffic directly from origin servers
Assets with misconfigured protection - WAF in detection-only mode, rate limits set too high, bot detection disabled
Assets with credential exposure - Services where breached credentials could bypass perimeter defenses
Assets with known vulnerabilities - Software versions with published DDoS-relevant CVEs
Properly protected assets - Testing that defenses actually work as configured under realistic load

Actionable Output

The test plan includes specific remediation recommendations for each finding. These are not generic advice like "enable rate limiting." They are vendor-specific CLI commands and configuration changes based on the exact technology stack detected in stage 3. If you are running Cloudflare, you get Cloudflare commands. If you are behind AWS WAF, you get AWS WAF rules.

Why Sequential Stages Matter

You might wonder: why not run everything in parallel and save time?

Because each stage depends on the output of the previous ones, and that dependency is what makes the results useful.

Port scanning (stage 2) needs stage 1's output to know which hosts to scan and which are CDN-proxied (and should be filtered)
L7 recon (stage 3) needs stage 2's output to know which ports to probe for HTTP services, and which services run non-HTTP protocols
Active recon (stage 5) needs stage 3's output to select the right vulnerability templates based on detected technology stacks
AI analysis (stage 6) needs everything to classify assets accurately and identify protection gaps
Test plan generation (stage 7) needs everything to map findings to the correct attack vectors for each specific architecture

A flat, parallel scan that checks subdomains, ports, and vulnerabilities independently produces a spreadsheet. A sequential pipeline that builds context at each stage produces an assessment.

What This Looks Like in Practice

Here is a simplified example of how the pipeline's stages compound to produce findings that no single stage could generate alone:

Stage 1: Discovers staging.company.com
Stage 2: Finds ports 443, 3000, 5432 open
Stage 3: Port 443 = React app, Port 3000 = Express API,
         Port 5432 = PostgreSQL. No WAF detected.
Stage 4: 12 developer credentials breached (company.com
         domain in 2024 breach)
Stage 5: /api-docs publicly accessible on port 3000,
         GraphQL introspection enabled, .env file
         exposed at /.env
Stage 6: AI classifies as "staging environment with
         production database connection" (priority: CRITICAL)
Stage 7: Test plan includes: direct DB connection test,
         API abuse via documented endpoints, credential
         stuffing against developer accounts

Combined finding: Staging environment with production
data, no perimeter defense, full API documentation
public, developer credentials compromised.

No single stage produces this conclusion.
All seven together do.

The Gap Between Discovery and Assessment

Most attack surface management tools stop at discovery. They give you a list of assets, maybe with some port information and basic fingerprinting. That is valuable, but it is the beginning of the work, not the end.

The gap between "here are your assets" and "here is what an attacker would do with them" is where organizations are most vulnerable. Security teams receive asset inventories and then have to manually determine risk, prioritize remediation, and design test plans. That manual process takes weeks, and by the time it is complete, the attack surface has changed.

DDactic's 7-stage pipeline closes that gap automatically. From a company name to a prioritized, actionable test plan in under 15 minutes.

See Your Full Attack Surface

Run a free scan and see what all 7 stages discover about your organization. No account required. Results in minutes, not weeks.

Start a Free Scan

Attack Surface Reconnaissance Port Scanning WAF Detection Breach Data AI Analysis DDoS Resilience Vulnerability Assessment Security Pipeline