Beyond Subdomain Discovery: The 7-Stage Attack Surface Pipeline

April 2026 | Stav Barak | 14 min read | Security Engineering

Finding every subdomain is the easy part. Knowing what to do with them is where most scanners stop and most security teams get stuck.

In a previous post, we broke down how DDactic queries 13 intelligence sources to discover subdomains and validates them with AI. That process, stage 1 of our pipeline, typically produces 40 to 200 verified assets per organization.

But a list of subdomains is not an attack surface assessment. It is a phone book. You still need to know what is running on each host, how it is protected, whether credentials have leaked, what vulnerabilities exist, and which assets an attacker would target first.

This post covers stages 2 through 7: the six stages that transform a list of domains into a prioritized, actionable test plan.

By the numbers: 7 pipeline stages · 30+ ports scanned · 1,000+ technology signatures · 10,000+ vulnerability templates.

The Full Pipeline at a Glance

Each stage consumes the output of the previous one. Nothing runs in isolation, and nothing is wasted. Here is the complete flow:

  1. Asset Discovery - Subdomain enumeration across 13 intelligence sources, AI validation, SLD expansion. Covered in the previous post.
  2. Port Scanning - Tier-aware socket scanning with CDN filtering. Reveals the services running behind every discovered host.
  3. L7 Reconnaissance - HTTP fingerprinting, technology detection (1,000+ signatures), WAF identification, protocol probing.
  4. Breach Database Integration - Credential exposure lookups via HIBP and multiple breach intelligence feeds. Email harvesting.
  5. Active Reconnaissance - Sensitive path probing, deep crawling, vulnerability template matching, cloud storage discovery.
  6. AI-Powered Analysis - Automated asset classification, priority scoring, protection gap identification.
  7. Test Plan Generation - Maps all findings to specific attack vectors, techniques, and a prioritized testing schedule.

The entire pipeline runs in 5 to 15 minutes for a single company, depending on the size of the attack surface. Results stream to the dashboard in real time as each stage completes.

Stage 2: Port Scanning

Stage 1 gives us a list of hostnames. Stage 2 answers a different question: what services are actually running on these hosts?

A subdomain that only serves HTTPS on port 443 presents a very different risk profile than one running SSH on port 22, a database admin panel on port 8080, and an unprotected API on port 3000. You cannot assess the attack surface without knowing what ports are open.

Tier-Aware Scanning

Not every scan needs the same depth. We use a tiered approach based on the customer's plan and the nature of the target:

Tier     | Ports Scanned                                                 | What It Catches
Basic    | 5 ports (80, 443, 8080, 8443, 22)                             | Web services and SSH
Standard | 13 ports (+ 21, 25, 53, 110, 3306, 5432, 3389, 6379)          | Databases, mail, FTP, RDP, Redis
Full     | 30+ ports (+ SIP, DNS, custom app ports, high-range services) | VoIP, game servers, IoT, custom services
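The tiered approach can be sketched in a few lines. This is a minimal illustration of tier-aware TCP connect scanning using the port lists from the table above; the timeouts, concurrency, and tier definitions of the actual scanner are not published, so treat this as a conceptual sketch rather than the production implementation.

```python
# Sketch of tier-aware port scanning. Port lists come from the tier
# table in this post; everything else is an illustrative assumption.
import socket

TIERS = {
    "basic": [80, 443, 8080, 8443, 22],
    "standard": [80, 443, 8080, 8443, 22,
                 21, 25, 53, 110, 3306, 5432, 3389, 6379],
}

def scan_ports(host: str, tier: str, timeout: float = 1.0) -> list[int]:
    """Return the subset of the tier's ports that accept a TCP connection."""
    open_ports = []
    for port in TIERS[tier]:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on a successful TCP handshake
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

A production scanner would run these connects concurrently and add banner grabbing, but the tier-selection logic is the same: the plan picks the port list, the port list bounds the scan.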

CDN Filtering

Here is a subtlety that most port scanners miss entirely. When a hostname resolves to a CDN IP address (Cloudflare, Akamai, Fastly), scanning that IP's ports tells you about the CDN, not about the target. Port 80 and 443 are open because the CDN is listening, not because the origin server has those ports exposed.

Our scanner detects CDN-proxied hosts and filters them from port scan results. This eliminates false positives and avoids wasting time scanning infrastructure that belongs to a third party. The CDN layer gets its own analysis in stage 3.
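The filtering idea is simple: resolve each host and drop it from port-scan scope when its IP falls inside a known CDN range. The sketch below uses a small illustrative subset of published Cloudflare and Fastly ranges; a real implementation would load full, current range lists for every provider it recognizes.

```python
# Sketch of CDN filtering: hosts resolving into CDN address space are
# excluded from origin port scanning. The CIDR list is a tiny
# illustrative subset, not a complete provider database.
import ipaddress
import socket

CDN_RANGES = [
    ipaddress.ip_network("104.16.0.0/13"),   # Cloudflare
    ipaddress.ip_network("172.64.0.0/13"),   # Cloudflare
    ipaddress.ip_network("151.101.0.0/16"),  # Fastly
]

def is_cdn_ip(ip: str) -> bool:
    """True when the address belongs to a known CDN provider."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CDN_RANGES)

def filter_cdn_hosts(hosts: list[str]) -> list[str]:
    """Keep only hosts whose resolved IP is NOT fronted by a known CDN."""
    direct = []
    for host in hosts:
        try:
            ip = socket.gethostbyname(host)
        except socket.gaierror:
            continue  # skip hosts that do not resolve
        if not is_cdn_ip(ip):
            direct.append(host)
    return direct
```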

Why This Matters for DDoS

Open ports that bypass CDN protection are direct paths to the origin server. An exposed database port or an API running on a non-standard port often has no DDoS mitigation at all. These are the assets that go down first.

What Open Ports Reveal

Port scan results feed directly into the next stages. Finding port 3306 (MySQL) or 5432 (PostgreSQL) open on a public IP means the database is internet-facing, likely without WAF protection. Port 6379 (Redis) with no authentication is a critical finding. Port 22 (SSH) tells us there is direct server access that could be targeted with brute-force or used as a DDoS vector against the authentication layer.

The port scan does not just enumerate services. It builds the topology map that every subsequent stage depends on.

Stage 3: L7 Reconnaissance

Knowing that port 443 is open tells you very little. Stage 3 probes the application layer to answer: what software is running, how is it configured, and what protection sits in front of it?

HTTP Fingerprinting

For every HTTP-serving asset, the pipeline collects a full fingerprint: response headers, status codes, TLS certificate details, redirect chains, and page content for signature matching.

Technology Detection

The scanner matches responses against over 1,000 technology signatures to identify web servers, application frameworks, CMS platforms, JavaScript libraries, and hosting providers.

Technology identification is not academic. A WordPress site with known plugin vulnerabilities is a different risk than a static site on Vercel. A Spring Boot API behind Nginx without rate limiting is a different target than one behind a managed API gateway.
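Signature matching itself is straightforward. Here is a minimal Wappalyzer-style sketch; the three signatures are hypothetical examples for illustration, not entries from the actual 1,000+ signature set.

```python
# Minimal technology-detection sketch: match response headers against
# regex signatures. Signatures shown are illustrative examples only.
import re

SIGNATURES = {
    "nginx":     ("Server", re.compile(r"nginx(?:/[\d.]+)?")),
    "Express":   ("X-Powered-By", re.compile(r"Express")),
    "WordPress": ("Link", re.compile(r"wp-json")),
}

def detect_technologies(headers: dict[str, str]) -> list[str]:
    """Return the technologies whose signature matches a response header."""
    found = []
    for tech, (header, pattern) in SIGNATURES.items():
        if pattern.search(headers.get(header, "")):
            found.append(tech)
    return found
```

Real detectors also match response bodies, cookies, and script URLs, and extract version numbers from the capture groups, but the core is the same lookup: header in, stack out.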

WAF and CDN Identification

This is where L7 recon becomes directly relevant to DDoS resilience. For every asset, we determine whether a WAF or CDN sits in front of it, which vendor provides it, and whether its protections are actively enforcing or merely observing.

The Configuration Gap

Having a WAF is not the same as having a properly configured WAF. We frequently find organizations with enterprise-grade WAF subscriptions where rate limiting is disabled, bot management is in log-only mode, or DDoS protection thresholds are set so high they never trigger. Stage 3 detects these configuration gaps. For a deeper look at this problem, read our WAF configuration analysis.

Multi-Protocol Probing

L7 reconnaissance is not limited to HTTP. The pipeline also probes DNS, mail (SMTP), SIP, and other exposed services.

Each protocol has its own DDoS attack vectors. A DNS server vulnerable to amplification, a mail server without rate limiting, or an exposed SIP gateway can each be leveraged for service disruption. The pipeline identifies these per-protocol risks rather than treating every asset as "just a web server."
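Conceptually, this per-protocol assessment is a mapping from exposed service to the vector classes that apply to it. The sketch below uses hypothetical vector names to illustrate the idea; the real matrix (over 200 vectors, per stage 7) is far more granular.

```python
# Illustrative mapping from exposed port to DDoS vector classes.
# Vector names are hypothetical examples, not DDactic's actual matrix.
PROTOCOL_VECTORS = {
    53:   ["DNS amplification", "NXDOMAIN flood"],
    25:   ["SMTP connection exhaustion"],
    5060: ["SIP INVITE flood"],
    443:  ["HTTPS flood", "TLS renegotiation abuse"],
}

def vectors_for_ports(open_ports: list[int]) -> dict[int, list[str]]:
    """Return the vector classes relevant to each exposed port."""
    return {p: PROTOCOL_VECTORS[p]
            for p in open_ports if p in PROTOCOL_VECTORS}
```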

Stage 4: Breach Database Integration

This stage often surprises people. Why does a DDoS resilience platform check breach databases?

Because credential exposure is attack surface. And it is the part of the attack surface that firewalls, CDNs, and WAFs cannot see.

What We Check

The pipeline queries multiple breach intelligence sources, including Have I Been Pwned (HIBP) and several commercial breach monitoring feeds. For each target organization, we:

  1. Harvest email addresses associated with the organization's domains through passive sources (search engines, public directories, certificate transparency logs)
  2. Check each address against breach databases to determine whether credentials have been exposed
  3. Correlate breached accounts with discovered assets to identify which services those credentials could access

Why Breach Data Matters for DDoS

Consider this scenario: an organization has invested heavily in Cloudflare Enterprise for their public website, AWS Shield Advanced for their API, and a managed scrubbing service for their network layer. Their perimeter looks solid.

But 340 employee email addresses appeared in a data breach two years ago. Some of those employees still use the same passwords. Now an attacker can walk through the front door: log in to VPN portals, admin panels, or internal tools with valid credentials, bypassing every layer of DDoS protection the organization paid for.

Shadow IT Discovery

Breach data also reveals services that the security team may not know exist. When employee credentials appear in breach dumps associated with third-party SaaS tools, development platforms, or personal projects hosted on company domains, it often surfaces shadow IT that was never included in the organization's asset inventory.

The Correlation Step

Raw breach counts are not useful on their own. The value comes from correlating breach data with the assets discovered in stages 1-3. If we found an exposed VPN portal in stage 1 and 200 breached employee credentials in stage 4, those findings together represent a much higher risk than either one alone.

This correlation happens automatically. By the time the pipeline reaches stage 6 (AI analysis), it has both the infrastructure topology and the credential exposure data needed to assess combined risk.
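The correlation logic can be sketched as a simple combination rule: breached credentials plus a credential-accepting entry point is worth far more to an attacker than either alone. The asset labels and risk tiers below are illustrative assumptions, not the pipeline's actual taxonomy.

```python
# Sketch of the stage-4 correlation step: combine breached-credential
# counts with discovered asset types. Labels and tiers are assumptions.
def correlation_risk(breached_accounts: int, asset_types: set[str]) -> str:
    """Rough combined-risk label for breach data plus infrastructure."""
    credential_entry_points = {"vpn_portal", "admin_panel", "sso_login"}
    exposed = asset_types & credential_entry_points
    if breached_accounts > 0 and exposed:
        return "critical"   # valid credentials AND a door to use them on
    if breached_accounts > 0 or exposed:
        return "elevated"   # one half of the attack path exists
    return "baseline"
```

This is the shape of the VPN example above: 200 breached credentials alone are "elevated," an exposed VPN portal alone is "elevated," but together they become "critical."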

Stage 5: Active Reconnaissance

Stages 2-4 are comparatively light-touch: they probe, fingerprint, and query external databases. Stage 5 shifts to active testing, carefully probing each asset for exploitable conditions.

Sensitive Path Discovery

The scanner probes for paths that should not be publicly accessible: admin interfaces, exposed configuration files such as /.env, version-control metadata, API documentation, debug endpoints, and backup artifacts.

Like port scanning, path discovery is tier-aware. Basic scans check a focused list of high-signal paths. Full scans probe hundreds of paths informed by the technology stack detected in stage 3: if we identified WordPress, we check WordPress-specific paths; if we found a Spring Boot app, we probe Spring Actuator endpoints.
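The stack-aware selection described above can be sketched as a lookup: a base wordlist of high-signal paths, extended per detected technology when the tier allows deeper probing. The path lists here are small illustrative examples, not the scanner's full wordlists.

```python
# Sketch of tier- and stack-aware path selection. Path lists are
# illustrative high-signal examples only.
BASE_PATHS = ["/.env", "/.git/config", "/backup.zip", "/api-docs"]

STACK_PATHS = {
    "wordpress":   ["/wp-login.php", "/wp-json/wp/v2/users"],
    "spring-boot": ["/actuator/health", "/actuator/env"],
}

def paths_to_probe(detected_stack: list[str], tier: str = "basic") -> list[str]:
    """Combine base paths with paths specific to the detected stack."""
    paths = list(BASE_PATHS)
    if tier != "basic":  # deeper tiers add stack-specific probing
        for tech in detected_stack:
            paths.extend(STACK_PATHS.get(tech, []))
    return paths
```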

Vulnerability Template Matching

The pipeline runs over 10,000 vulnerability detection templates against each asset. These templates perform passive detection, identifying known vulnerabilities by their response signatures without sending exploit payloads: version-identified CVEs, exposed admin panels, default installation pages, and common misconfigurations.

Cloud Storage Discovery

Many organizations have misconfigured cloud storage buckets (S3, Azure Blob, GCP Cloud Storage) that are publicly accessible. The active recon stage checks for storage resources associated with the target's domain names, brand names, and known project identifiers. A publicly readable backup bucket is both a data breach risk and an indicator of broader infrastructure hygiene problems.

Deep Crawl and Measurement

For assets that serve web content, the pipeline performs a constrained crawl to map the application structure, discover additional endpoints, and measure response characteristics under normal load. These baseline measurements become the reference point for stage 7's test plan, where we need to know what "normal" looks like before we can define what "under stress" means.

Controlled and Scoped

Active reconnaissance never sends exploit payloads, never attempts to modify data, and never exceeds the scope defined by the domain ownership verification in the customer's account. It probes for the existence of vulnerabilities through response analysis, not through exploitation.

Stage 6: AI-Powered Analysis

By stage 6, the pipeline has accumulated a substantial dataset: hundreds of subdomains, port scan results, technology fingerprints, WAF detection data, breach exposure counts, vulnerability findings, and crawl data. A human analyst could spend hours reviewing this. The AI analysis stage processes it in under 30 seconds.

Asset Classification

The first AI task is classification. Every discovered asset gets labeled with its role in the organization's infrastructure:

Classification                | Examples                                 | DDoS Relevance
Customer-facing portal        | my.company.com, app.company.com          | High: direct revenue impact
API endpoint                  | api.company.com, gateway.company.com     | Critical: often bypasses CDN cache
Internal tool                 | jenkins.company.com, grafana.company.com | Medium: operational disruption
Marketing site                | www.company.com, blog.company.com        | Lower: usually CDN-cached
Parked/defensive registration | company-typo.com, companyx.com           | None: filtered from results

This classification step is what separates a useful assessment from a noisy one. Without it, a security team receives a flat list of 150 domains and has to manually determine which ones matter. With it, they immediately see that 8 are customer portals, 12 are API endpoints, 6 are internal tools exposed to the internet, and 40 are parked domains that can be safely ignored.

Filtering Parked Domains and Defensive Registrations

Organizations often register dozens of domain variants (typosquatting protection, brand defense, future projects) that resolve to parking pages or redirect to the main site. Including these in a security assessment adds noise without adding value.

The AI identifies parked domains by combining signals: hosting on known parking services, identical redirect destinations, absence of unique content, WHOIS registration patterns consistent with defensive registration. These domains are flagged and filtered so the assessment focuses on assets that actually carry risk.
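Combining signals into a parked/not-parked decision amounts to weighted evidence accumulation. The signal names, weights, and threshold in this sketch are hypothetical; the point is that no single signal decides, the combination does.

```python
# Sketch of parked-domain detection by combining signals, as described
# above. Signal names, weights, and the threshold are assumptions.
PARKED_SIGNALS = {
    "parking_nameserver": 3,  # hosted on a known parking service
    "redirects_to_main": 2,   # identical redirect destination
    "no_unique_content": 2,   # no content of its own
    "defensive_whois": 1,     # registration pattern suggests brand defense
}

def is_parked(signals: set[str], threshold: int = 4) -> bool:
    """Flag a domain as parked when its combined signal weight clears the bar."""
    return sum(PARKED_SIGNALS.get(s, 0) for s in signals) >= threshold
```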

Priority Scoring

Each asset receives a priority score based on multiple factors: exposure (open ports, missing WAF or CDN), business role from the classification step, credential exposure from stage 4, and vulnerability findings from stage 5.

The output is a ranked list. Not "here are 150 things to worry about," but "here are the 12 assets that should keep you up at night, in order."
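A minimal version of such ranking is a weighted sum over risk factors followed by a sort. The factor names and weights below are assumptions for illustration; the post does not publish the real scoring model.

```python
# Weighted-sum sketch of priority scoring and ranking. Factor names
# and weights are illustrative assumptions.
WEIGHTS = {
    "no_cdn_or_waf": 30,
    "revenue_facing": 25,
    "breached_credentials": 20,
    "known_vulnerabilities": 15,
    "sensitive_paths_exposed": 10,
}

def priority_score(factors: set[str]) -> int:
    """Score 0-100; higher means test (and fix) this asset first."""
    return sum(WEIGHTS[f] for f in factors if f in WEIGHTS)

def rank_assets(assets: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Return (asset, score) pairs sorted highest-priority first."""
    scored = [(name, priority_score(f)) for name, f in assets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```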

Protection Gap Identification

The AI cross-references what it knows about the organization's defensive posture with what a complete defense should look like. Common gaps it identifies include origin servers reachable without passing through the CDN, rate limiting disabled or set too high, bot management in log-only mode, and exposed non-web services with no mitigation at all.

AI Cost

The entire AI analysis stage, including asset classification, priority scoring, and gap identification, costs under $0.02 per scan. It uses fast inference models optimized for structured data analysis, not large language models generating creative text. Speed matters as much as accuracy: the analysis adds less than 30 seconds to the pipeline runtime.

Stage 7: Test Plan Generation

The final stage transforms all findings into a concrete test plan. This is the output that security teams actually act on.

From Findings to Attack Vectors

The test plan generator maps each finding to specific attack techniques. This is not a generic list of "things that could go wrong." It is a tailored mapping based on what the pipeline actually observed:

Finding: API endpoint at api.company.com:443
         - No CDN protection detected
         - Rate limiting: none observed
         - Technology: Node.js/Express
         - Authentication: API key (header)

Mapped Test Vectors:
  1. HTTP flood (direct to origin, no CDN absorption)
  2. Slowloris (Node.js single-threaded event loop)
  3. API abuse (expensive query patterns)
  4. Authentication endpoint stress
     (brute-force rate with no limiting)

Priority: CRITICAL
Reason: Revenue-generating API with no L7 protection

Attack Vector Selection

DDactic maintains a matrix of over 200 distinct attack vectors across multiple protocol layers and architecture types. The test plan generator does not select from this matrix randomly. It uses the data from all previous stages to determine which vectors are relevant: open ports determine which protocol-layer vectors apply, the detected technology stack determines the application-layer vectors, and the protection in front of each asset determines which bypass techniques are worth testing.

Prioritized Test Schedule

The test plan is not just a list of vectors. It is a prioritized schedule that tells the security team (or DDactic's automated testing infrastructure) what to test first and why:

  1. Critical assets without protection - API endpoints and customer portals serving traffic directly from origin servers
  2. Assets with misconfigured protection - WAF in detection-only mode, rate limits set too high, bot detection disabled
  3. Assets with credential exposure - Services where breached credentials could bypass perimeter defenses
  4. Assets with known vulnerabilities - Software versions with published DDoS-relevant CVEs
  5. Properly protected assets - Testing that defenses actually work as configured under realistic load

Actionable Output

The test plan includes specific remediation recommendations for each finding. These are not generic advice like "enable rate limiting." They are vendor-specific CLI commands and configuration changes based on the exact technology stack detected in stage 3. If you are running Cloudflare, you get Cloudflare commands. If you are behind AWS WAF, you get AWS WAF rules.

Why Sequential Stages Matter

You might wonder: why not run everything in parallel and save time?

Because each stage depends on the output of the previous ones, and that dependency is what makes the results useful.

A flat, parallel scan that checks subdomains, ports, and vulnerabilities independently produces a spreadsheet. A sequential pipeline that builds context at each stage produces an assessment.

What This Looks Like in Practice

Here is a simplified example of how the pipeline's stages compound to produce findings that no single stage could generate alone:

Stage 1: Discovers staging.company.com
Stage 2: Finds ports 443, 3000, 5432 open
Stage 3: Port 443 = React app, Port 3000 = Express API,
         Port 5432 = PostgreSQL. No WAF detected.
Stage 4: 12 developer credentials breached (company.com
         domain in 2024 breach)
Stage 5: /api-docs publicly accessible on port 3000,
         GraphQL introspection enabled, .env file
         exposed at /.env
Stage 6: AI classifies as "staging environment with
         production database connection" (priority: CRITICAL)
Stage 7: Test plan includes: direct DB connection test,
         API abuse via documented endpoints, credential
         stuffing against developer accounts

Combined finding: Staging environment with production
data, no perimeter defense, full API documentation
public, developer credentials compromised.

No single stage produces this conclusion.
All seven together do.

The Gap Between Discovery and Assessment

Most attack surface management tools stop at discovery. They give you a list of assets, maybe with some port information and basic fingerprinting. That is valuable, but it is the beginning of the work, not the end.

The gap between "here are your assets" and "here is what an attacker would do with them" is where organizations are most vulnerable. Security teams receive asset inventories and then have to manually determine risk, prioritize remediation, and design test plans. That manual process takes weeks, and by the time it is complete, the attack surface has changed.

DDactic's 7-stage pipeline closes that gap automatically. From a company name to a prioritized, actionable test plan in under 15 minutes.

See Your Full Attack Surface

Run a free scan and see what all 7 stages discover about your organization. No account required. Results in minutes, not weeks.

Start a Free Scan