The Dangling DNS Trap: Automating Subdomain Takeover Detection in Multi-Cloud Infrastructure with Python and Terraform

Prevent subdomain takeovers in multi-cloud environments. Learn to detect dangling DNS records using Python and Terraform to secure your infrastructure.


Imagine this scenario: Your marketing team launches a campaign on promo.yourbrand.com hosted on an AWS S3 bucket. The campaign ends, and a junior DevOps engineer deletes the S3 bucket to save costs but forgets to delete the Route53 CNAME record pointing to it. Six months later, a threat actor scans the internet, finds your "dangling" DNS record, creates a new S3 bucket with the exact same name, and suddenly, promo.yourbrand.com is serving malware or a phishing site to your customers. This is a Subdomain Takeover.

In the era of multi-cloud infrastructure, where resources are spun up and torn down dynamically across AWS, Azure, and Google Cloud, maintaining DNS hygiene is no longer a manual task—it is a significant security challenge. For CTOs and IT leaders, the risk isn't just technical; it's a direct threat to brand reputation and data integrity.

At Nohatek, we believe that security should be as automated as your deployment pipelines. In this guide, we will explore the mechanics of the dangling DNS trap and demonstrate how to leverage Terraform and Python to build an automated detection engine that keeps your multi-cloud environment secure.

The Anatomy of a Subdomain Takeover


To understand the solution, we must first dissect the vulnerability. A subdomain takeover occurs when a DNS record (usually a CNAME) points to a cloud resource that has been deleted or de-provisioned. This leaves a "dangling" pointer.

Cloud providers typically assign canonical names to resources. For example:

  • AWS S3: bucketname.s3-website.us-east-1.amazonaws.com
  • Azure App Service: appname.azurewebsites.net
  • GitHub Pages: username.github.io

When you configure your corporate DNS (e.g., app.nohatek.com) to point to one of these external services via a CNAME, you establish a chain of trust. If the destination resource is deleted but the CNAME remains, that chain is broken. Because cloud providers often allow resources to be claimed on a first-come, first-served basis, an attacker can simply register the missing resource name on the provider's platform. The result? Your legitimate domain now serves their content.

The Business Impact: Beyond defacement, attackers can use taken-over subdomains to steal session cookies scoped to the parent domain, bypass Cross-Origin Resource Sharing (CORS) policies, or launch highly convincing phishing campaigns that slip past filters because every link resolves to a trusted domain.

Why Multi-Cloud Exacerbates the Problem


In a monolithic, on-premise environment, DNS was relatively static. In a modern multi-cloud architecture, the velocity of change is high. Development teams might use AWS for compute, Azure for Active Directory integration, and Heroku for rapid prototyping. This fragmentation creates visibility gaps.

Consider the complexity of Infrastructure as Code (IaC) drift. While tools like Terraform are excellent at managing state, they aren't always aware of external changes. If a developer manually deletes an Azure App Service via the portal to debug an issue, Terraform might still think the resource exists—or worse, the DNS record might be managed in a completely different Terraform state file than the application resource.

Manual auditing in this environment is impossible. You cannot expect a human operator to check thousands of DNS records against the live status of resources across three different cloud providers daily. This is where we must shift from configuration management to continuous validation.

Strategy: Extracting Truth from Terraform State


The first step in automation is gathering a list of all DNS records your infrastructure claims to own. If you are using Terraform to manage your DNS (e.g., AWS Route53, Cloudflare, or Azure DNS), your terraform.tfstate file is a goldmine of data.

We can use the terraform show -json command to export the infrastructure state into a machine-readable format. This allows us to parse every DNS record defined in our code programmatically.

Here is a conceptual look at how we approach the data extraction:

# Export current state to JSON
terraform show -json > infrastructure.json

Within this JSON, we are looking specifically for resources with types like aws_route53_record, azurerm_dns_cname_record, or google_dns_record_set. We are interested in the name (the subdomain) and the records (the target it points to). By parsing this file, we generate a "Source of Truth" list of domains that need to be audited.

Implementing the Detection Engine with Python


Once we have the list of domains from Terraform, we need a Python script to act as the auditor. The logic for the script is straightforward but powerful:

  1. Ingest: Read the Terraform JSON output.
  2. Resolve: Perform a DNS lookup on each CNAME to see where it points.
  3. Validate: Check the HTTP status of the target resource.

If a CNAME resolves to a known cloud provider domain (like azurewebsites.net), but the HTTP request to that target returns a 404 Not Found, or the DNS lookup of the target itself fails with an NXDOMAIN error, we have a high-probability candidate for a subdomain takeover.

Here is a simplified Python snippet using dnspython and requests to demonstrate the core logic:

import dns.resolver
import requests

def check_dangling_dns(subdomain):
    # Step 1: Verify the CNAME resolution
    try:
        answers = dns.resolver.resolve(subdomain, 'CNAME')
        canonical_name = str(answers[0].target).rstrip('.')
    except dns.resolver.NXDOMAIN:
        return False, f"{subdomain} does not exist in DNS"
    except dns.resolver.NoAnswer:
        return False, f"{subdomain} has no CNAME record"

    # Step 2: Check for Cloud Provider Signatures
    if "s3.amazonaws.com" in canonical_name or "azurewebsites.net" in canonical_name:

        # Step 3: Check HTTP Status
        try:
            response = requests.get(f"http://{subdomain}", timeout=5)
        except requests.RequestException as exc:
            # An unreachable provider endpoint is worth a manual review
            return True, f"REVIEW: {subdomain} -> {canonical_name} unreachable ({exc})"

        # A 404 from the provider often indicates the resource is unclaimed
        if response.status_code == 404:
            return True, f"POTENTIAL TAKEOVER: {subdomain} points to unclaimed {canonical_name}"

    return False, "Secure"

By integrating this script into your CI/CD pipeline (e.g., running it as a post-apply step in GitHub Actions or Jenkins), you create a continuous feedback loop. If a developer deletes a resource but leaves the DNS, the pipeline fails or sends a Slack alert immediately.
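One way to wire the checks into a pipeline is a thin driver that converts findings into an exit code, so the CI job fails whenever a candidate is flagged. This is a sketch: the findings list and the alerting call are placeholders you would replace with real integrations.

```python
import sys

def run_audit(findings):
    """Given (flagged, message) results from the checks above, surface the
    alerts and return a non-zero exit code so the CI step fails."""
    alerts = [message for flagged, message in findings if flagged]
    for message in alerts:
        # Replace this print with a Slack webhook or ticketing call as needed
        print(message, file=sys.stderr)
    return 1 if alerts else 0

if __name__ == "__main__":
    # In a real pipeline, build `findings` by running check_dangling_dns()
    # over every record extracted from the Terraform state
    sys.exit(run_audit([]))
```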

Pro Tip: For enterprise-grade implementations, you should maintain a list of "fingerprints." Different providers have different error messages when a resource is missing. For example, GitHub Pages will return "There isn't a GitHub Pages site here," while Heroku returns "No such app." Your Python script should look for these specific response bodies to reduce false positives.
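A fingerprint table can be a simple mapping from provider domain to the error string it serves for unclaimed resources. The entries below are illustrative (providers change their wording over time, so verify each one against a live response before relying on it):

```python
# Provider-specific "fingerprints": error strings served when a resource
# is unclaimed. Verify and extend these against live responses.
TAKEOVER_FINGERPRINTS = {
    "github.io": "There isn't a GitHub Pages site here",
    "herokuapp.com": "No such app",
    "s3.amazonaws.com": "NoSuchBucket",
}

def match_fingerprint(canonical_name, response_body):
    """Return the provider whose fingerprint matches the response, or None."""
    for provider, fingerprint in TAKEOVER_FINGERPRINTS.items():
        if provider in canonical_name and fingerprint in response_body:
            return provider
    return None
```

Requiring both the provider domain in the CNAME target and the matching error body keeps false positives low: an ordinary 404 from a healthy application no longer trips the alarm.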

The "Dangling DNS" trap is a silent vulnerability that grows in direct proportion to the complexity of your cloud infrastructure. In a multi-cloud world, relying on manual housekeeping is a strategy for failure. By combining the state-management capabilities of Terraform with the flexibility of Python automation, you can turn a reactive security posture into a proactive one.

At Nohatek, we specialize in building resilient, secure cloud architectures that scale with your business. Whether you need to audit your existing DNS landscape or build a ground-up DevSecOps pipeline, our team is ready to help you close the gaps in your infrastructure.

Don't leave your brand open to hijacking. Contact Nohatek today to secure your digital footprint.