Detect Any Website's Tech Stack with Python (Under 50 Lines)

March 18, 2026 · 8 min read

You need to programmatically detect what technologies a website uses—for competitive research, security auditing, or lead qualification. Here's how to do it in Python with under 50 lines of code.

By the end of this tutorial, you'll have working Python scripts that detect technologies for a single domain, batch-scan hundreds of domains from a CSV file, filter results by category, and export everything to a spreadsheet-ready CSV.

Prerequisites

You need two things:

  1. Python 3.7+ with the requests library installed
  2. A DetectZeStack API key from RapidAPI (free, no credit card required)

Install the requests library if you don't have it:

pip install requests

Then grab your API key:

Get your API key — 100 requests/month free

Sign up on RapidAPI, subscribe to DetectZeStack's free plan, and copy your API key from the Code Snippets panel. You'll use it in the x-rapidapi-key header for every request.
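Hardcoding the key in a script makes it easy to leak into version control. A safer pattern is to read it from an environment variable instead. This is a small sketch, not part of the API itself; the variable name DETECTZESTACK_API_KEY is just a convention, use whatever you like:

```python
import os

def build_headers(env_var="DETECTZESTACK_API_KEY"):
    """Build the RapidAPI headers from an environment variable instead of a hardcoded key."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before running this script")
    return {
        "x-rapidapi-key": key,
        "x-rapidapi-host": "detectzestack.p.rapidapi.com",
    }
```

Export the variable once in your shell (`export DETECTZESTACK_API_KEY=...`) and every script in this tutorial can call `build_headers()` instead of pasting the key.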

Basic Example: Detect Technologies for a Single Domain

Here's the simplest possible script. It sends a domain to the DetectZeStack API and prints every detected technology:

import requests

url = "https://detectzestack.p.rapidapi.com/detect"
querystring = {"url": "stripe.com"}
headers = {
    "x-rapidapi-key": "YOUR_API_KEY",
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}
response = requests.get(url, headers=headers, params=querystring)
data = response.json()

print(f"Domain: {data['domain']}")
print(f"Technologies found: {len(data['technologies'])}")
for tech in data['technologies']:
    print(f"  - {tech['name']} ({tech['categories'][0]}) [{tech['source']}]")

Output:

Domain: stripe.com
Technologies found: 5
  - Amazon S3 (CDN) [dns]
  - Amazon Web Services (PaaS) [dns]
  - DigiCert (SSL/TLS certificate authority) [tls]
  - HSTS (Security) [headers]
  - Nginx (Web servers) [headers]

Each technology includes its name, categories, confidence score, and source—how it was detected. The source field tells you whether the detection came from DNS records, TLS certificates, HTTP headers, or HTML/JavaScript analysis. This is something most detection tools don't tell you.
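To see those detection layers at a glance, you can group the technologies list by source. A minimal sketch, assuming the response shape shown in the example output above:

```python
from collections import defaultdict

def group_by_source(technologies):
    """Bucket detected technology names by how they were found (dns, tls, headers, html)."""
    buckets = defaultdict(list)
    for tech in technologies:
        buckets[tech.get("source", "unknown")].append(tech["name"])
    return dict(buckets)

# Sample data mirroring the stripe.com output above
sample = [
    {"name": "Amazon S3", "source": "dns"},
    {"name": "Amazon Web Services", "source": "dns"},
    {"name": "DigiCert", "source": "tls"},
    {"name": "HSTS", "source": "headers"},
    {"name": "Nginx", "source": "headers"},
]
print(group_by_source(sample))
# {'dns': ['Amazon S3', 'Amazon Web Services'], 'tls': ['DigiCert'], 'headers': ['HSTS', 'Nginx']}
```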

Batch Scanning: Process Domains from a CSV

Scanning one domain at a time is fine for testing. In practice, you'll want to scan a list. The script below reads domains from a CSV file, scans each one, and handles rate limiting.

First, create a file called domains.csv:

domain
stripe.com
shopify.com
github.com
notion.so
vercel.com

Then run this script:

import requests
import csv
import time

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

def scan_domain(domain):
    """Scan a single domain and return the result."""
    response = requests.get(
        API_URL,
        headers=HEADERS,
        params={"url": domain},
        timeout=30
    )
    response.raise_for_status()
    return response.json()

# Read domains from CSV
with open("domains.csv", "r") as f:
    reader = csv.DictReader(f)
    domains = [row["domain"].strip() for row in reader]

print(f"Scanning {len(domains)} domains...\n")

for domain in domains:
    try:
        result = scan_domain(domain)
        tech_count = len(result.get("technologies", []))
        print(f"{result['domain']}: {tech_count} technologies detected")
    except requests.exceptions.HTTPError as e:
        print(f"{domain}: Error {e.response.status_code}")
    except Exception as e:
        print(f"{domain}: {e}")

    # Respect rate limits: 1 request per second
    time.sleep(1)

The time.sleep(1) adds a one-second delay between requests. This keeps you well within rate limits and avoids 429 errors. For the free plan (100 requests/month), this is more than sufficient. On paid plans with higher limits, you can reduce the delay to 0.5 seconds.
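If you'd rather not sprinkle `sleep` calls through your loops, a small throttle helper can enforce a minimum interval between calls no matter how long each request took. This is a generic sketch, not part of the API client:

```python
import time

class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        # Sleep only for whatever remains of the interval since the last call.
        elapsed = time.monotonic() - self._last
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self._last = time.monotonic()

throttle = Throttle(interval=1.0)
# for domain in domains:
#     throttle.wait()
#     result = scan_domain(domain)
```

Because the throttle subtracts the time the request itself took, a slow response doesn't add a full extra second of idle time the way a fixed `time.sleep(1)` does.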

Filtering by Category

The API returns all detected technologies. If you only care about specific categories—say, you want to find which CMS each site uses, or which CDN—you can filter the results in Python:

import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

def get_technologies_by_category(domain, target_categories):
    """Return only technologies matching the given categories."""
    response = requests.get(
        API_URL,
        headers=HEADERS,
        params={"url": domain},
        timeout=30
    )
    response.raise_for_status()
    data = response.json()

    matched = []
    for tech in data.get("technologies", []):
        for cat in tech.get("categories", []):
            if cat.lower() in [c.lower() for c in target_categories]:
                matched.append(tech)
                break
    return matched

# Example: find only CDN and analytics technologies
domain = "shopify.com"
categories = ["CDN", "Analytics", "JavaScript frameworks"]
results = get_technologies_by_category(domain, categories)

print(f"\n{domain} - filtered by {categories}:")
for tech in results:
    print(f"  {tech['name']} ({', '.join(tech['categories'])})")

Common categories you can filter by: CDN, CMS, Analytics, JavaScript frameworks, Web servers, PaaS, Security, SSL/TLS certificate authority, Ecommerce, and Marketing automation.
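If you query categories repeatedly, rebuilding the lowercase comparison list on every check is wasteful. One option is to build a case-insensitive index once per response and look categories up directly. A sketch against the same response shape:

```python
def build_category_index(technologies):
    """Map lowercase category name -> list of technology names for fast lookups."""
    index = {}
    for tech in technologies:
        for cat in tech.get("categories", []):
            index.setdefault(cat.lower(), []).append(tech["name"])
    return index

# Sample data in the API's response shape
sample = [
    {"name": "Amazon S3", "categories": ["CDN"]},
    {"name": "Cloudflare", "categories": ["CDN", "Security"]},
    {"name": "Nginx", "categories": ["Web servers"]},
]
index = build_category_index(sample)
print(index.get("cdn", []))
# ['Amazon S3', 'Cloudflare']
```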

Export Results to CSV

For analysis in Excel or Google Sheets, export your scan results to CSV. This script scans a list of domains and writes one row per technology detected:

import requests
import csv
import time

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

domains = ["stripe.com", "shopify.com", "github.com", "notion.so"]

with open("tech_stacks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["domain", "technology", "categories", "confidence", "source"])

    for domain in domains:
        try:
            response = requests.get(
                API_URL,
                headers=HEADERS,
                params={"url": domain},
                timeout=30
            )
            response.raise_for_status()
            data = response.json()

            for tech in data.get("technologies", []):
                writer.writerow([
                    data["domain"],
                    tech["name"],
                    "; ".join(tech.get("categories", [])),
                    tech.get("confidence", ""),
                    tech.get("source", "")
                ])
            print(f"OK: {data['domain']} ({len(data['technologies'])} techs)")

        except Exception as e:
            writer.writerow([domain, "ERROR", str(e), "", ""])
            print(f"FAIL: {domain} - {e}")

        time.sleep(1)

print("\nResults saved to tech_stacks.csv")

The output CSV will look like this:

domain,technology,categories,confidence,source
stripe.com,Amazon S3,CDN,100,dns
stripe.com,Amazon Web Services,PaaS,100,dns
stripe.com,DigiCert,SSL/TLS certificate authority,70,tls
stripe.com,HSTS,Security,100,headers
stripe.com,Nginx,Web servers; Reverse proxies,100,headers
shopify.com,Cloudflare,CDN,100,dns
...

Open this in Excel or Google Sheets and you can pivot by category, sort by confidence, or filter by detection source. The source column is especially useful: technologies detected via dns or tls are infrastructure-layer signals that most detection tools miss entirely.
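You can also do that pivot without leaving Python. A sketch using `collections.Counter` over rows shaped like the exported CSV, where multiple categories are joined with "; ":

```python
from collections import Counter

def count_by_category(rows):
    """Count technologies per category across all scanned domains.

    Each row is (domain, technology, categories, confidence, source),
    with multiple categories joined by '; ' as in the CSV export.
    """
    counts = Counter()
    for _domain, _tech, categories, _conf, _source in rows:
        for cat in categories.split("; "):
            counts[cat] += 1
    return counts

# Sample rows matching the CSV output above
rows = [
    ("stripe.com", "Amazon S3", "CDN", "100", "dns"),
    ("stripe.com", "Nginx", "Web servers; Reverse proxies", "100", "headers"),
    ("shopify.com", "Cloudflare", "CDN", "100", "dns"),
]
print(count_by_category(rows).most_common())
# [('CDN', 2), ('Web servers', 1), ('Reverse proxies', 1)]
```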

Error Handling

Production scripts need to handle failures gracefully. Here are the common error cases and how to deal with them:

import requests
import time

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

def scan_with_retries(domain, max_retries=3):
    """Scan a domain with retry logic for transient errors."""
    for attempt in range(max_retries):
        try:
            response = requests.get(
                API_URL,
                headers=HEADERS,
                params={"url": domain},
                timeout=30
            )

            if response.status_code == 200:
                return response.json()

            elif response.status_code == 429:
                # Rate limited: wait and retry
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
                print(f"  Rate limited on {domain}, waiting {wait_time}s...")
                time.sleep(wait_time)
                continue

            elif response.status_code == 422:
                # Invalid domain or URL
                print(f"  Invalid domain: {domain}")
                return None

            else:
                print(f"  HTTP {response.status_code} for {domain}")
                return None

        except requests.exceptions.Timeout:
            print(f"  Timeout scanning {domain} (attempt {attempt + 1})")
            time.sleep(1)
            continue

        except requests.exceptions.ConnectionError:
            print(f"  Connection error for {domain} (attempt {attempt + 1})")
            time.sleep(2)
            continue

    print(f"  Failed after {max_retries} retries: {domain}")
    return None

# Usage
domains = ["stripe.com", "not-a-real-domain.xyz", "github.com"]

for domain in domains:
    result = scan_with_retries(domain)
    if result:
        techs = result.get("technologies", [])
        print(f"{result['domain']}: {len(techs)} technologies")
    time.sleep(1)

Key points:

  1. A 429 response means you're rate limited; the function backs off exponentially (1s, 2s, 4s) before retrying.
  2. A 422 response means the domain itself is invalid, so retrying won't help and the function gives up immediately.
  3. Timeouts and connection errors are usually transient, so they're retried after a short pause.
  4. After max_retries failed attempts, the function returns None instead of raising, so a batch scan keeps going.

Putting It All Together

Here's the complete workflow in under 50 lines. This script reads domains from a CSV, scans each one with error handling, and writes results to an output CSV:

import requests, csv, time

API_KEY = "YOUR_API_KEY"
URL = "https://detectzestack.p.rapidapi.com/detect"
HDR = {"x-rapidapi-key": API_KEY, "x-rapidapi-host": "detectzestack.p.rapidapi.com"}

with open("domains.csv") as fin:
    domains = [r["domain"].strip() for r in csv.DictReader(fin)]

with open("results.csv", "w", newline="") as fout:
    w = csv.writer(fout)
    w.writerow(["domain", "technology", "categories", "confidence", "source"])
    for domain in domains:
        try:
            resp = requests.get(URL, headers=HDR, params={"url": domain}, timeout=30)
            if resp.status_code == 429:
                time.sleep(5)  # rate limited: wait once, then retry
                resp = requests.get(URL, headers=HDR, params={"url": domain}, timeout=30)
            resp.raise_for_status()
            data = resp.json()
            for t in data.get("technologies", []):
                w.writerow([data["domain"], t["name"], "; ".join(t.get("categories",[])), t.get("confidence",""), t.get("source","")])
            print(f"OK  {data['domain']}: {len(data.get('technologies', []))} techs")
        except Exception as e:
            w.writerow([domain, "ERROR", str(e), "", ""])
            print(f"ERR {domain}: {e}")
        time.sleep(1)
print("Done. Results in results.csv")

Tip: The source field in each technology tells you how it was detected—dns, tls, headers, or html. Technologies detected via DNS and TLS are infrastructure signals (CDN, hosting, certificate authority) that browser-based tools like Wappalyzer and WhatRuns cannot see. This is what makes API-based detection more comprehensive than browser extensions.
