Detect Any Website's Tech Stack with Python (Under 50 Lines)
You need to programmatically detect what technologies a website uses—for competitive research, security auditing, or lead qualification. Here's how to do it in Python with under 50 lines of code.
By the end of this tutorial, you'll have working Python scripts that detect technologies for a single domain, batch-scan hundreds of domains from a CSV file, filter results by category, and export everything to a spreadsheet-ready CSV.
Prerequisites
You need two things:
- Python 3.7+ with the requests library installed
- A DetectZeStack API key from RapidAPI (free, no credit card required)
Install the requests library if you don't have it:
```
pip install requests
```
Then grab your API key:
Sign up on RapidAPI, subscribe to DetectZeStack's free plan, and copy your API key from the Code Snippets panel. You'll use it in the x-rapidapi-key header for every request.
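Hardcoding the key is fine for a quick test, but it's safer to read it from an environment variable so it never lands in version control. A minimal sketch (the `RAPIDAPI_KEY` variable name is an arbitrary choice for this example, not something the API requires):

```python
import os

# Read the key from the environment; fall back to a placeholder for local testing.
API_KEY = os.environ.get("RAPIDAPI_KEY", "YOUR_API_KEY")

HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com",
}
```

Every script below can then reuse this `HEADERS` dict instead of repeating the key.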
Basic Example: Detect Technologies for a Single Domain
Here's the simplest possible script. It sends a domain to the DetectZeStack API and prints every detected technology:
```python
import requests

url = "https://detectzestack.p.rapidapi.com/detect"
querystring = {"url": "stripe.com"}
headers = {
    "x-rapidapi-key": "YOUR_API_KEY",
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)
data = response.json()

print(f"Domain: {data['domain']}")
print(f"Technologies found: {len(data['technologies'])}")
for tech in data['technologies']:
    print(f"  - {tech['name']} ({tech['categories'][0]}) [{tech['source']}]")
```
Output:

```
Domain: stripe.com
Technologies found: 5
  - Amazon S3 (CDN) [dns]
  - Amazon Web Services (PaaS) [dns]
  - DigiCert (SSL/TLS certificate authority) [tls]
  - HSTS (Security) [headers]
  - Nginx (Web servers) [headers]
```
Each technology includes its name, categories, confidence score, and source—how it was detected. The source field tells you whether the detection came from DNS records, TLS certificates, HTTP headers, or HTML/JavaScript analysis. This is something most detection tools don't tell you.
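To see at a glance which detection layer each signal came from, you can group the results by the source field. A sketch over a hardcoded sample response in the same shape as the output above (illustrative values, not a live API call):

```python
from collections import defaultdict

# Sample response shaped like the API output shown above (not a live call).
data = {
    "domain": "stripe.com",
    "technologies": [
        {"name": "Amazon S3", "categories": ["CDN"], "confidence": 100, "source": "dns"},
        {"name": "DigiCert", "categories": ["SSL/TLS certificate authority"], "confidence": 70, "source": "tls"},
        {"name": "Nginx", "categories": ["Web servers"], "confidence": 100, "source": "headers"},
    ],
}

# Bucket technology names by how they were detected.
by_source = defaultdict(list)
for tech in data["technologies"]:
    by_source[tech["source"]].append(tech["name"])

for source, names in sorted(by_source.items()):
    print(f"{source}: {', '.join(names)}")
```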
Batch Scanning: Process Domains from a CSV
Scanning one domain at a time is fine for testing. In practice, you'll want to scan a list. Here's a script that reads domains from a CSV file, scans each one, and handles rate limiting:
First, create a file called domains.csv:
```
domain
stripe.com
shopify.com
github.com
notion.so
vercel.com
```
Then run this script:
```python
import requests
import csv
import time

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

def scan_domain(domain):
    """Scan a single domain and return the result."""
    response = requests.get(
        API_URL,
        headers=HEADERS,
        params={"url": domain},
        timeout=30
    )
    response.raise_for_status()
    return response.json()

# Read domains from CSV
with open("domains.csv", "r") as f:
    reader = csv.DictReader(f)
    domains = [row["domain"].strip() for row in reader]

print(f"Scanning {len(domains)} domains...\n")

for domain in domains:
    try:
        result = scan_domain(domain)
        tech_count = len(result.get("technologies", []))
        print(f"{result['domain']}: {tech_count} technologies detected")
    except requests.exceptions.HTTPError as e:
        print(f"{domain}: Error {e.response.status_code}")
    except Exception as e:
        print(f"{domain}: {e}")
    # Respect rate limits: 1 request per second
    time.sleep(1)
```
The time.sleep(1) adds a one-second delay between requests. This keeps you well within rate limits and avoids 429 errors. For the free plan (100 requests/month), this is more than sufficient. On paid plans with higher limits, you can reduce the delay to 0.5 seconds.
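If you'd rather not scatter `time.sleep` calls through your scripts, one option is a small throttle helper that enforces a minimum interval between calls. A sketch (this `Throttle` class is written for this example, not part of any library):

```python
import time

class Throttle:
    """Enforce a minimum interval between successive API calls (sketch)."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the last call

    def wait(self):
        # Sleep only for whatever remains of the interval, then stamp the time.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

throttle = Throttle(min_interval=1.0)
# Call throttle.wait() immediately before each scan_domain(...) call.
```

The advantage over a flat `sleep(1)` is that time spent waiting on the HTTP response counts toward the interval, so a slow scan isn't penalized twice.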
Filtering by Category
The API returns all detected technologies. If you only care about specific categories—say, you want to find which CMS each site uses, or which CDN—you can filter the results in Python:
```python
import requests

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

def get_technologies_by_category(domain, target_categories):
    """Return only technologies matching the given categories."""
    response = requests.get(
        API_URL,
        headers=HEADERS,
        params={"url": domain},
        timeout=30
    )
    response.raise_for_status()
    data = response.json()

    targets = {c.lower() for c in target_categories}  # normalize once, not per tech
    matched = []
    for tech in data.get("technologies", []):
        if any(cat.lower() in targets for cat in tech.get("categories", [])):
            matched.append(tech)
    return matched

# Example: find only CDN, analytics, and JavaScript framework technologies
domain = "shopify.com"
categories = ["CDN", "Analytics", "JavaScript frameworks"]
results = get_technologies_by_category(domain, categories)

print(f"\n{domain} - filtered by {categories}:")
for tech in results:
    print(f"  {tech['name']} ({', '.join(tech['categories'])})")
```
Common categories you can filter by: CDN, CMS, Analytics, JavaScript frameworks, Web servers, PaaS, Security, SSL/TLS certificate authority, Ecommerce, and Marketing automation.
Export Results to CSV
For analysis in Excel or Google Sheets, export your scan results to CSV. This script scans a list of domains and writes one row per technology detected:
```python
import requests
import csv
import time

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

domains = ["stripe.com", "shopify.com", "github.com", "notion.so"]

with open("tech_stacks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["domain", "technology", "categories", "confidence", "source"])

    for domain in domains:
        try:
            response = requests.get(
                API_URL,
                headers=HEADERS,
                params={"url": domain},
                timeout=30
            )
            response.raise_for_status()
            data = response.json()
            for tech in data.get("technologies", []):
                writer.writerow([
                    data["domain"],
                    tech["name"],
                    "; ".join(tech.get("categories", [])),
                    tech.get("confidence", ""),
                    tech.get("source", "")
                ])
            print(f"OK: {data['domain']} ({len(data['technologies'])} techs)")
        except Exception as e:
            writer.writerow([domain, "ERROR", str(e), "", ""])
            print(f"FAIL: {domain} - {e}")
        time.sleep(1)

print("\nResults saved to tech_stacks.csv")
```
The output CSV will look like this:
```
domain,technology,categories,confidence,source
stripe.com,Amazon S3,CDN,100,dns
stripe.com,Amazon Web Services,PaaS,100,dns
stripe.com,DigiCert,SSL/TLS certificate authority,70,tls
stripe.com,HSTS,Security,100,headers
stripe.com,Nginx,Web servers; Reverse proxies,100,headers
shopify.com,Cloudflare,CDN,100,dns
...
```
Open this in Excel or Google Sheets and you can pivot by category, sort by confidence, or filter by detection source. The source column is especially useful: technologies detected via dns or tls are infrastructure-layer signals that most detection tools miss entirely.
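You can also run a quick tally in Python instead of a spreadsheet. A sketch that counts detections per source in the exported file (`count_by_source` is a hypothetical helper written for this example, assuming the CSV header produced by the script above):

```python
import csv
from collections import Counter

def count_by_source(path="tech_stacks.csv"):
    """Tally detections per source column in an exported results CSV."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Skip the ERROR placeholder rows the export script writes on failure.
    return Counter(row["source"] for row in rows if row["technology"] != "ERROR")

# Example usage once tech_stacks.csv exists:
# for source, count in count_by_source().most_common():
#     print(f"{source}: {count}")
```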
Error Handling
Production scripts need to handle failures gracefully. Here are the common error cases and how to deal with them:
```python
import requests
import time

API_KEY = "YOUR_API_KEY"
API_URL = "https://detectzestack.p.rapidapi.com/detect"
HEADERS = {
    "x-rapidapi-key": API_KEY,
    "x-rapidapi-host": "detectzestack.p.rapidapi.com"
}

def scan_with_retries(domain, max_retries=3):
    """Scan a domain with retry logic for transient errors."""
    for attempt in range(max_retries):
        try:
            response = requests.get(
                API_URL,
                headers=HEADERS,
                params={"url": domain},
                timeout=30
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                # Rate limited: wait and retry
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
                print(f"  Rate limited on {domain}, waiting {wait_time}s...")
                time.sleep(wait_time)
                continue
            elif response.status_code == 422:
                # Invalid domain or URL
                print(f"  Invalid domain: {domain}")
                return None
            else:
                print(f"  HTTP {response.status_code} for {domain}")
                return None
        except requests.exceptions.Timeout:
            print(f"  Timeout scanning {domain} (attempt {attempt + 1})")
            time.sleep(1)
            continue
        except requests.exceptions.ConnectionError:
            print(f"  Connection error for {domain} (attempt {attempt + 1})")
            time.sleep(2)
            continue
    print(f"  Failed after {max_retries} retries: {domain}")
    return None

# Usage
domains = ["stripe.com", "not-a-real-domain.xyz", "github.com"]
for domain in domains:
    result = scan_with_retries(domain)
    if result:
        techs = result.get("technologies", [])
        print(f"{result['domain']}: {len(techs)} technologies")
    time.sleep(1)
```
Key points:
- 429 (Rate Limited) — the API returns this if you exceed your plan's rate limit. Use exponential backoff: wait 1 second, then 2, then 4.
- 422 (Invalid Input) — the domain couldn't be resolved or the URL format is invalid. Skip it and move on.
- Timeouts — set a 30-second timeout on each request. Some domains take longer to scan because the API resolves DNS, inspects TLS certificates, and fetches HTTP responses.
- Connection errors — transient network issues. Retry with a short delay.
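If you'd rather not hand-roll the retry loop, requests can delegate retries and backoff to urllib3. A sketch, assuming urllib3 1.26 or newer (where `allowed_methods` replaced the older `method_whitelist` parameter):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry GETs up to 3 times on rate limits and transient server errors,
# with exponential backoff between attempts.
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET"],
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

# Then reuse the session for every scan:
# resp = session.get(API_URL, headers=HEADERS, params={"url": domain}, timeout=30)
```

Connection errors and DNS failures are retried by the same policy, so the manual `except` branches shrink to a single try/except around `session.get`.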
Putting It All Together
Here's the complete workflow in under 50 lines. This script reads domains from a CSV, scans each one with error handling, and writes results to an output CSV:
```python
import requests, csv, time

API_KEY = "YOUR_API_KEY"
URL = "https://detectzestack.p.rapidapi.com/detect"
HDR = {"x-rapidapi-key": API_KEY, "x-rapidapi-host": "detectzestack.p.rapidapi.com"}

with open("domains.csv") as fin:
    domains = [r["domain"].strip() for r in csv.DictReader(fin)]

with open("results.csv", "w", newline="") as fout:
    w = csv.writer(fout)
    w.writerow(["domain", "technology", "categories", "confidence", "source"])
    for domain in domains:
        try:
            resp = requests.get(URL, headers=HDR, params={"url": domain}, timeout=30)
            if resp.status_code == 429:
                time.sleep(5)
                resp = requests.get(URL, headers=HDR, params={"url": domain}, timeout=30)
            resp.raise_for_status()
            data = resp.json()
            for t in data.get("technologies", []):
                w.writerow([data["domain"], t["name"], "; ".join(t.get("categories", [])),
                            t.get("confidence", ""), t.get("source", "")])
            print(f"OK {data['domain']}: {len(data['technologies'])} techs")
        except Exception as e:
            w.writerow([domain, "ERROR", str(e), "", ""])
            print(f"ERR {domain}: {e}")
        time.sleep(1)

print("Done. Results in results.csv")
```
Tip: The source field in each technology tells you how it was detected: dns, tls, headers, or html. Technologies detected via DNS and TLS are infrastructure signals (CDN, hosting, certificate authority) that browser-based tools like Wappalyzer and WhatRuns cannot see. This is what makes API-based detection more comprehensive than browser extensions.
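To act on that distinction in code, you can split detections into infrastructure and application signals by source. A sketch over sample data in the same shape as the API responses above:

```python
# Sample detections (illustrative values, same shape as the API responses).
technologies = [
    {"name": "Amazon S3", "source": "dns"},
    {"name": "DigiCert", "source": "tls"},
    {"name": "Nginx", "source": "headers"},
    {"name": "React", "source": "html"},
]

# DNS and TLS detections are infrastructure-layer; headers and html are
# application-layer signals that browser extensions can also see.
INFRA_SOURCES = {"dns", "tls"}
infra = [t["name"] for t in technologies if t["source"] in INFRA_SOURCES]
app = [t["name"] for t in technologies if t["source"] not in INFRA_SOURCES]

print("Infrastructure:", infra)
print("Application:", app)
```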
Related Reading
- Detect Any Website Tech Stack — Single API call walkthrough
- Lead Enrichment Pipeline — Build a prospecting pipeline with tech detection
- Website Technology Checker API — Full API reference and use cases
- Track Technology Changes Over Time — Monitor tech stack changes across scans
Start Detecting Tech Stacks with Python
100 requests/month free. No credit card required. DNS + TLS + HTTP detection in a single API call.
Get Your API Key