
Security for Engineers: OWASP, Secrets, Supply Chain, and Least Privilege

Security fundamentals for engineers: OWASP Top 10, secrets management with HashiCorp Vault and AWS Secrets Manager, supply chain hardening via SBOM and Sigstore/cosign, and least-privilege IAM with IRSA workload identity. Covers SSRF, SQL injection, IDOR, and mTLS.


Why Security Is an Engineering Discipline, Not a Security Team Problem

Security failures are almost never caused by a single, exotic attack. They are caused by routine engineering mistakes made at scale — a secret committed to a public repo, a dependency that wasn't patched for six months, an IAM role with * on everything because it was faster to write. The engineers involved were not negligent. They were operating under time pressure in systems that made the wrong path easier than the right path.

The 2022 Uber breach was not caused by a sophisticated zero-day. It was caused by an MFA fatigue attack on a contractor's credentials and a VPN that, once inside, had no further segmentation — meaning the attacker could reach everything. The 2021 Log4Shell vulnerability was not novel cryptography; it was a JNDI lookup feature that allowed remote code execution, lurking in a logging library that was a transitive dependency in roughly 30,000 open-source packages and countless proprietary applications. Most teams didn't even know they ran log4j.

The pattern is consistent: security failures are systems failures, not individual failures. The question interviewers are really asking when they probe security knowledge is: "Does this engineer build systems that make security the path of least resistance, or do they treat it as a checkbox?" A service that logs API keys in plaintext is not an individual engineer's mistake; it is a systems design failure — the logging infrastructure should have made key scrubbing automatic.

For FAANG-level engineers, security competence means three things: (1) knowing the common vulnerability classes well enough to avoid them in code review, (2) knowing how to architect systems so secrets never touch plaintext and credentials are always scoped and rotated, and (3) understanding the supply chain — the code you didn't write is often the vector.

IMPORTANT

What Interviewers Are Testing

L4/Mid signal: Knows OWASP exists, can explain SQL injection and why parameterized queries fix it, stores secrets in environment variables (passing — but read below), knows HTTPS matters.

L5/Senior signal: Distinguishes OWASP categories precisely (IDOR is broken access control, not auth failure). Explains why environment variables for secrets are still a risk (visible in ps aux, container spec, logs). Knows the difference between IAM roles and users. Has a rotation strategy for secrets. Can explain mTLS and when you need it.

Staff signal: Frames security as a systems problem. Talks about threat modeling at design time, not after deployment. Can explain SLSA levels and why the build pipeline is a trust boundary. Knows workload identity (IRSA, GCP Workload Identity) and why it eliminates a whole class of credential-leak risk. Has opinions about defense-in-depth vs. perimeter security.

OWASP Top 10 (2021) — Engineer's Reference

#1 Broken Access Control
What it means: IDOR (Insecure Direct Object Reference), path traversal, missing authz checks on API endpoints — user A reads user B's data.
Primary fix: Enforce object-level authorization on every read/write; never trust IDs from the client.
Why it moved: Up from #5 in 2017 — now the most common finding in bug bounty programs.

#2 Cryptographic Failures
What it means: MD5/SHA1 for passwords (rainbow-table-crackable), HTTP instead of HTTPS, TLS 1.0/1.1, hardcoded keys.
Primary fix: bcrypt/Argon2 for passwords; enforce HTTPS everywhere; deprecate TLS <1.2.
Why it moved: Renamed from "Sensitive Data Exposure" — the root cause is crypto, not exposure.

#3 Injection
What it means: SQL injection, command injection (os.system), LDAP injection, template injection — the attacker controls query structure.
Primary fix: Parameterized queries; never concatenate user input into shell commands.
Why it moved: Was #1 for a decade; still critical, but tooling improved.

#4 Insecure Design
What it means: Security was never modeled — no threat model at design phase, business-logic bypasses, missing rate limits on auth endpoints.
Primary fix: Threat model every new service; use design reviews with a security lens.
Why it moved: New category in 2021 — signals a maturity shift toward design-time security.

#5 Security Misconfiguration
What it means: Default credentials (admin/admin), verbose stack traces in prod, unnecessary features enabled, S3 buckets left public.
Primary fix: Infrastructure as code; CIS benchmarks; disable what you don't use.
Why it moved: Absorbed XML External Entities (XXE) from the 2017 list.

#6 Vulnerable and Outdated Components
What it means: Running dependencies with known CVEs — Log4Shell was this category. ~80% of codebases have components with known CVEs.
Primary fix: SBOM + Dependabot/Renovate; scan with Trivy/Snyk in CI.
Why it moved: Log4Shell (Dec 2021) made this category viscerally real for every team.

#7 Identification and Authentication Failures
What it means: Brute force, credential stuffing (2B+ leaked credentials), session fixation, no MFA on admin interfaces.
Primary fix: Rate-limit auth; invalidate sessions on login; require MFA for privileged access.
Why it moved: Renamed from "Broken Authentication" — broadened scope.

#8 Software and Data Integrity Failures
What it means: Untrusted deserialization, unsigned artifacts in CI/CD, auto-update without integrity checks — an attacker injects into the build pipeline.
Primary fix: Sign artifacts with Sigstore/cosign; verify signatures before deploy.
Why it moved: New category; the SolarWinds supply chain attack exemplifies it.

#9 Security Logging and Monitoring Failures
What it means: No audit trail for sensitive operations (who deleted that S3 bucket?), logs scrubbed of context needed for forensics, no alerting on brute force.
Primary fix: Structured audit logs for all auth/authz events; CloudTrail; SIEM alerts.
Why it moved: Was #10 in 2017 — harder to exploit directly, but it enables every other attack.

#10 Server-Side Request Forgery (SSRF)
What it means: A service fetches an attacker-controlled URL that points at the cloud metadata endpoint (169.254.169.254), leaking IAM credentials from the EC2 instance profile.
Primary fix: Allowlist permitted URLs/hostnames; block the metadata endpoint range in egress rules.
Why it moved: New in 2021; cloud-native environments made it dramatically more dangerous.

Injection Attacks: Why Parameterized Queries Are the Only Fix

SQL injection is the attack that never dies because the underlying cause is simple and keeps being reinvented: user-controlled input is concatenated into a query that a database engine then interprets as code. The fix — parameterized queries — has been available since the 1990s. The vulnerability persists because it is trivially easy to write a query the wrong way, especially when you inherit code or use an ORM that exposes raw query escape hatches.

The critical insight most explanations miss: input sanitization is not the fix for SQL injection. Sanitizing user input (stripping quotes, escaping special characters) fails because: (1) context-dependent escaping is hard to get right across character sets and database engines; (2) ORM bypass patterns (e.g., passing a RawSQL expression) re-introduce the vulnerability; (3) an attacker who controls only part of the input can often still break out of sanitization logic.

Parameterized queries (also called prepared statements) fix this at the architectural level. The query structure — the code — is compiled first. The user data is bound to it as typed values after the structure is fixed. The database engine can never interpret the data as code because the parse tree is already set.

Command injection follows the same pattern. os.system(f"convert {user_filename}") is vulnerable if user_filename is "; rm -rf / #". The fix is the same principle: use subprocess with shell=False and pass arguments as a list, so the shell never gets to interpret the argument as a command.

LDAP injection is the less-discussed cousin: if you build an LDAP filter with string concatenation ((&(uid={user_input})(active=true))), an attacker who submits *)(|(uid=*) can bypass authentication entirely by rewriting the filter logic. The fix is LDAP-specific escaping via your LDAP library's built-in methods — and, better, avoid accepting raw user-controlled values in LDAP queries at all.
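A minimal sketch of RFC 4515 filter escaping (the helper name is illustrative; in production, prefer the escaping function shipped with your LDAP library):

```python
def escape_ldap_filter_value(value: str) -> str:
    """Escape a value for safe inclusion in an LDAP search filter,
    per RFC 4515: backslash, '*', '(', ')', and NUL become hex escapes."""
    special = {'\\', '*', '(', ')', '\x00'}
    return ''.join('\\%02x' % ord(ch) if ch in special else ch for ch in value)

payload = "*)(|(uid=*"                 # the bypass payload from above
safe = escape_ldap_filter_value(payload)
ldap_filter = f"(&(uid={safe})(active=true))"
# safe == r"\2a\29\28|\28uid=\2a": the filter structure can no longer be rewritten
```

Because the metacharacters arrive at the server as escaped literals, the attacker's input is matched as data, not parsed as filter logic.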

SQL Injection: Vulnerable vs. Parameterized vs. ORM

db_queries.py (Python)
import psycopg2
from sqlalchemy import text
from sqlalchemy.orm import Session

# ----------------------------------------------------------------
# VULNERABLE: string concatenation — never do this
# Attacker submits: user_id = "1 OR 1=1 --"
# Resulting query: SELECT * FROM users WHERE id = 1 OR 1=1 --
# Returns every row in the users table.
# ----------------------------------------------------------------
def get_user_vulnerable(conn, user_id: str):
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")  # DANGER
    return cursor.fetchone()

# ----------------------------------------------------------------
# CORRECT: parameterized query via psycopg2
# The %s placeholder is bound after parse. user_id can be anything;
# it is treated as a literal value, never as SQL code.
# ----------------------------------------------------------------
def get_user_safe(conn, user_id: str):
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    return cursor.fetchone()

# ----------------------------------------------------------------
# CORRECT: SQLAlchemy ORM — safe by default
# ----------------------------------------------------------------
def get_user_orm(session: Session, user_id: int):
    return session.query(User).filter(User.id == user_id).first()

# ----------------------------------------------------------------
# TRAP: SQLAlchemy text() with string interpolation is STILL VULNERABLE.
# The ORM does not protect you if you bypass it with raw SQL.
# ----------------------------------------------------------------
def get_user_orm_trap(session: Session, user_id: str):
    # VULNERABLE — same as raw string concatenation
    return session.execute(text(f"SELECT * FROM users WHERE id = {user_id}"))

# CORRECT: bind parameters with text()
def get_user_orm_correct(session: Session, user_id: str):
    return session.execute(
        text("SELECT * FROM users WHERE id = :uid"),
        {"uid": user_id}
    )

# ----------------------------------------------------------------
# Command injection: subprocess with shell=False
# ----------------------------------------------------------------
import subprocess

def convert_image_vulnerable(filename: str):
    import os
    os.system(f"convert {filename} output.png")  # DANGER: shell injection

def convert_image_safe(filename: str):
    # shell=False (default): args are NOT interpreted by shell.
    # filename is passed as a literal argument to convert, not parsed.
    subprocess.run(["convert", filename, "output.png"], check=True, shell=False)

Broken Access Control: IDOR and Why Auth Middleware Isn't Enough

Broken access control is the #1 OWASP category for a reason: authentication and authorization are two different enforcement points, and most services only enforce the first. Authentication verifies who you are. Authorization verifies what you are allowed to do. A service that properly authenticates every request but then trusts the user_id in the URL to scope the response has no authorization.

IDOR (Insecure Direct Object Reference) is the canonical form. Your API exposes GET /api/invoices/12345. You authenticate the request — the token is valid. But if you don't verify that invoice 12345 belongs to the authenticated user, any authenticated user can read any invoice by iterating 12345, 12346, 12347. This is not a theoretical attack: IDOR is consistently the highest-value finding in bug bounty programs across every major platform.

The fix requires object-level authorization on every data access operation — not just on the route. A middleware check that verifies the JWT exists and is valid does not prevent IDOR. The check must happen at the point where you load the object, and it must verify ownership or permission. In practice: never fetch by ID alone; always fetch with WHERE id = ? AND owner_id = ? (or equivalent RBAC check).
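A minimal sketch of the ownership-scoped fetch (the table schema and helper name are illustrative; sqlite3 stands in for your database driver):

```python
import sqlite3

def get_invoice(conn: sqlite3.Connection, invoice_id: int, requester_id: int):
    """Fetch an invoice only if it belongs to the authenticated requester.
    The ownership predicate is part of the query itself, so there is no
    code path that loads the object without the authorization check."""
    row = conn.execute(
        "SELECT id, owner_id, amount FROM invoices WHERE id = ? AND owner_id = ?",
        (invoice_id, requester_id),
    ).fetchone()
    if row is None:
        # Report "not found" rather than "forbidden": don't confirm existence.
        raise LookupError("invoice not found")
    return row

# Demo with an in-memory table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, owner_id INTEGER, amount REAL)")
conn.execute("INSERT INTO invoices VALUES (12345, 7, 99.0)")

get_invoice(conn, 12345, requester_id=7)   # owner: returns the row
# get_invoice(conn, 12345, requester_id=8) # another user: raises LookupError
```

Returning the same "not found" error for "exists but not yours" also closes the enumeration side channel.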

Path traversal is the filesystem variant. If your service serves files by path (e.g., GET /files?path=report.pdf), an attacker who submits ../../etc/passwd can read arbitrary files on the server if you don't normalize and validate the path before opening it. The fix: normalize to an absolute path with os.path.realpath(), then verify the result starts with your allowed base directory.
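That normalize-then-verify check can be sketched as follows (the base directory is a placeholder):

```python
import os

BASE_DIR = "/srv/files"  # illustrative base directory for served files

def resolve_safe_path(user_path: str, base_dir: str = BASE_DIR) -> str:
    """Resolve a user-supplied path and refuse anything that escapes
    base_dir, including ../ sequences and symlinked detours."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # The trailing os.sep stops "/srv/files-evil" matching the "/srv/files" prefix.
    if candidate != base and not candidate.startswith(base + os.sep):
        raise PermissionError(f"path escapes base directory: {user_path!r}")
    return candidate

resolve_safe_path("report.pdf")            # ok: /srv/files/report.pdf
# resolve_safe_path("../../etc/passwd")    # raises PermissionError
```

Note that comparing realpath results (not the raw strings) is what defeats symlink tricks as well as lexical `../` traversal.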

Missing authorization on administrative endpoints is the subtler variant. A service might correctly gate user-facing endpoints but expose POST /admin/reset-password without any role check, relying instead on the assumption that "no one knows about this endpoint." Security through obscurity is not a control. Enumerate your endpoints in a security review and verify each one has an explicit authorization check.

⚠ WARNING

The IDOR Pattern That Kills Bug Bounty Programs

The most common IDOR pattern is a numeric sequential ID in a URL. If your invoice IDs are integers that increment by one (12345, 12346), an authenticated attacker needs exactly one valid ID to enumerate your entire dataset. Use UUIDs or cryptographically random identifiers (not sequential) for any resource that is access-controlled. This doesn't replace authorization checks — it just makes exploitation slower — but it eliminates the enumeration attack vector entirely.

Second pattern: actions that are access-controlled but queries that aren't. A service might correctly check "can this user delete this resource?" but forget to apply the same check to "can this user read this resource?" Audit both read and write paths separately.

Secrets Management: Why Environment Variables Are Not the Answer

The conventional wisdom — "don't hardcode secrets; put them in environment variables" — is a significant improvement over hardcoded values, but it is not the final answer. Environment variables are visible via /proc/<pid>/environ (and ps e) to the process owner and root, in container specs and Kubernetes pod definitions stored in etcd, in crash dumps and core files, in CI/CD logs if you echo them accidentally, and in any language's runtime inspection (e.g., os.environ in Python). They persist for the lifetime of the process, and there is no built-in rotation mechanism.

The correct mental model for secrets: a secret should be a short-lived credential that is fetched just-in-time, used, and expires automatically. Long-lived static secrets that never rotate are a ticking time bomb — not if they leak, but when.

AWS Secrets Manager is the managed option for AWS workloads. It stores secrets encrypted with KMS, supports automatic rotation for RDS, Redshift, and DocumentDB (via Lambda-backed rotation), and provides IAM-based access control, so you can grant a specific IAM role access to a specific secret. During rotation, the rotation Lambda creates the new credential, sets it in the database, tests that it works, and only then marks the new version as current; the old credential is deprecated only after the new one is verified.

HashiCorp Vault is the cross-cloud option and more powerful for complex patterns. Vault's killer feature is dynamic secrets: instead of storing a static database password, Vault creates a new database credential with a TTL on every request, and the credential is automatically revoked when the TTL expires. The application never holds a long-lived credential. Vault also supports PKI (certificate issuance), SSH certificate authorities, and AWS IAM credential generation. For Kubernetes, Vault Agent Injector or the Vault Secrets Operator syncs secrets into pods as files or environment variables without the application needing to call Vault directly.

In Kubernetes: the correct pattern is the External Secrets Operator (ESO). ESO polls your secret store (AWS Secrets Manager, Vault, GCP Secret Manager) on a configurable interval and creates/updates Kubernetes Secrets automatically. Your pod reads a K8s Secret (which is mounted as a file or env var), but the source of truth is your secret store, not a value manually created in the cluster. When the secret rotates in the store, ESO propagates the new value within minutes without a redeploy.

What to scan for in your codebase: gitleaks and git-secrets scan git history for committed secrets — including commits that were "cleaned up" later. A secret committed and then removed is still in git history and is still compromised. Run these tools in a pre-commit hook and in CI. Trufflehog goes deeper: it scans git history, Docker images, and S3 buckets.

Secrets Access Patterns: Static vs. Dynamic vs. K8s External Secrets

secrets_patterns.py (Python)
import boto3
import json
import os

# ----------------------------------------------------------------
# ANTI-PATTERN: hardcoded secret — never do this.
# Committed to git, visible in code review, in every clone.
# ----------------------------------------------------------------
DB_PASSWORD = "hunter2"  # DANGER

# ----------------------------------------------------------------
# BETTER BUT INCOMPLETE: environment variable.
# Rotation requires redeploy. Visible in container spec / ps aux.
# ----------------------------------------------------------------
DB_PASSWORD = os.environ["DB_PASSWORD"]

# ----------------------------------------------------------------
# CORRECT: fetch from AWS Secrets Manager at startup.
# Access controlled by IAM role attached to the compute resource.
# No long-lived credentials needed — just an IAM role.
# ----------------------------------------------------------------
def get_secret(secret_name: str, region: str = "us-east-1") -> dict:
    """
    Fetches and parses a JSON secret from AWS Secrets Manager.
    The calling code's IAM role must have secretsmanager:GetSecretValue
    on this specific secret ARN — not on *.
    """
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# Usage: IAM role grants access; no password in code or env var.
creds = get_secret("prod/myservice/db-credentials")
db_user = creds["username"]
db_pass = creds["password"]

# ----------------------------------------------------------------
# BEST for Kubernetes: External Secrets Operator (ESO)
# ExternalSecret CR — defined in your Helm chart / manifests.
# ESO syncs the secret from ASM into a K8s Secret every 1h.
# Pod reads the K8s Secret as a mounted file — zero SDK calls needed.
# ----------------------------------------------------------------

# external-secret.yaml (K8s manifest, not Python):
# apiVersion: external-secrets.io/v1beta1
# kind: ExternalSecret
# metadata:
#   name: db-credentials
# spec:
#   refreshInterval: 1h
#   secretStoreRef:
#     name: aws-secretsmanager
#     kind: ClusterSecretStore
#   target:
#     name: db-credentials   # name of resulting K8s Secret
#   data:
#     - secretKey: password
#       remoteRef:
#         key: prod/myservice/db-credentials
#         property: password

# ----------------------------------------------------------------
# ROTATION: never rely on a secret that has no TTL.
# Implement rotation via boto3 if you manage your own rotation lambda.
# ----------------------------------------------------------------
def rotate_secret(secret_id: str) -> None:
    """Trigger immediate rotation — call from ops tooling, not app code."""
    client = boto3.client("secretsmanager")
    client.rotate_secret(SecretId=secret_id)
    # Secrets Manager will call your rotation Lambda, update the secret,
    # and test connectivity before deprecating the old version.

Secrets Management Architecture: From Vault to Pod


Supply Chain Security: SBOM, Sigstore, and SLSA After Log4Shell

Log4Shell (CVE-2021-44228) changed how the industry thinks about software supply chains. The vulnerability was in log4j, a Java logging library. The reason it was catastrophic — affecting an estimated 3 billion devices — is that most teams didn't know they ran log4j. It was a transitive dependency: your service depended on Library A, which depended on Library B, which depended on log4j. Your pom.xml didn't mention log4j at all, but it was in your runtime classpath.

SBOM (Software Bill of Materials) is the foundational response. An SBOM is a machine-readable inventory of every component in your software: direct dependencies, transitive dependencies, versions, licenses, and known vulnerabilities. The two dominant formats are CycloneDX (JSON, supported by OWASP) and SPDX (ISO standard, used by Linux Foundation). Tools like Syft generate SBOMs from container images and source trees. With an SBOM, a new CVE announcement triggers a query: "which of our services contains this component?" — answerable in minutes instead of days.

Dependabot and Renovate are the automation layer. Dependabot (GitHub native) and Renovate (configurable, multi-platform) automatically open PRs when a dependency has a newer version or a known CVE. The failure mode is PR fatigue: if Dependabot opens 50 PRs a week and the team auto-merges none of them, the tool creates noise without reducing risk. The correct configuration: auto-merge patch updates in non-breaking dependencies after CI passes; group minor updates into weekly batch PRs; surface major updates for human review.

Sigstore/cosign solves the container image provenance problem. After your CI builds an image and pushes it to ECR/GCR, how does your Kubernetes admission controller verify that the image running in prod is the one that came from your CI pipeline and not a compromised image pushed directly to the registry? Sigstore/cosign signs the image with an OIDC-backed ephemeral key tied to the CI pipeline identity. Your cluster's admission webhook (e.g., Kyverno or OPA Gatekeeper) verifies the signature before allowing the pod to schedule. An image without a valid signature from your CI system is rejected.

SLSA (Supply-chain Levels for Software Artifacts) is a framework, originated at Google and now maintained under the OpenSSF, for hardening the entire build pipeline. The original v0.1 model defines four levels: L1 — the build is documented (scripted build process, not manual); L2 — the build is version-controlled and generates provenance attestations (signed metadata about who built what from what source at what time); L3 — the build runs on dedicated, hardened build infrastructure with no human write access (e.g., GitHub Actions with protected branches); L4 — the build is hermetic and reproducible (the same source always produces bit-for-bit identical output, with no network access during the build). Most organizations are at L0–L1. Reaching L2 with signed provenance already eliminates the SolarWinds attack pattern, where attackers injected malicious code directly into the build server.

Supply Chain Trust Chain: From Commit to Running Container


Least Privilege IAM: Roles, Users, and Workload Identity

The most common IAM mistake at every company is s3:* on *. It is written that way because it is fast. It runs in production for years because nothing breaks. And it means that any code running with that role — your application, your CI pipeline, any dependency it loads — can read, write, overwrite, or delete any object in any S3 bucket in your account. One SQL injection or SSRF vulnerability, and an attacker has exfiltration access to your entire data lake.

IAM roles vs. IAM users: IAM users have long-lived access key pairs (a static ACCESS_KEY_ID and SECRET_ACCESS_KEY) that are manually rotated (and often aren't). IAM roles have no long-lived credentials — they issue short-lived session tokens (default 1 hour, max 12 hours) via the AWS STS AssumeRole API. Use IAM roles everywhere. The only valid use of IAM users is for legacy systems that cannot assume roles (virtually none after 2020) and for the root account break-glass credential.

Workload identity eliminates credentials from the pod entirely. The pattern: (1) create a Kubernetes ServiceAccount for your application; (2) annotate it with the ARN of an IAM role; (3) configure the IAM role's trust policy to allow the specific K8s ServiceAccount in the specific namespace of the specific cluster to assume it (via AWS IRSA or GCP Workload Identity). The pod's containers automatically receive short-lived AWS credentials via the AWS SDK's credential chain — no secret to store, no rotation to manage, no credential visible in the pod spec or environment.
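The trust policy in step (3) can be sketched as a small builder (the account ID, OIDC provider ARN, and issuer are placeholders):

```python
import json

def irsa_trust_policy(oidc_provider_arn: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy allowing exactly one Kubernetes ServiceAccount
    (namespace + name) to assume this IAM role via the cluster's
    OIDC provider; everything else is denied by default."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": oidc_provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{oidc_issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{oidc_issuer}:aud": "sts.amazonaws.com",
            }},
        }],
    }

policy = irsa_trust_policy(
    "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    namespace="payments", service_account="payment-service",
)
print(json.dumps(policy, indent=2))
```

The matching ServiceAccount then carries the `eks.amazonaws.com/role-arn` annotation pointing at the role, and the AWS SDK's default credential chain picks up the injected token automatically.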

Scoping properly: the right mental model is "what is the minimum set of IAM actions on the minimum set of resources that this service needs to function?" For an API service that reads from a single DynamoDB table: dynamodb:GetItem, dynamodb:Query, dynamodb:PutItem on the specific table ARN — not dynamodb:* on *. For a Lambda that writes to SQS: sqs:SendMessage on the specific queue ARN. Enforce this with IAM Access Analyzer, which identifies policies that grant more access than is actually used.
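The scoped DynamoDB policy described above, as a concrete document (account ID, region, and table name are illustrative placeholders):

```python
import json

# Least-privilege policy for an API service that uses exactly one table.
ORDERS_TABLE_ARN = "arn:aws:dynamodb:us-east-1:111122223333:table/orders"

POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "OrdersTableOnly",
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": ORDERS_TABLE_ARN,   # the specific table ARN, never "*"
    }],
}
print(json.dumps(POLICY, indent=2))
```

If the service later needs `dynamodb:DeleteItem`, that is a deliberate, reviewable one-line diff rather than a capability it silently had all along.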

Cross-account access: use IAM roles with cross-account trust policies. Never copy credentials between accounts. The pattern: Account A's service assumes a role in Account B that grants specific permissions in Account B. The session is time-limited and auditable in CloudTrail in both accounts.

Authentication vs. Authorization: mTLS for Service-to-Service

TLS (Transport Layer Security) authenticates the server to the client — when your browser connects to api.example.com, the TLS handshake verifies that the server holds the private key for the certificate signed by a trusted CA. The client is not authenticated. This is appropriate for public-facing APIs where any browser can connect.

mTLS (mutual TLS) authenticates both sides. The server presents a certificate; the client presents a certificate; both verify each other before the connection is established. This is the correct pattern for service-to-service communication inside a cluster. Without mTLS, any service inside the cluster can send requests to any other service claiming to be anything. A compromised pod can impersonate your authentication service or your billing service.

The challenge historically was certificate management: manually issuing, distributing, and rotating certificates for every service pair is operationally impossible at scale. Service meshes (Istio, Linkerd) solve this by handling mTLS transparently via sidecar proxies (Envoy in Istio's case). The sidecar intercepts all inbound and outbound traffic, handles the mTLS handshake using certificates issued by the mesh's internal CA, and rotates certificates automatically (Istio default: 24-hour cert lifetime). The application code sees plaintext — it knows nothing about the TLS layer. Istio's control plane (Istiod) issues SPIFFE-format certificates (Secure Production Identity Framework for Everyone) that encode the workload's identity as a URI: spiffe://cluster.local/ns/payments/sa/payment-service.
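As a sketch, enforcing strict mTLS mesh-wide in Istio is a single PeerAuthentication resource (assuming istio-system is the mesh's root namespace):

```yaml
# Mesh-wide strict mTLS: sidecars reject plaintext traffic from peers.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: STRICT
```

Rolling out, teams usually start in PERMISSIVE mode (accept both plaintext and mTLS), verify telemetry shows all traffic is mTLS, then flip to STRICT.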

TLS version and cipher suite configuration matters for external-facing services. TLS 1.0 and 1.1 are deprecated and broken (POODLE, BEAST attacks). TLS 1.2 is the minimum acceptable; TLS 1.3 is preferred (faster handshake, only forward-secret cipher suites). In nginx/Caddy/ALB, explicitly set ssl_protocols TLSv1.2 TLSv1.3 and use a cipher suite that excludes NULL, EXPORT, DES, 3DES, RC4, and MD5 ciphers.
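A minimal nginx server block implementing those settings (certificate paths are placeholders; the cipher list is one reasonable modern choice, not the only one):

```nginx
server {
    listen 443 ssl;
    server_name api.example.com;

    # Minimum TLS 1.2; TLS 1.3 preferred when the client supports it.
    ssl_protocols TLSv1.2 TLSv1.3;
    # Forward-secret ECDHE suites only; excludes NULL/EXPORT/DES/3DES/RC4/MD5.
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers on;

    ssl_certificate     /etc/nginx/tls/server.crt;
    ssl_certificate_key /etc/nginx/tls/server.key;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}
```

Note that `ssl_ciphers` only governs TLS 1.2 and below; TLS 1.3 cipher suites are all forward-secret by design.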

SSRF: The Cloud-Native Attack That Steals IAM Credentials

SSRF is the OWASP #10 that most mid-level engineers underestimate until they see a breach enabled by it. The pattern: your application accepts a URL from a user and fetches it server-side — for a screenshot service, a webhook validator, a URL preview feature. An attacker submits http://169.254.169.254/latest/meta-data/iam/security-credentials/my-ec2-role. The EC2 instance's metadata service (IMDS) responds with the instance's IAM role credentials — AccessKeyId, SecretAccessKey, Token — valid for hours. The attacker now has full access to everything that role can do.

The Capital One breach (2019) was SSRF + overpermissioned IAM role. The WAF they used was misconfigured in a way that allowed SSRF. The attacker used SSRF to reach IMDS, obtained IAM credentials, and used those credentials (which had s3:GetObject on *) to exfiltrate over 100 million customer records from S3.

Mitigations: (1) IMDSv2 — AWS Instance Metadata Service v2 requires a PUT request to obtain a session token before GET requests are accepted. The PUT request includes a TTL header and is not followable by a redirect. This breaks the basic SSRF attack because http://169.254.169.254 GET requests are now rejected without the session token. Enforce IMDSv2 in your launch templates. (2) URL allowlisting — if your service fetches user-provided URLs, define an explicit allowlist of permitted hostnames/IP ranges and reject everything else. Block RFC 1918 (private) and link-local (169.254.x.x) ranges. (3) Egress filtering — network-level controls (Security Groups, VPC Network ACLs, or an egress proxy) that block direct access to the metadata endpoint range from application workloads.
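Mitigation (2) can be sketched as an allowlist-first validator (the allowlisted hostnames are placeholders):

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"hooks.example.com", "img.example.com"}  # illustrative allowlist

def validate_outbound_url(url: str) -> str:
    """Allowlist-first check for server-side fetches: only http(s), only
    known hosts, and every resolved address must be public. This blocks
    169.254.169.254, RFC 1918 ranges, and localhost."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("scheme not allowed")
    host = parsed.hostname or ""
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowlisted: {host!r}")
    # Check every resolved address. A complete defense also pins these
    # addresses for the actual fetch, to defeat DNS rebinding.
    for *_, sockaddr in socket.getaddrinfo(host, None):
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_private or addr.is_link_local or addr.is_loopback:
            raise ValueError(f"resolves to non-public address: {addr}")
    return url

# validate_outbound_url("http://169.254.169.254/latest/meta-data/")  # rejected
# validate_outbound_url("file:///etc/passwd")                        # rejected
```

The allowlist check comes first so unknown hosts are rejected before any DNS lookup happens on attacker-controlled input.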

⚠ WARNING

Common Security Failure Modes in Production

Secrets in logs: structured logging that serializes request objects can include Authorization headers, API keys in query parameters, or database credentials in error messages. Fix: implement a log scrubbing middleware that redacts known sensitive field names (authorization, password, api_key, token, secret) before they reach your log sink. Test this explicitly.

Verbose error messages in production: stack traces, SQL queries, and internal paths in API error responses give attackers a map of your system. Fix: return generic error IDs in API responses; log detailed errors internally only.

Missing security headers: HTTP APIs that don't set Strict-Transport-Security, X-Content-Type-Options, X-Frame-Options, and Content-Security-Policy are vulnerable to clickjacking, MIME sniffing, and downgrade attacks. Fix: add these headers at the load balancer or API gateway level so every service gets them automatically.

Default credentials on infrastructure: Elasticsearch with no auth, Redis with no auth, Kubernetes dashboard with default credentials. Every time you deploy a new data store, the first question is: what is the auth mechanism and what are the default credentials? Change defaults before the instance is reachable from the network.

Over-broad CORS: Access-Control-Allow-Origin: * on an authenticated API is a cross-origin credential theft risk. Fix: explicitly allowlist known client origins; never use * if the endpoint reads or writes user-specific data.
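The scrubbing middleware from the first failure mode above can be sketched as follows (the field names and event shape are illustrative):

```python
SENSITIVE_KEYS = {"authorization", "password", "api_key", "token", "secret"}

def scrub(record):
    """Recursively redact sensitive fields before a record reaches the
    log sink; key matching is case-insensitive and catches header-style
    variants like 'X-Api-Key'."""
    if isinstance(record, dict):
        return {
            key: ("[REDACTED]"
                  if any(s in key.lower().replace("-", "_") for s in SENSITIVE_KEYS)
                  else scrub(value))
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [scrub(item) for item in record]
    return record

event = {
    "path": "/login",
    "headers": {"Authorization": "Bearer abc123", "X-Api-Key": "k1"},
    "body": {"user": "alice", "password": "hunter2"},
}
# scrub(event) redacts Authorization, X-Api-Key, and password; "user" and "path" survive.
```

As the text says: test this explicitly. Feed a known-sensitive request through your integration tests and assert the secrets never appear in captured log output.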

TIP

Interview Playbook: How to Talk About Security Under Pressure

When asked a security question in a system design or engineering interview, structure your answer in three layers: what the threat is, why the naive defense fails, what actually works and what the tradeoff is.

For any new service design question, proactively cover: (1) where secrets live and how they rotate; (2) what IAM role this service runs as and what it can access; (3) what happens if this service is compromised — what is the blast radius? (4) how you verify the container image running in prod is the one built from your source commit.

Interviewers at FAANG are testing whether security is an afterthought you bolt on or a lens you design through. The signal that separates senior from staff: staff engineers identify the threat model before proposing a solution, not after. Say "before I propose a solution, let me think about what the threat model is here" — and then reason through attacker motivations, entry points, and what controls limit the blast radius.

The single most impressive thing you can say in a security interview is a concrete failure mode from production: "At my last company, we discovered a service was logging API keys in plaintext because our JSON serializer was recursively serializing the request headers object. We fixed it by building a scrubbing middleware and adding a CI check that runs gitleaks on the log output from integration tests." Specificity wins.
