JWKS and Zero-Downtime Key Rotation
Scaling Trust
This article is part of a series on understanding the hows and whys of JSON Web Signatures (JWS).
There's accompanying code: it's referred to and linked throughout the content. But if you'd rather just read raw code, head over here.
You've chosen ES256. You've implemented signing and verification. Your scheme API is humming along in production, processing thousands of transactions daily. Then you get the call: a former employee who had access to your key management systems left on bad terms. Security policy says you must rotate keys immediately.
You reach for your runbook and your heart sinks. The key rotation procedure says:
- Generate new key pair
- Deploy new public key to 50+ partner banks
- Each bank will require contact during business hours, and possibly a deployment during maintenance windows
- During rotation, must keep the compromised key active
This will take forever. While a potentially compromised key remains live in production. While you're processing real money. This is bad. Real bad.
This pattern repeats across production systems: hard-coded public keys in verification code. "Public keys are public," teams reason. Then key compromise happens. Emergency rotation requires deploying new public keys to hundreds of microservices, each requiring code changes, reviews, and deployments. Key rotation takes 6-12 hours. During that window, compromised keys remain active.
JWKS (JSON Web Key Set) eliminates this entire class of problem by turning that unending emergency into a 5-minute configuration change.
What is JWKS?
JWKS (JSON Web Key Set, defined in RFC 7517) is a standardized JSON format for publishing cryptographic public keys. Instead of you and your partners hard-coding keys in your applications, you publish them at a URL such as https://auth.yourcompany.com/.well-known/jwks.json, and clients fetch them as needed.
{
"keys": [
{
"kty": "EC",
"use": "sig",
"kid": "2024-11-prod",
"alg": "ES256",
"crv": "P-256",
"x": "WKn-ZIGevcwGIyyrzFoZNBdaq9_TsqzGl96oc0CWuis",
"y": "y77t-RvAHRKTsSGdIYUfweuOvwrvDD-Q3Hv5J0fSKbE"
},
{
"kty": "EC",
"use": "sig",
"kid": "2024-12-prod",
"alg": "ES256",
"crv": "P-256",
"x": "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU",
"y": "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
}
]
}
Your JWT header references a specific key by ID:
{
"alg": "ES256",
"typ": "JWT",
"kid": "2024-11-prod"
}
When verifying, clients:
- Look at the kid in the JWT header
- Fetch the JWKS from the well-known endpoint
- Find the matching key in the keys array
- Use that public key to verify the signature
This simple pattern solves multiple problems at once: key distribution, key rotation, multi-tenant key management, and emergency response.
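At the heart of those client steps is a small lookup. The sketch below is minimal and makes some assumptions:

- The JWKS has already been fetched and decoded into a map with string keys, as a JSON decoder would produce it.
- `JwksLookup` and `find_key/2` are hypothetical names.
- The signature check itself is left to a library such as JOSE.

```elixir
defmodule JwksLookup do
  # Hypothetical helper: finds the JWK whose "kid" matches the token header.
  # `jwks` is a decoded JWKS document: %{"keys" => [%{"kid" => ...}, ...]}.
  def find_key(%{"keys" => keys}, kid) when is_list(keys) and is_binary(kid) do
    case Enum.find(keys, fn key -> key["kid"] == kid end) do
      nil -> {:error, :kid_not_found}
      key -> {:ok, key}
    end
  end

  # Malformed JWKS, or a header without a string kid: reject rather than guess.
  def find_key(_jwks, _kid), do: {:error, :invalid_input}
end

jwks = %{
  "keys" => [
    %{"kid" => "2024-11-prod", "alg" => "ES256"},
    %{"kid" => "2024-12-prod", "alg" => "ES256"}
  ]
}

JwksLookup.find_key(jwks, "2024-12-prod") |> IO.inspect()
```

A kid that isn't in the set comes back as `{:error, :kid_not_found}`. It is never replaced by a "default" key, because falling back to some other key would defeat the point of key IDs.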
The Problem JWKS Solves
Let's understand why JWKS exists by looking at the alternatives.
Anti-pattern 1: Hard-Coded Keys
# DON'T DO THIS
defmodule HardCodedVerifier do
  # The key is compiled into the module: rotating it means a code change and a redeploy
  @public_key """
  -----BEGIN PUBLIC KEY-----
  MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWKn+ZIGevcwGIyyrzFoZNBdaq9/T
  sqzGl96oc0CWuisy77t+RvAHRKTsSGdIYUfweuOvwrvDD+Q3Hv5J0fSKbE==
  -----END PUBLIC KEY-----
  """

  def verify_token(jws_string) do
    jwk = JOSE.JWK.from_pem(@public_key)
    JOSE.JWS.verify_strict(jwk, ["ES256"], jws_string)
  end
end
Problems with this approach:
- Key rotation requires code changes and redeployment
- No way to support multiple active keys simultaneously
- Emergency rotations cause downtime
- Different environments need different code or complex configuration
Anti-pattern 2: Keys in Configuration Files
# config.yaml - DON'T DO THIS
jwt:
  public_key: |
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWKn+ZIGe...
    -----END PUBLIC KEY-----
Problems:
- Better than hard-coding but still requires deployment to update
- Configuration drift across services
- No built-in versioning or key identification
- Secrets management tools help but don't solve the fundamental problem
The JWKS Solution
Implementation approach:
- Extract kid from JWT header
- Fetch JWKS from well-known endpoint
- Find matching key by kid
- Cache the result (10-15 minute TTL)
- Verify signature with the retrieved key
Benefits of this approach:
- Key rotation without code changes
- Multiple active keys simultaneously
- Automatic key discovery and caching
- Centralized key management
- Gradual rollout during rotation
- Emergency rotation without downtime
JWK Format: Anatomy of a Key
Let's decode what's in a JWKS. Each key in the keys array is a JWK (JSON Web Key). Here's an ES256 public key broken down:
{
"kty": "EC", // Key Type: Elliptic Curve
"use": "sig", // Public key use: Signature
"kid": "2024-11-prod", // Key ID: Unique identifier
"alg": "ES256", // Algorithm: ECDSA with P-256 and SHA-256
"crv": "P-256", // Curve: P-256 (secp256r1)
"x": "WKn-ZIGe...", // X coordinate (base64url encoded)
"y": "y77t-RvAH..." // Y coordinate (base64url encoded)
}
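To see where x and y come from: a P-256 public key is a point on the curve. It is usually serialized as the byte 0x04 followed by the 32-byte x and y coordinates.

This minimal sketch assumes you use Erlang's `:crypto` module (shipped with OTP) to generate a throwaway key pair. `EcJwk` is a hypothetical module name.

```elixir
defmodule EcJwk do
  # Hypothetical helper: turns an uncompressed P-256 point
  # (<<4, x::32 bytes, y::32 bytes>>) into the public JWK fields shown above.
  def from_uncompressed_point(<<4, x::binary-size(32), y::binary-size(32)>>, kid) do
    %{
      "kty" => "EC",
      "use" => "sig",
      "kid" => kid,
      "alg" => "ES256",
      "crv" => "P-256",
      # Full-length coordinates, base64url-encoded without "=" padding
      "x" => Base.url_encode64(x, padding: false),
      "y" => Base.url_encode64(y, padding: false)
    }
  end
end

# :crypto returns {public_point, private_scalar}; only the public point is used.
{public_point, _private_key} = :crypto.generate_key(:ecdh, :secp256r1)
jwk = EcJwk.from_uncompressed_point(public_point, "2024-11-prod")
IO.inspect(jwk)
```

Each 32-byte coordinate encodes to 43 base64url characters. That is why the x and y values in the JWKS above are all the same length.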
Different key types have different fields:
RSA Key (RS256/PS256):
{
"kty": "RSA",
"use": "sig",
"kid": "rsa-2024-11",
"alg": "PS256",
"n": "0vx7agoebG...", // Modulus
"e": "AQAB" // Exponent (usually 65537)
}
EdDSA Key (Ed25519):
{
"kty": "OKP", // Octet string Key Pairs
"use": "sig",
"kid": "ed25519-2024-11",
"alg": "EdDSA",
"crv": "Ed25519", // Curve determines hash
"x": "11qYAYKxCr..." // Public key
}
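The required members depend on kty, so a verifier can cheaply reject malformed entries before handing them to a crypto library. This is a sketch under stated assumptions:

- The table below covers only the public members of the three key types shown here.
- `JwkShape` is a hypothetical name.

The sketch also decodes "AQAB" to show why it means 65537.

```elixir
defmodule JwkShape do
  # Hypothetical table of required public members per key type.
  @required %{
    "EC" => ["crv", "x", "y"],
    "RSA" => ["n", "e"],
    "OKP" => ["crv", "x"]
  }

  def validate(%{"kty" => kty} = jwk) do
    case Map.fetch(@required, kty) do
      {:ok, fields} ->
        case Enum.reject(fields, &Map.has_key?(jwk, &1)) do
          [] -> :ok
          missing -> {:error, {:missing_fields, missing}}
        end

      :error ->
        {:error, {:unsupported_kty, kty}}
    end
  end

  def validate(_jwk), do: {:error, :missing_kty}

  # The RSA exponent is a base64url-encoded big-endian unsigned integer.
  def rsa_exponent(e) do
    {:ok, bytes} = Base.url_decode64(e, padding: false)
    :binary.decode_unsigned(bytes)
  end
end

JwkShape.rsa_exponent("AQAB") |> IO.inspect()
```

"AQAB" decodes to the three bytes 0x01 0x00 0x01, which is 65537.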
The Key ID (kid): Why It's Critical
The kid (Key ID) is the linchpin of the entire system. It's how JWT headers reference specific keys:
// JWT Header
{
"alg": "ES256",
"kid": "2024-11-prod",
"typ": "JWT"
}
Rules for good kid values:
- Make them unique: Never reuse a kid, even after deleting the old key
- Make them meaningful: "2024-11-prod" tells you when/where, "abc123" doesn't
- Make them sortable: Use dates/timestamps so you can identify newest keys
- Don't embed secrets: The kid appears in JWT headers, which are unencrypted
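Uniqueness and sortability are easiest to guarantee when kids are generated rather than typed by hand. This is a sketch of the environment-and-timestamp style ("prod-20241115-001"):

- `KidNaming` and its functions are hypothetical.
- The three-digit sequence is an assumed convention for handling more than one rotation on the same day.

```elixir
defmodule KidNaming do
  # Hypothetical kid generator: environment, date (use UTC), and a
  # zero-padded sequence number for same-day rotations.
  def new_kid(env, %Date{} = date, seq) when is_binary(env) and seq in 1..999 do
    stamp = Calendar.strftime(date, "%Y%m%d")
    seq_str = seq |> Integer.to_string() |> String.pad_leading(3, "0")
    "#{env}-#{stamp}-#{seq_str}"
  end

  # Fixed-width date and sequence fields make plain string order chronological
  # within one environment, so the newest kid is simply the maximum.
  def newest(kids), do: Enum.max(kids)
end

KidNaming.new_kid("prod", ~D[2024-11-15], 1) |> IO.puts()
# prints prod-20241115-001
```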
Good patterns:
- "2024-11-prod": date-based
- "prod-20241115-001": environment and timestamp
- "v2-signing-key": versioned with purpose
Bad patterns:
- "1": not descriptive
- "prod": not unique over time
- "key_a7f2e1d9c4b3": random but not meaningful
The Standard Location For JWKS: .well-known/jwks.json
By convention, JWKS endpoints are published at a predictable location:
https://{your-domain}/.well-known/jwks.json
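The .well-known/ prefix itself is defined by RFC 8615 (Well-Known URIs), which reserves it for site-wide metadata. RFC 8414 (OAuth 2.0 Authorization Server Metadata) and OpenID Connect Discovery don't fix the JWKS path themselves. Instead, their metadata documents (/.well-known/oauth-authorization-server and /.well-known/openid-configuration) point to it through a jwks_uri field. The jwks.json path is the widely adopted convention for where that field points.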
This gives you:
- Discoverability: Clients know where to look without configuration
- Security: The domain is verified via TLS, establishing trust
- Standardization: Works across identity providers
Examples from the wild:
- Auth0: https://{tenant}.auth0.com/.well-known/jwks.json
- GitHub: https://token.actions.githubusercontent.com/.well-known/jwks
Follow this pattern unless you have a compelling reason not to.
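Deriving the URL from the issuer's base URL, and refusing anything that isn't HTTPS, keeps the TLS trust anchor in place. This minimal sketch uses Elixir's standard URI module. `WellKnown.jwks_url/1` is a hypothetical helper, and it deliberately replaces any path on the issuer URL with the conventional /.well-known/jwks.json.

```elixir
defmodule WellKnown do
  # Hypothetical helper: derive the conventional JWKS URL from an issuer's
  # base URL, refusing plain HTTP because TLS is what binds keys to the domain.
  def jwks_url(issuer) when is_binary(issuer) do
    case URI.parse(issuer) do
      %URI{scheme: "https", host: host} = uri when is_binary(host) and host != "" ->
        # Any path, query, or fragment on the issuer URL is replaced.
        url = %URI{uri | path: "/.well-known/jwks.json", query: nil, fragment: nil}
        {:ok, URI.to_string(url)}

      _ ->
        {:error, :https_required}
    end
  end

  def jwks_url(_issuer), do: {:error, :invalid_issuer}
end

WellKnown.jwks_url("https://auth.yourcompany.com") |> IO.inspect()
```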
How JWKS Enables Zero-Downtime Key Rotation
Here's where JWKS really shines. Let's walk through the four-phase rotation process:
Phase 1: Normal Operation (Single Key)
{
"keys": [
{
"kid": "key-2025-12",
"kty": "EC",
"use": "sig",
"alg": "ES256",
"crv": "P-256",
"x": "...",
"y": "..."
}
]
}
- Issuer signs with key-2025-12
- Verifiers fetch JWKS, cache it
- Everything works
Phase 2: Introduce New Key (Both Keys Published)
{
"keys": [
{
// Old key (still active)
"kid": "key-2025-12",
"kty": "EC",
"use": "sig",
"alg": "ES256",
"crv": "P-256",
"x": "...",
"y": "..."
},
{
// New key (published but not yet used)
"kid": "key-2026-04",
"kty": "EC",
"use": "sig",
"alg": "ES256",
"crv": "P-256",
"x": "...",
"y": "..."
}
]
}
- Issuer still signs with key-2025-12
- JWKS now includes both keys
- Verifiers' caches pick up the new key on next refresh
- Wait for cache TTL to ensure all verifiers have the new JWKS (typically 10-15 minutes)
Phase 3: Switch to New Key (Both Still Published)
{
"keys": [
{
// Old key (no longer used but still published)
"kid": "key-2025-12",
"kty": "EC",
"use": "sig",
"alg": "ES256",
"crv": "P-256",
"x": "...",
"y": "..."
},
{
// New key (now actively signing)
"kid": "key-2026-04",
"kty": "EC",
"use": "sig",
"alg": "ES256",
"crv": "P-256",
"x": "...",
"y": "..."
}
]
}
- Issuer switches to signing with key-2026-04
- Old tokens with kid: "key-2025-12" still verify (old key still in JWKS)
- New tokens use kid: "key-2026-04"
- Wait for all old tokens to expire (based on the exp claims in their payloads)
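The two waits, in phase 2 and phase 3, can be computed before you start. This sketch assumes you know three things:

- the longest time any verifier caches the JWKS,
- the longest lifetime of the tokens you issue,
- the clock-skew tolerance verifiers apply.

`RotationSchedule` is a hypothetical name, and the numbers in the example call are illustrative only.

```elixir
defmodule RotationSchedule do
  # Hypothetical helper. Inputs are assumptions you supply, in seconds:
  #   cache_ttl     - the longest time any verifier may cache the JWKS
  #   max_token_ttl - the longest lifetime (exp - iat) of tokens you issue
  #   skew          - the clock-skew tolerance verifiers apply to exp
  def plan(%DateTime{} = published_at, cache_ttl, max_token_ttl, skew) do
    # Phase 2 -> 3: every cache has refreshed at least once since publication.
    switch_at = DateTime.add(published_at, cache_ttl, :second)

    # Phase 3 -> 4: the last token signed with the old key (issued just before
    # the switch) has expired, including the skew verifiers still allow.
    # If you switch later than switch_at, recompute from the actual switch time.
    remove_at = DateTime.add(switch_at, max_token_ttl + skew, :second)

    %{publish_new: published_at, switch_signing: switch_at, remove_old: remove_at}
  end
end

# Example values: 15-minute cache TTL, 1-hour tokens, 5-minute skew.
RotationSchedule.plan(~U[2026-04-01 00:00:00Z], 900, 3600, 300) |> IO.inspect()
```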
Phase 4: Remove Old Key (Rotation Complete)
{
"keys": [
{
"kid": "key-2026-04", // Only the new key remains
"kty": "EC",
"use": "sig",
"alg": "ES256",
"crv": "P-256",
"x": "...",
"y": "..."
}
]
}
- Old key removed from JWKS
- Any remaining tokens signed with old key will now fail verification
- Rotation complete
Total downtime: Zero. The overlap period ensures smooth transition.
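The overlap rule behind the four phases can be written down as data and checked. In this sketch, `RotationPhases` is a hypothetical module. Each phase lists the published kids and the signing kid.

`safe_transition?/2` checks two conditions:

- the next signing key was already published,
- the previous signing key is still published.

It deliberately ignores timing, which the waits in phases 2 and 3 cover.

```elixir
defmodule RotationPhases do
  # Hypothetical model of the four phases: which kids the JWKS publishes
  # and which kid the issuer signs with.
  def phase(1, old, _new), do: %{published: [old], signing: old}
  def phase(2, old, new), do: %{published: [old, new], signing: old}
  def phase(3, old, new), do: %{published: [old, new], signing: new}
  def phase(4, _old, new), do: %{published: [new], signing: new}

  # A move is safe when verifiers already have the next signing key and
  # tokens from the current signing key still verify afterwards.
  # Timing (cache TTL, token expiry) is not modelled here.
  def safe_transition?(from, to) do
    to.signing in from.published and from.signing in to.published
  end
end

phases = Enum.map(1..4, &RotationPhases.phase(&1, "key-2025-12", "key-2026-04"))

phases
|> Enum.chunk_every(2, 1, :discard)
|> Enum.map(fn [from, to] -> RotationPhases.safe_transition?(from, to) end)
|> IO.inspect()
# prints [true, true, true]
```

Skipping phase 2 (going straight from phase 1 to phase 3) fails the check. Verifiers would then see tokens signed with a kid that their cached JWKS doesn't contain yet.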
Here's the flow visualized:
In the accompanying code, we've got support for multiple keys on the server side (where we are serving the JWKS endpoint to our partners so they can verify our signatures) as well as refresh during rotation on the client side (when we are retrieving partner keys to verify the signatures on messages they send us).
JWKS Caching: The Performance Critical Detail
Fetching JWKS on every token verification would be slow and could DDoS your endpoint. Caching is essential.
Caching strategy considerations:
Cache TTL (Time To Live):
- Too short (less than 1 minute): Excessive JWKS fetches, poor performance
- Too long (over 1 hour): Slow key rotation, delayed revocation
- Sweet spot: 10-15 minutes for most applications
Cache invalidation strategies:
- Time-based: Cache expires after TTL
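- Error-based: If a token's kid isn't in the cached JWKS (or verification fails with the cached key), fetch a fresh JWKS, since the issuer may have just published a new key. Rate-limit these refetches so bogus tokens can't force one on every request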
- Manual: Force refresh on deployment or after key rotation
Cache warming:
- Fetch JWKS on application startup
- Don't wait for first verification to populate cache
- Prevents cold-start latency
Note on production caching: Basic TTL-based caching works for single-partner scenarios and development environments. For production systems handling multiple partners and requiring high availability, you'll need more sophisticated strategies.
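The three strategies above can be combined for a single issuer: time-based expiry, kid-miss refresh, and startup warming. The sketch below rests on these assumptions:

- The caller passes in a fetch function returning `{:ok, decoded_jwks}`. The HTTP client is out of scope.
- The clock is passed in as Unix seconds, so the logic is testable.
- The 600-second TTL and the 60-second refetch floor are example values.
- `JwksCache` is a hypothetical name.

```elixir
defmodule JwksCache do
  # Hypothetical TTL cache for one issuer's JWKS.
  # `fetch` is a caller-supplied zero-arity function returning {:ok, decoded_jwks};
  # `now` is Unix time in seconds, passed in so the logic is easy to test.
  defstruct jwks: nil, fetched_at: nil, ttl: 600, min_refetch: 60

  # Cache warming: fetch at startup instead of on the first verification.
  def warm(fetch, now, opts \\ []) do
    with {:ok, jwks} <- fetch.() do
      {:ok, struct(__MODULE__, [jwks: jwks, fetched_at: now] ++ opts)}
    end
  end

  def get_key(%__MODULE__{} = cache, kid, fetch, now) do
    # Time-based expiry: refresh once the TTL has elapsed.
    cache = if now - cache.fetched_at >= cache.ttl, do: refresh(cache, fetch, now), else: cache

    case find(cache.jwks, kid) do
      {:ok, key} ->
        {:ok, key, cache}

      :error ->
        # Unknown kid: the issuer may have just published a new key, so refetch,
        # but at most once per min_refetch seconds so bogus kids can't force a
        # fetch on every request.
        if now - cache.fetched_at >= cache.min_refetch do
          cache = refresh(cache, fetch, now)

          case find(cache.jwks, kid) do
            {:ok, key} -> {:ok, key, cache}
            :error -> {:error, :kid_not_found, cache}
          end
        else
          {:error, :kid_not_found, cache}
        end
    end
  end

  # On a failed fetch, keep serving the previously cached keys.
  defp refresh(cache, fetch, now) do
    case fetch.() do
      {:ok, jwks} -> %{cache | jwks: jwks, fetched_at: now}
      _error -> cache
    end
  end

  defp find(%{"keys" => keys}, kid) do
    case Enum.find(keys, &(&1["kid"] == kid)) do
      nil -> :error
      key -> {:ok, key}
    end
  end
end
```

Keeping the previous keys when a refresh fails is a design choice. It favours availability during a short JWKS outage, at the cost of trusting keys slightly past their TTL.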
Implementation: Publishing JWKS
Key implementation points:
- Endpoint location: /.well-known/jwks.json
- Response format: JSON with {"keys": [...]} array
- Key metadata: Include kid, use: "sig", alg, plus key material (x/y for EC, n/e for RSA)
- HTTP headers: Set a TTL, e.g. 10 minutes with Cache-Control: public, max-age=600, must-revalidate
- Extract public key: From your private key (never expose private keys)
- Support multiple keys: Array allows publishing old + new during rotation
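Those points translate into a small response builder. This sketch assumes your keys are already available as JWK maps (for example, exported from JOSE). `JwksDocument` is a hypothetical name.

The builder does three things:

- drops private members defensively,
- refuses to publish symmetric (oct) keys at all,
- returns the body and headers for whatever web framework you use.

```elixir
defmodule JwksDocument do
  # Hypothetical builder for the /.well-known/jwks.json response.
  # Private JWK members (RFC 7518 section 6, RFC 8037) are dropped in case a
  # full key pair is passed in by mistake.
  @private_members ["d", "p", "q", "dp", "dq", "qi", "oth"]

  def build(keys, max_age \\ 600) do
    public_keys =
      keys
      # Symmetric ("oct") keys are secrets in their entirety: never publish them.
      |> Enum.reject(&(&1["kty"] == "oct"))
      |> Enum.map(&Map.drop(&1, @private_members))

    headers = [
      {"content-type", "application/json"},
      {"cache-control", "public, max-age=#{max_age}, must-revalidate"}
    ]

    {%{"keys" => public_keys}, headers}
  end
end
```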
Critical security requirements:
- Only export public keys, never private keys
- Set appropriate Cache-Control headers
- Serve over HTTPS only (no exceptions)
- Rate limit the endpoint (see next post for production patterns)
- Monitor for unusual access patterns
Implementation: Verifying with JWKS
Verification flow:
- Extract kid from the JWT header using JOSE.JWS.peek_protected/1 (code)
- Fetch the key from the JWKS client (which handles caching), see code
- Verify the signature using JOSE.JWS.verify_strict with allowed algorithms (code)
- Validate claims including exp, nbf, iat with clock skew tolerance (5 minutes)
- Log verification failures for monitoring and alerting
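Once the payload is decoded, the time-based claim checks are plain integer comparisons. This sketch assumes the claims are a map of NumericDate values (Unix seconds) and uses a 300-second skew.

`ClaimsCheck` is hypothetical. Treating a missing exp as an error is an assumed policy, not something JWS itself requires.

```elixir
defmodule ClaimsCheck do
  # Hypothetical time-claim validator. Claims are a decoded payload map with
  # NumericDate values (Unix seconds); `now` is the verifier's current time.
  @default_skew 300

  def validate(claims, now, skew \\ @default_skew) do
    cond do
      # Assumed policy: every token must carry an exp.
      not is_integer(claims["exp"]) -> {:error, :missing_exp}
      # exp: reject once now reaches exp, allowing `skew` seconds of leeway.
      now >= claims["exp"] + skew -> {:error, :expired}
      # nbf: reject before nbf, with the same leeway.
      is_integer(claims["nbf"]) and now < claims["nbf"] - skew -> {:error, :not_yet_valid}
      # iat: a token issued in the future points at a misconfigured clock.
      is_integer(claims["iat"]) and claims["iat"] > now + skew -> {:error, :issued_in_future}
      true -> :ok
    end
  end
end
```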
Key considerations:
- Whitelist allowed algorithms (e.g., ["ES256"] only)
- Handle missing kid gracefully (reject the token)
- Allow 5-minute clock skew for time-based claims
- Return detailed error reasons for debugging
- Track metrics on verification success/failure rates
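The allowlist and missing-kid rules belong before any key lookup, and one more check belongs after it. This sketch assumes the protected header has already been decoded into a map. JOSE.JWS.peek_protected/1 returns the raw header JSON, which you would decode first. `HeaderPolicy` and its functions are hypothetical.

```elixir
defmodule HeaderPolicy do
  @allowed_algs ["ES256"]

  # Before the key lookup: reject algorithms outside the allowlist
  # (which also rejects "none") and tokens without a usable kid.
  def check_header(%{"alg" => alg} = header) when alg in @allowed_algs do
    case header["kid"] do
      kid when is_binary(kid) and kid != "" -> {:ok, kid}
      _ -> {:error, :missing_kid}
    end
  end

  def check_header(header), do: {:error, {:alg_not_allowed, header["alg"]}}

  # After the kid lookup: if the JWK pins an "alg", it must match the header.
  def check_key(%{"alg" => alg}, %{"alg" => key_alg}) when key_alg != alg,
    do: {:error, :alg_mismatch}

  def check_key(_header, _jwk), do: :ok
end
```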
JWKS vs Static Keys: When to Use Which
Use JWKS when:
- Multiple services verify tokens
- Need to support key rotation
- Building OAuth/OIDC provider
- Third parties need to verify your tokens
- Compliance requirements for key rotation
Static keys might be okay when:
- Single application signs and verifies (but why not use HMAC then?)
- Building proof-of-concept (though you could argue to set up JWKS anyway as the overhead is minimal)
- Very constrained environments with no network access (embedded systems, offline verification)
The pragmatic default: Use JWKS from day one. Teams that skip JWKS to "keep it simple" consistently face expensive retrofits when key rotation becomes urgent. The initial implementation cost is small compared to emergency rotation overhead later.
Organizational consideration: JWKS requires coordination between teams that sign tokens (Platform/Auth) and teams that verify tokens (all service owners). This organizational complexity is why some teams resist it, but it's also precisely why it's valuable. JWKS forces you to solve key distribution correctly before it becomes an emergency.
Wrapping Up
JWKS turns key rotation from an 8-hour nightmare into a 5-minute configuration change. It's the bridge between your JWS security theory and operational reality.
Key takeaways:
- JWKS is a standard format for publishing public keys at .well-known/jwks.json
- The kid is critical for referencing specific keys during verification
- Caching is essential for performance (10-15 minutes is the sweet spot)
- Zero-downtime rotation is possible through four-phase overlapping key publication
- Start with JWKS from day one: retrofitting is more expensive than implementing correctly upfront
What's next:
This post covered the fundamentals of JWKS: what it is, why you need it, and how to implement basic key rotation. But production systems face additional challenges:
- Multi-tenancy: Managing JWKS for 200+ partner banks, each with their own endpoints and rotation schedules
- High availability: What happens when a partner's JWKS endpoint goes down? Do you reject all their legitimate transactions?
- Security at scale: How do you protect against compound attacks (key compromise + endpoint DoS)?
- Operational maturity: Monitoring, alerting, incident response procedures for JWKS in financial services
For now, you have the foundation: JWKS solves key distribution, enables zero-downtime rotation, and turns emergencies into routine maintenance. Let's move on to production patterns.