JWKS and Zero-Downtime Key Rotation

Scaling Trust

Posted on January 16, 2026

This article is part of a series on understanding the hows and whys of JSON Web Signatures (JWS).

There's accompanying code: it's referred to and linked throughout the content. But if you'd rather just read raw code, head over here.



You've chosen ES256. You've implemented signing and verification. Your scheme API is humming along in production, processing thousands of transactions daily. Then you get the call: a former employee who had access to your key management systems left on bad terms. Security policy says you must rotate keys immediately.

You reach for your runbook and your heart sinks. The key rotation procedure says:

Generate new key pair
Deploy new public key to 50+ partner banks
Each bank will require contact during business hours, and possibly a deployment during maintenance windows
During rotation, must keep the compromised key active

This will take forever. While a potentially compromised key remains live in production. While you're processing real money. This is bad. Real bad.

This pattern repeats across production systems: hard-coded public keys in verification code. "Public keys are public," teams reason. Then key compromise happens. Emergency rotation requires deploying new public keys to hundreds of microservices, each requiring code changes, reviews, and deployments. Key rotation takes 6-12 hours. During that window, compromised keys remain active.

JWKS (JSON Web Key Set) eliminates this entire class of problems by turning that unending emergency into a 5-minute configuration change.

What is JWKS?

JWKS (JSON Web Key Set) is a standardized JSON format for publishing cryptographic public keys. Instead of you and your partners hard-coding keys in your applications, you publish them at a URL such as https://auth.yourcompany.com/.well-known/jwks.json, and clients fetch them as needed.

{
  "keys": [
    {
      "kty": "EC",
      "use": "sig",
      "kid": "2024-11-prod",
      "alg": "ES256",
      "crv": "P-256",
      "x": "WKn-ZIGevcwGIyyrzFoZNBdaq9_TsqzGl96oc0CWuis",
      "y": "y77t-RvAHRKTsSGdIYUfweuOvwrvDD-Q3Hv5J0fSKbE"
    },
    {
      "kty": "EC",
      "use": "sig",
      "kid": "2024-12-prod",
      "alg": "ES256",
      "crv": "P-256",
      "x": "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU",
      "y": "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
    }
  ]
}

Your JWT header references a specific key by ID:

{
  "alg": "ES256",
  "typ": "JWT",
  "kid": "2024-11-prod"
}

When verifying, clients:

Look at the kid in the JWT header
Fetch the JWKS from the well-known endpoint
Find the matching key in the keys array
Use that public key to verify the signature
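The lookup half of these steps needs only the standard library. Here is a minimal sketch in Python for brevity (the article's own examples use Elixir and JOSE). It assumes a compact JWS `header.payload.signature` and a JWKS that has already been fetched and parsed into a dict. The helper names are hypothetical, and the final step, the signature check itself, is left to a real crypto library.

```python
import base64
import json
from typing import Optional


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as JWS uses (RFC 7515)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Inverse of b64url_encode: restore the stripped '=' padding first."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def peek_kid(compact_jws: str) -> Optional[str]:
    """Read kid from the protected header. Nothing is verified here, so the
    value may only be used to SELECT a key, never to trust the token."""
    protected = compact_jws.split(".")[0]
    header = json.loads(b64url_decode(protected))
    return header.get("kid")


def find_key(jwks: dict, kid: str) -> Optional[dict]:
    """Return the JWK whose kid matches, or None if the set has no such key."""
    for jwk in jwks.get("keys", []):
        if jwk.get("kid") == kid:
            return jwk
    return None


# Usage: a token whose header names the second key in the set.
jwks = {"keys": [{"kid": "2024-11-prod", "kty": "EC"},
                 {"kid": "2024-12-prod", "kty": "EC"}]}
header = {"alg": "ES256", "typ": "JWT", "kid": "2024-12-prod"}
token = b64url_encode(json.dumps(header).encode()) + ".e30.c2ln"
kid = peek_kid(token)
key = find_key(jwks, kid) if kid else None
```

A `None` from either function means there is no key to verify against, so the token should be rejected.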

This simple pattern solves multiple problems at once: key distribution, key rotation, multi-tenant key management, and emergency response.

The Problem JWKS Solves

Let's understand why JWKS exists by looking at the alternatives.

Anti-pattern 1: Hard-Coded Keys

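# DON'T DO THIS: the key is compiled into the module, so rotating it means a
# code change, a review, and a redeployment of every service that verifies.
defmodule HardCodedVerifier do
  @public_key """
  -----BEGIN PUBLIC KEY-----
  MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWKn+ZIGevcwGIyyrzFoZNBdaq9/T
  sqzGl96oc0CWuivLvu35G8AdEpOxIZ0hhR/B646/Cu8MP5Dce/knR9IpsQ==
  -----END PUBLIC KEY-----
  """

  # Exactly one key and no kid: old and new keys can never be accepted side by side.
  def verify_token(jws_string) do
    jwk = JOSE.JWK.from_pem(@public_key)
    JOSE.JWS.verify_strict(jwk, ["ES256"], jws_string)
  end
end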

Problems with this approach:

Key rotation requires code changes and redeployment
No way to support multiple active keys simultaneously
Emergency rotations cause downtime
Different environments need different code or complex configuration

Anti-pattern 2: Keys in Configuration Files

# config.yaml - DON'T DO THIS
jwt:
  public_key: |
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWKn+ZIGe...
    -----END PUBLIC KEY-----

Problems:

Better than hard-coding but still requires deployment to update
Configuration drift across services
No built-in versioning or key identification
Secrets management tools help but don't solve the fundamental problem

The JWKS Solution

Implementation approach:

Extract kid from JWT header
Fetch JWKS from well-known endpoint
Find matching key by kid
Cache the result (10-15 minute TTL)
Verify signature with the retrieved key
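Fetching, matching, and time-based caching could be combined along the lines of the following Python sketch. `fetch` stands in for a hypothetical function that performs the HTTPS GET and JSON parse. The clock is injectable so the TTL behaviour can be tested without waiting ten minutes.

```python
import time
from typing import Callable, Optional


class CachedJwks:
    """Fetch-on-expiry JWKS cache (time-based invalidation only).

    `fetch` is a hypothetical callable that GETs the JWKS URL over HTTPS and
    returns the parsed JSON; `clock` defaults to a monotonic clock.
    """

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds          # 600 s: the low end of the 10-15 minute range
        self._clock = clock
        self._jwks: Optional[dict] = None
        self._fetched_at = 0.0

    def get_key(self, kid: str) -> Optional[dict]:
        now = self._clock()
        if self._jwks is None or now - self._fetched_at >= self._ttl:
            self._jwks = self._fetch()   # fetch only when empty or stale
            self._fetched_at = now
        for jwk in self._jwks.get("keys", []):
            if jwk.get("kid") == kid:    # find the matching key by kid
                return jwk
        return None                      # unknown kid: reject the token
```

The returned JWK is then handed to the crypto library for the actual signature check.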

Benefits of this approach:

Key rotation without code changes
Multiple active keys simultaneously
Automatic key discovery and caching
Centralized key management
Gradual rollout during rotation
Emergency rotation without downtime

JWK Format: Anatomy of a Key

Let's decode what's in a JWKS. Each key in the keys array is a JWK (JSON Web Key). Here's an ES256 public key broken down:

{
  "kty": "EC",           // Key Type: Elliptic Curve
  "use": "sig",          // Public key use: Signature
  "kid": "2024-11-prod", // Key ID: Unique identifier
  "alg": "ES256",        // Algorithm: ECDSA with P-256 and SHA-256
  "crv": "P-256",        // Curve: P-256 (secp256r1)
  "x": "WKn-ZIGe...",    // X coordinate (base64url encoded)
  "y": "y77t-RvAH..."    // Y coordinate (base64url encoded)
}
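The x and y members are simply the two 32-byte coordinates of the public point. This Python sketch (helper names are hypothetical) decodes them and rebuilds the uncompressed SEC1 encoding `0x04 || x || y`. That 65-byte form is what many crypto libraries accept for a P-256 public key. The length check follows RFC 7518, which requires each coordinate to be the full coordinate size for the curve.

```python
import base64

# Coordinate length in bytes per curve (RFC 7518, section 6.2.1.2).
COORD_BYTES = {"P-256": 32, "P-384": 48, "P-521": 66}


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def ec_jwk_to_point(jwk: dict) -> bytes:
    """Rebuild the uncompressed SEC1 point 0x04 || x || y from an EC public JWK."""
    if jwk.get("kty") != "EC":
        raise ValueError("not an EC key")
    size = COORD_BYTES.get(jwk.get("crv"))
    if size is None:
        raise ValueError(f"unsupported curve: {jwk.get('crv')!r}")
    x, y = b64url_decode(jwk["x"]), b64url_decode(jwk["y"])
    # Coordinates must be full-length (left-padded with zeros), never trimmed.
    if len(x) != size or len(y) != size:
        raise ValueError("coordinate length does not match the curve")
    return b"\x04" + x + y
```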

Different key types have different fields:

RSA Key (RS256/PS256):

{
  "kty": "RSA",
  "use": "sig",
  "kid": "rsa-2024-11",
  "alg": "PS256",
  "n": "0vx7agoebG...",  // Modulus
  "e": "AQAB"            // Exponent (usually 65537)
}
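n and e are big-endian unsigned integers, base64url-encoded without padding (RFC 7518 calls the type Base64urlUInt). The small Python sketch below, with hypothetical helper names, confirms that "AQAB" really is 65537. It also measures the modulus size, which you could use to enforce the 2048-bit minimum that RFC 7518 sets for RS256 and PS256.

```python
import base64


def b64url_uint(value: str) -> int:
    """Decode a JWK Base64urlUInt member (big-endian, unpadded) to an int."""
    raw = base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))
    return int.from_bytes(raw, "big")


def rsa_key_bits(jwk: dict) -> int:
    """Modulus size in bits, e.g. to reject keys shorter than 2048 bits."""
    return b64url_uint(jwk["n"]).bit_length()
```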

EdDSA Key (Ed25519):

{
  "kty": "OKP",           // Octet Key Pair (RFC 8037)
  "use": "sig",
  "kid": "ed25519-2024-11",
  "alg": "EdDSA",
  "crv": "Ed25519",       // Curve selects the EdDSA variant (Ed25519 uses SHA-512 internally)
  "x": "11qYAYKxCr..."   // Public key
}
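Because the required members differ by kty, a publisher or verifier can sanity-check a JWK before using it. The Python sketch below uses hypothetical names. Its member lists follow RFC 7518 (EC, RSA) and RFC 8037 (OKP), and the private-member list flags material that must never appear in a published key set.

```python
from typing import Dict, List, Set

# Members each public key type must carry (RFC 7518 section 6 for EC and RSA,
# RFC 8037 for OKP keys such as Ed25519).
REQUIRED_MEMBERS: Dict[str, Set[str]] = {
    "EC": {"crv", "x", "y"},
    "RSA": {"n", "e"},
    "OKP": {"crv", "x"},
}
# Private-key material that must never appear in a published JWKS.
PRIVATE_MEMBERS = {"d", "p", "q", "dp", "dq", "qi", "oth"}


def check_public_jwk(jwk: dict) -> List[str]:
    """Return a list of problems; an empty list means the key looks publishable."""
    kty = jwk.get("kty")
    if kty not in REQUIRED_MEMBERS:
        return [f"unsupported kty: {kty!r}"]
    problems = []
    missing = sorted(REQUIRED_MEMBERS[kty] - set(jwk))
    if missing:
        problems.append("missing members: " + ", ".join(missing))
    leaked = sorted(PRIVATE_MEMBERS & set(jwk))
    if leaked:
        problems.append("private members present: " + ", ".join(leaked))
    return problems
```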

The Key ID (kid): Why It's Critical

The kid (Key ID) is the linchpin of the entire system. It's how JWT headers reference specific keys:

// JWT Header
{
  "alg": "ES256",
  "kid": "2024-11-prod",
  "typ": "JWT"
}

Rules for good kid values:

Make them unique: Never reuse a kid, even after deleting the old key
Make them meaningful: "2024-11-prod" tells you when/where, "abc123" doesn't
Make them sortable: Use dates/timestamps so you can identify newest keys
Don't embed secrets: The kid appears in JWT headers, which are unencrypted

Good patterns:

"2024-11-prod" - Date-based
"prod-20241115-001" - Environment and timestamp
"v2-signing-key" - Versioned with purpose

Bad patterns:

"1" - Not descriptive
"prod" - Not unique over time
"key_a7f2e1d9c4b3" - Random but not meaningful
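A generator for the environment-and-timestamp pattern might look like the following Python sketch. The function name and the `ever_issued` registry of every kid you have ever published are hypothetical. The registry is what enforces "never reuse a kid, even after deleting the old key", and zero-padding keeps string order equal to chronological order.

```python
from datetime import date
from typing import Iterable


def next_kid(env: str, day: date, ever_issued: Iterable[str]) -> str:
    """Return an '<env>-YYYYMMDD-NNN' kid that has never been issued before.

    `ever_issued` must include retired kids too (a hypothetical registry of
    every kid ever published), so a deleted key's id is never handed out again.
    """
    prefix = f"{env}-{day:%Y%m%d}-"
    taken = {kid for kid in ever_issued if kid.startswith(prefix)}
    for seq in range(1, 1000):
        # Zero-padding keeps lexicographic order equal to chronological order.
        candidate = f"{prefix}{seq:03d}"
        if candidate not in taken:
            return candidate
    raise RuntimeError("more than 999 keys issued in one day")
```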

The Standard Location for JWKS: .well-known/jwks.json

By convention, JWKS endpoints are published at a predictable location:

https://{your-domain}/.well-known/jwks.json

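The /.well-known/ path prefix itself is defined by RFC 8615 for site-wide metadata and discovery documents. The jwks.json file name is a widely followed convention rather than something RFC 8414 (OAuth 2.0 Authorization Server Metadata) or OpenID Connect Discovery mandates. Both of those specifications instead publish a metadata document whose jwks_uri field points at the key set, wherever it lives.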

This gives you:

Discoverability: Clients know where to look without configuration
Security: The domain is verified via TLS, establishing trust
Standardization: Works across identity providers

Examples from the wild:

Auth0: https://{tenant}.auth0.com/.well-known/jwks.json
GitHub: https://token.actions.githubusercontent.com/.well-known/jwks

Follow this pattern unless you have a compelling reason not to.
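When an issuer also publishes OpenID Connect Discovery metadata, its jwks_uri field is the authoritative location. That is how a provider can serve keys somewhere other than /.well-known/jwks.json. The Python sketch below resolves the URL and assumes a hypothetical `fetch_json` helper that returns the parsed document or None. It appends the discovery path the OIDC way; RFC 8414 places it differently for issuers that have a path component.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit


def resolve_jwks_uri(issuer: str,
                     fetch_json: Callable[[str], Optional[dict]]) -> str:
    """Prefer an advertised jwks_uri; fall back to the /.well-known/jwks.json convention.

    `fetch_json` is a hypothetical HTTPS GET helper returning parsed JSON, or None
    when the document does not exist.
    """
    base = issuer.rstrip("/")
    if urlsplit(base).scheme != "https":
        raise ValueError("issuer must use HTTPS")
    # OpenID Connect Discovery appends this path to the issuer URL.
    metadata = fetch_json(base + "/.well-known/openid-configuration") or {}
    uri = metadata.get("jwks_uri") or base + "/.well-known/jwks.json"
    if urlsplit(uri).scheme != "https":   # TLS is what ties the keys to the domain
        raise ValueError("jwks_uri must use HTTPS")
    return uri
```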

How JWKS Enables Zero-Downtime Key Rotation

Here's where JWKS really shines. Let's walk through the four-phase rotation process:

Phase 1: Normal Operation (Single Key)

{
  "keys": [
    {
      "kid": "key-2025-12",
      "kty": "EC",
      "use": "sig",
      "alg": "ES256",
      "crv": "P-256",
      "x": "...",
      "y": "..."
    }
  ]
}

Issuer signs with key-2025-12
Verifiers fetch JWKS, cache it
Everything works

Phase 2: Introduce New Key (Both Keys Published)

{
  "keys": [
    {
      // Old key (still active)
      "kid": "key-2025-12",
      "kty": "EC",
      "use": "sig",
      "alg": "ES256",
      "crv": "P-256",
      "x": "...",
      "y": "..."
    },
    {
      // New key (published but not yet used)
      "kid": "key-2026-04",
      "kty": "EC",
      "use": "sig",
      "alg": "ES256",
      "crv": "P-256",
      "x": "...",
      "y": "..."
    }
  ]
}

Issuer still signs with key-2025-12
JWKS now includes both keys
Verifiers' caches pick up the new key on next refresh
Wait for cache TTL to ensure all verifiers have the new JWKS (typically 10-15 minutes)

Phase 3: Switch to New Key (Both Still Published)

{
  "keys": [
    {
      // Old key (no longer used but still published)
      "kid": "key-2025-12",
      "kty": "EC",
      "use": "sig",
      "alg": "ES256",
      "crv": "P-256",
      "x": "...",
      "y": "..."
    },
    {
      // New key (now actively signing)
      "kid": "key-2026-04",
      "kty": "EC",
      "use": "sig",
      "alg": "ES256",
      "crv": "P-256",
      "x": "...",
      "y": "..."
    }
  ]
}

Issuer switches to signing with key-2026-04
Old tokens with kid: "key-2025-12" still verify (old key still in JWKS)
New tokens use kid: "key-2026-04"
Wait for all tokens signed with the old key to expire (based on the exp claims in their payloads, plus any clock-skew allowance your verifiers apply)

Phase 4: Remove Old Key (Rotation Complete)

{
  "keys": [
    {
      "kid": "key-2026-04",  // Only the new key remains
      "kty": "EC",
      "use": "sig",
      "alg": "ES256",
      "crv": "P-256",
      "x": "...",
      "y": "..."
    }
  ]
}

Old key removed from JWKS
Any remaining tokens signed with the old key will now fail verification
Rotation complete

Total downtime: Zero. The overlap period ensures smooth transition.
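The two waits can be computed rather than guessed. The Python sketch below (names hypothetical) derives the earliest safe times for phases 3 and 4. Its inputs are the JWKS cache TTL, the longest token lifetime you issue, and the 5-minute clock skew this post recommends for time-based claims.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def rotation_schedule(new_key_published_at: datetime,
                      jwks_cache_ttl: timedelta,
                      max_token_lifetime: timedelta,
                      clock_skew: timedelta = timedelta(minutes=5)) -> Dict[str, datetime]:
    """Earliest safe moments for phase 3 (switch signing) and phase 4 (remove old key)."""
    # Phase 3 waits until every verifier's cached JWKS has expired and been refetched.
    switch_signing_at = new_key_published_at + jwks_cache_ttl
    # The last old-key token is minted just before the switch; it stays valid for its
    # lifetime, and verifiers tolerate clock skew on top of that.
    remove_old_key_at = switch_signing_at + max_token_lifetime + clock_skew
    return {"switch_signing_at": switch_signing_at,
            "remove_old_key_at": remove_old_key_at}


# Usage: 15-minute JWKS cache, 1-hour tokens, default 5-minute skew.
published = datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)
plan = rotation_schedule(published, timedelta(minutes=15), timedelta(hours=1))
```

If a CDN or reverse proxy also caches the JWKS, add its lifetime to `jwks_cache_ttl`.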


In the accompanying code, we've got support for multiple keys on the server side (where we are serving the JWKS endpoint to our partners so they can verify our signatures) as well as refresh during rotation on the client side (when we are retrieving partner keys to verify the signatures on messages they send us).

JWKS Caching: The Performance Critical Detail

Fetching JWKS on every token verification would be slow and could DDoS your endpoint. Caching is essential.

Caching strategy considerations:

Cache TTL (Time To Live):

Too short (less than 1 minute): Excessive JWKS fetches, poor performance
Too long (over 1 hour): Slow key rotation, delayed revocation
Sweet spot: 10-15 minutes for most applications
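A verifier can also take the TTL from the publisher's own Cache-Control header while still enforcing these bounds. The Python sketch below has clamping limits that mirror the list above. The constant and function names and the 10-minute default for a missing max-age are assumptions.

```python
import re
from typing import Optional

MIN_TTL = 60        # below 1 minute: excessive JWKS fetches
MAX_TTL = 3600      # above 1 hour: slow rotation, delayed revocation
DEFAULT_TTL = 600   # assumed fallback when the server sends no usable max-age


def ttl_from_cache_control(header: Optional[str]) -> int:
    """Derive a JWKS cache TTL in seconds from Cache-Control, clamped to sane bounds."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", header or "", re.IGNORECASE)
    if not match:
        return DEFAULT_TTL
    return max(MIN_TTL, min(MAX_TTL, int(match.group(1))))
```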

Cache invalidation strategies:

Time-based: Cache expires after TTL
Error-based: If verification fails with cached key, try fetching fresh JWKS
Manual: Force refresh on deployment or after key rotation
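Time-based and error-based invalidation combine naturally. The Python sketch below refetches once when it sees an unknown kid, because the issuer may have just published a new key. It rate-limits those refetches so tokens carrying made-up kids cannot hammer the issuer's endpoint. The class name and the 30-second cooldown are assumptions, not values from the accompanying code.

```python
import time
from typing import Callable, Dict, Optional


class RefreshingJwks:
    """JWKS cache with time-based AND error-based invalidation.

    An unknown kid triggers one refetch, but at most once per `min_refresh`
    seconds (30 s here is an assumed value). `fetch` is a hypothetical
    callable returning the parsed JWKS.
    """

    def __init__(self, fetch: Callable[[], dict], ttl: float = 600,
                 min_refresh: float = 30,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._min_refresh = min_refresh
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        keys = self._fetch().get("keys", [])
        self._keys = {k["kid"]: k for k in keys if "kid" in k}
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> Optional[dict]:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._refresh()                                   # time-based expiry
        elif kid not in self._keys and now - self._fetched_at >= self._min_refresh:
            self._refresh()                                   # error-based, rate-limited
        return self._keys.get(kid)
```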

Cache warming:

Fetch JWKS on application startup
Don't wait for first verification to populate cache
Prevents cold-start latency

Note on production caching: Basic TTL-based caching works for single-partner scenarios and development environments. For production systems handling multiple partners and requiring high availability, you'll need more sophisticated strategies.


Implementation: Publishing JWKS

Key implementation points:

Endpoint location: /.well-known/jwks.json
Response format: JSON with {"keys": [...]} array
Key metadata: Include kid, use: "sig", alg, plus key material (x/y for EC, n/e for RSA)
HTTP headers: Set TTL, e.g. 10 minutes with Cache-Control: public, max-age=600, must-revalidate
Extract public key: From your private key (never expose private keys)
Support multiple keys: Array allows publishing old + new during rotation
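Those points fit in a few lines. Below is a framework-agnostic Python sketch (function names hypothetical) that builds the response body and headers. It keeps only an allowlist of public members per key type, so a private component such as d is dropped even if someone passes in a full key pair. During rotation, the keyring simply holds both the old and the new key.

```python
import json
from typing import Dict, Iterable, Tuple

COMMON_MEMBERS = {"kty", "kid", "use", "alg", "key_ops"}
# Allowlist of PUBLIC members per key type; anything else (d, p, q, ...) is dropped.
PUBLIC_MEMBERS = {"EC": {"crv", "x", "y"}, "RSA": {"n", "e"}, "OKP": {"crv", "x"}}


def public_jwk(jwk: dict) -> dict:
    kty = jwk.get("kty")
    if kty not in PUBLIC_MEMBERS:  # e.g. symmetric "oct" keys must never be published
        raise ValueError(f"refusing to publish kty {kty!r}")
    allowed = COMMON_MEMBERS | PUBLIC_MEMBERS[kty]
    return {name: value for name, value in jwk.items() if name in allowed}


def jwks_response(keyring: Iterable[dict], max_age: int = 600) -> Tuple[str, Dict[str, str]]:
    """Body and headers for /.well-known/jwks.json."""
    body = json.dumps({"keys": [public_jwk(k) for k in keyring]})
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": f"public, max-age={max_age}, must-revalidate",
    }
    return body, headers
```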

Critical security requirements:

Only export public keys, never private keys
Set appropriate Cache-Control headers
Serve over HTTPS only (no exceptions)
Rate limit the endpoint (see next post for production patterns)
Monitor for unusual access patterns

Implementation: Verifying with JWKS

Verification flow:

Extract kid from the JWT header using JOSE.JWS.peek_protected/1 (code)
Fetch the key from the JWKS client, which handles caching (code)
Verify the signature using JOSE.JWS.verify_strict with the allowed algorithms (code)
Validate claims including exp, nbf, iat with clock skew tolerance (5 minutes)
Log verification failures for monitoring and alerting
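The claim-validation step is easy to get subtly wrong at the boundaries. The Python sketch below (hypothetical names) applies the 5-minute skew to exp, nbf, and iat, which are NumericDate values in seconds since the epoch (RFC 7519). Treating a missing exp as fatal is a policy choice; RFC 7519 itself makes exp optional.

```python
import time
from typing import Optional

CLOCK_SKEW = 300  # seconds: the 5-minute tolerance


def validate_time_claims(claims: dict, now: Optional[float] = None,
                         skew: int = CLOCK_SKEW) -> Optional[str]:
    """Return None if exp/nbf/iat are acceptable, else a rejection reason.

    Run this only AFTER the signature has verified; unverified claims are
    attacker-controlled.
    """
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        return "missing or non-numeric exp"
    if now >= exp + skew:
        return "token expired"
    nbf = claims.get("nbf")
    if isinstance(nbf, (int, float)) and now < nbf - skew:
        return "token not yet valid"
    iat = claims.get("iat")
    if isinstance(iat, (int, float)) and iat > now + skew:
        return "token issued in the future"
    return None
```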

Key considerations:

Whitelist allowed algorithms (e.g., ["ES256"] only)
Handle missing kid gracefully (reject the token)
Allow 5-minute clock skew for time-based claims
Return detailed error reasons for debugging
Track metrics on verification success/failure rates
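The first two considerations translate into checks that run before any cryptography. The Python sketch below uses hypothetical names. The allowlist comes from your configuration rather than from the token, and the second function rejects a JWK whose own use or alg metadata contradicts the token's header.

```python
from typing import Optional

ALLOWED_ALGS = frozenset({"ES256"})  # from YOUR config, never from the token


def check_header(header: dict, allowed_algs=ALLOWED_ALGS) -> Optional[str]:
    """Checks to run before any crypto; returns a rejection reason or None."""
    alg = header.get("alg")
    if not isinstance(alg, str) or alg not in allowed_algs:  # also rejects "none"
        return f"algorithm not allowed: {alg!r}"
    kid = header.get("kid")
    if not isinstance(kid, str) or not kid:
        return "missing kid"
    return None


def check_key_matches(header: dict, jwk: dict) -> Optional[str]:
    """Refuse keys whose own metadata contradicts how the token wants them used."""
    if jwk.get("use", "sig") != "sig":
        return f"key use is {jwk['use']!r}, not 'sig'"
    key_alg = jwk.get("alg")
    if key_alg is not None and key_alg != header.get("alg"):
        return f"key alg {key_alg} does not match header alg {header.get('alg')}"
    return None
```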

JWKS vs Static Keys: When to Use Which

Use JWKS when:

Multiple services verify tokens
Need to support key rotation
Building OAuth/OIDC provider
Third parties need to verify your tokens
Compliance requirements for key rotation

Static keys might be okay when:

Single application signs and verifies (but why not use HMAC then?)
Building proof-of-concept (though you could argue to set up JWKS anyway as the overhead is minimal)
Very constrained environments with no network access (embedded systems, offline verification)

The pragmatic default: Use JWKS from day one. Teams that skip JWKS to "keep it simple" consistently face expensive retrofits when key rotation becomes urgent. The initial implementation cost is small compared to emergency rotation overhead later.

Organizational consideration: JWKS requires coordination between teams that sign tokens (Platform/Auth) and teams that verify tokens (all service owners). This organizational complexity is why some teams resist it, but it's also precisely why it's valuable. JWKS forces you to solve key distribution correctly before it becomes an emergency.

Wrapping Up

JWKS turns key rotation from an 8-hour nightmare into a 5-minute configuration change. It's the bridge between your JWS security theory and operational reality.

Key takeaways:

JWKS is a standard format for publishing public keys at .well-known/jwks.json
The kid is critical for referencing specific keys during verification
Caching is essential for performance (10-15 minutes is the sweet spot)
Zero-downtime rotation is possible through four-phase overlapping key publication
Start with JWKS from day one - retrofitting is more expensive than implementing correctly upfront

What's next:

This post covered the fundamentals of JWKS - what it is, why you need it, and how to implement basic key rotation. But production systems face additional challenges:

Multi-tenancy: Managing JWKS for 200+ partner banks, each with their own endpoints and rotation schedules
High availability: What happens when a partner's JWKS endpoint goes down? Do you reject all their legitimate transactions?
Security at scale: How do you protect against compound attacks (key compromise + endpoint DoS)?
Operational maturity: Monitoring, alerting, incident response procedures for JWKS in financial services

For now, you have the foundation: JWKS solves key distribution, enables zero-downtime rotation, and turns emergencies into routine maintenance. Let's move on to production patterns.
