Key rotation runbook¶
This runbook covers rotating a signing key in production. Run it when a key expires, is compromised, or a scheduled rotation policy triggers.
When to rotate¶
| Trigger | Urgency | Overlap window |
|---|---|---|
| Key compromise suspected | Immediate | None - revoke old key first |
| Scheduled expiry (90-day policy) | Planned | 24-hour overlap |
| Personnel change (key holder leaves) | Same day | 1-hour overlap |
| Hardware security key replacement | Planned | 24-hour overlap |
Pre-rotation checklist¶
Before starting:
- [ ] Identify all active manifests signed with the current key (query your manifest store by
signature.key_id) - [ ] Confirm the new key is generated in a secure environment (HSM or secrets manager - not on a developer workstation)
- [ ] Confirm CRL endpoint is reachable and writable
- [ ] Alert the on-call rotation so they expect a temporary spike in INVALID results during overlap
Step 1 - Generate a new key pair¶
# Generate a new Ed25519 key pair
manifest keygen --out /path/to/new-signing-key.b64url --print-pub
# The printed public key goes into your trust anchor configuration
# The private key stays in the secrets manager - never in source control
Or in Python:
from agent_manifest import generate_ed25519
new_kp = generate_ed25519()
print("New key_id:", new_kp.key_id)
print("Public key (base64url):", new_kp.public_b64url())
# Store new_kp.private_b64url() in your secrets manager
Step 2 - Re-sign active manifests¶
Re-issue every active manifest with the new key. The manifest_id and issued_at stay the same; only the signature block changes.
from agent_manifest._signing import Ed25519Signer, ed25519_from_private_bytes
import base64, json
# Load the new private key from the secrets manager
raw = base64.urlsafe_b64decode(new_private_key_b64url + "==")
new_kp = ed25519_from_private_bytes(raw)
signer = Ed25519Signer(new_kp)
for manifest in active_manifests:
# Remove the old signature so the pre-image is clean
manifest.pop("signature", None)
# Sign with the new key
new_sig = signer.sign(manifest)
manifest["signature"] = new_sig
# Write the updated manifest back to your store
manifest_store[manifest["manifest_id"]] = manifest
Step 3 - Revoke the old key¶
Issue a key-level revocation record for every manifest signed with the old key:
from agent_manifest._revocation import sign_revocation, FileCRL
from pathlib import Path
crl = FileCRL(Path("crl.jsonl"))
for manifest_id in manifests_signed_with_old_key:
record = sign_revocation(
manifest_id=manifest_id,
reason=f"Key rotation - old key_id={old_key_id}",
revoked_by="spiffe://trust.acme.co/security-team",
keypair=new_kp, # sign revocations with the NEW key
)
crl.revoke(record)
Step 4 - Update the CRL endpoint¶
The FileCRL append-only file is your CRL store. Publish it to your .well-known endpoint:
# If serving from a static file host (S3, GCS, Azure Blob):
aws s3 cp crl.jsonl s3://your-bucket/.well-known/agent-manifest/revocation \
--content-type application/x-ndjson \
--cache-control "max-age=30"
# If serving from the FastAPI CRL router, the file is already live -
# the router reads it on each request.
Step 5 - Notify verifiers¶
If your verifiers use a trust anchor discovery endpoint (/.well-known/agent-manifest/trust-anchor), update it with the new public key:
{
"active_key_id": "sha256:<new-key-id>",
"keys": [
{
"key_id": "sha256:<new-key-id>",
"algorithm": "Ed25519",
"public_key": "<new-public-key-base64url>",
"valid_from": "2026-06-05T10:00:00Z"
}
]
}
Verifiers that cache the trust anchor will pick up the new key at their next cache expiry (typically 5–60 minutes).
Step 6 - Decommission the old private key¶
After the overlap window has closed and all verifiers have accepted at least one manifest signed with the new key:
- Delete the old private key from the secrets manager
- Archive the old public key (it is still needed to verify historical manifests during the retention window)
- Record the rotation event in your audit log with timestamp and new key ID
Zero-downtime overlap¶
During the overlap window, both the old and new keys are active. Verifiers may encounter manifests signed by either key. The correct behaviour:
- Verifiers that know both keys accept both signatures
- Verifiers that only know the new key will reject old-signed manifests - deploy new manifests before updating verifiers
Recommended sequence for zero downtime:
t=0 Generate new key
t=5m Re-sign manifests, publish new trust anchor
t=30m Wait for all verifiers to pick up new trust anchor
t=35m Revoke old-key manifests, update CRL
t=1h Delete old private key
Rollback procedure¶
If the new key is defective (e.g., wrong algorithm, corrupted private key bytes):
- Do not revoke the old key - keep it active
- Re-sign manifests with the old key (same process as Step 2)
- Remove the new key from the trust anchor
- Investigate the new key generation before retrying
Monitoring during rotation¶
Watch for these signals during and after rotation:
verification_requests_total{result="INVALID"}spike → verifiers are seeing manifests signed with a key they don't recogniseverification_requests_total{result="REVOKED"}spike → CRL is propagating correctly- p99 verification latency spike → CRL endpoint is under load from the batch revocation
See Monitoring guide for dashboard configuration.