Each change needs a documented approval trail. While you can get pre-approval for automated rotations as a class of changes, many auditors interpret the controls conservatively and want to see individual change tickets for each cert rotation, even routine ones.
I'm not sure why many people are still dealing with legacy manual certificate renewal. Maybe some regulatory requirements? I even have a wildcard cert that covers my entire local network which is generated and deployed automatically by a cron job I wrote about 5 years ago. It's working perfectly and it would probably take me longer to track down exactly what it's doing than to re-write it from scratch.
For 99.something% of use cases, this is a solved problem.
- All certificates and authentication material must be rotated at regular intervals (no conflict here, this is the goal)
- All infrastructure changes need to have the commands to be executed and the contents of any changed files inspected and approved in writing by the change control board before being applied to the environment
That requirement for explicit approval of every change made within the environment goes against these being automated in any way, shape, or form. These boards usually meet monthly, or ad hoc for time-sensitive security updates, and usually have very long lists of changes to review, causing the agenda to constantly overflow into the next meeting.
You could probably still make it work as a priority standing agenda item, but it's still going to involve a manual process and review every month. I wouldn't want to manually rotate and approve certificates every month, and many of these requirements have been signed into law (at least in the US).
Starting to see another round of modernization initiatives so maybe in the next few years something could be done...
I never understood why.
1. Generate your initial refresh token for the user just like you would a random API key. You really don't need to use a JWT, but you could.
2. The client sends the refresh token to an authentication endpoint. This endpoint validates the token, then expires the refresh token and any prior bearer tokens issued to it. The client gets back a new refresh token and a bearer token with an expiration window (let's call it five minutes).
3. The client uses the bearer token for all requests to your API until it expires.
4. If the client wants to continue using the API, go back to step 2.
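A minimal sketch of steps 1 through 4, assuming an in-memory store and tokens keyed by a client id. The `TokenService` class, its method names, and the injectable clock are all hypothetical, made up for illustration; a real service would persist tokens and sit behind HTTP.

```python
import secrets
import time

BEARER_TTL_SECONDS = 5 * 60  # the "five minutes" window from step 2


class TokenService:
    """In-memory illustration only; names and storage are assumptions."""

    def __init__(self, now=time.time):
        self._now = now      # injectable clock, handy for testing
        self._refresh = {}   # refresh token -> client id
        self._bearer = {}    # bearer token -> (client id, expiry timestamp)

    def provision(self, client_id):
        """Step 1: issue the initial refresh token like a random API key."""
        token = secrets.token_urlsafe(32)
        self._refresh[token] = client_id
        return token

    def refresh(self, refresh_token):
        """Step 2: validate, expire the old tokens, return a new pair."""
        # pop() makes the refresh token one-time use.
        client_id = self._refresh.pop(refresh_token, None)
        if client_id is None:
            raise PermissionError("unknown or already used refresh token")
        # Expire any bearer tokens previously issued to this client.
        self._bearer = {
            tok: entry for tok, entry in self._bearer.items()
            if entry[0] != client_id
        }
        new_refresh = secrets.token_urlsafe(32)
        new_bearer = secrets.token_urlsafe(32)
        self._refresh[new_refresh] = client_id
        self._bearer[new_bearer] = (client_id, self._now() + BEARER_TTL_SECONDS)
        return new_refresh, new_bearer

    def authenticate(self, bearer_token):
        """Step 3: accept a bearer token only until its expiry."""
        entry = self._bearer.get(bearer_token)
        if entry is None or entry[1] <= self._now():
            return None
        return entry[0]
```

Step 4 is just the client calling `refresh()` again with the newest refresh token once `authenticate()` starts rejecting its bearer token.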
The benefits of that minimal version:
Client restriction and user behavior steering. With bearer tokens expiring quickly and refresh tokens being one-time use, it is infeasible to share a single credential between multiple clients. With easy provisioning, this will get users to generate one credential per client.
Breach containment and blast radius reduction. If your bearer tokens leak (logs being a surprisingly common source for these), they automatically expire when left in backups or deep in the objects of your git repo. If a bearer token is compromised, it's only valid for your expiration window. If a refresh token is compromised and used, the legitimate client will be knocked offline, increasing the likelihood of detection. This property also allows you to know if a leaked refresh token was used at all before it was revoked.
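One way the "knocked offline" signal could be surfaced, as a rough sketch: remember consumed refresh tokens and raise an alert when one is presented a second time. The `RefreshLedger` class and its `alerts` list are hypothetical names, and keeping used tokens around forever is a simplification.

```python
import secrets


class RefreshLedger:
    """Illustrative only: tracks active and consumed refresh tokens."""

    def __init__(self):
        self._active = {}  # refresh token -> client id
        self._used = {}    # consumed refresh token -> client id
        self.alerts = []   # client ids whose consumed token was replayed

    def issue(self, client_id):
        token = secrets.token_urlsafe(32)
        self._active[token] = client_id
        return token

    def rotate(self, token):
        """Return a new refresh token, or None if the request is rejected."""
        if token in self._used:
            # Someone else already consumed this token. Either the
            # legitimate client or an attacker is now locked out,
            # which is the detection signal described above.
            self.alerts.append(self._used[token])
            return None
        client_id = self._active.pop(token, None)
        if client_id is None:
            return None  # never issued by us; not a replay
        self._used[token] = client_id
        return self.issue(client_id)
```

If an attacker rotates a stolen token first, the legitimate client's next `rotate()` call fails and lands in `alerts`, so you also learn that the leaked token was actually used.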
Audit and monitoring opportunities. Every refresh creates a logging checkpoint where you can track usage patterns, detect anomalies, and enforce policy changes. This gives you natural rate limiting and abuse detection points.
Most security frameworks (SOC 2, ISO 27001, etc.) prefer time-limited credentials as a basic security control.
Add an expiration time to refresh tokens to naturally clean up access from broken or no longer used clients. Example: a daily backup script whose refresh token has a 90-day expiration window. The backups would have to not run for 90 days before the token became an issue. If access is still needed, the effort is low: just provision a new API key. After 90 days of failure you either already needed to perform maintenance on your backup system or you moved to something else without revoking the access keys.
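The expiry check itself is tiny. A sketch, assuming the 90-day window is measured from when the current refresh token was issued (so every successful rotation resets the clock); the function name and dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

REFRESH_TTL = timedelta(days=90)  # the 90-day window from the example


def refresh_token_is_valid(issued_at, now):
    """A refresh token is usable only within REFRESH_TTL of being issued."""
    return now - issued_at < REFRESH_TTL


# Hypothetical timeline for the daily backup script.
last_rotation = datetime(2024, 1, 1, tzinfo=timezone.utc)
one_missed_week = last_rotation + timedelta(days=7)
abandoned = last_rotation + timedelta(days=90)

print(refresh_token_is_valid(last_rotation, one_missed_week))  # True
print(refresh_token_is_valid(last_rotation, abandoned))        # False
```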
If a client is accessing an API on behalf of itself (which is a more natural fit for an API Key replacement) then we can use client_credentials with either client secret authentication or JWT bearer authentication instead.
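For reference, a client_credentials token request with client secret authentication looks roughly like this, per RFC 6749 sections 2.3.1 and 4.4. This sketch only builds the request and does not send it; the function name, endpoint URL, client id, and secret are placeholders, not any real provider's values.

```python
import base64
from urllib.parse import quote_plus, urlencode
from urllib.request import Request


def build_client_credentials_request(token_url, client_id, client_secret, scope=None):
    """Build (but do not send) an OAuth 2.0 client_credentials token request."""
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    # RFC 6749 section 2.3.1: form-encode the id and secret, then send
    # them via HTTP Basic authentication.
    userpass = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    basic = base64.b64encode(userpass.encode("ascii")).decode("ascii")
    return Request(
        token_url,
        data=urlencode(form).encode("ascii"),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_client_credentials_request(
    "https://auth.example.com/oauth/token", "backup-script", "s3cret"
)
```

Passing `req` to `urllib.request.urlopen` would perform the exchange; the JSON response carries the `access_token` and, typically, an `expires_in` value.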
There doesn't need to be any OIDC or third party involved to get all the benefits of these tokens. The keys can't be used by multiple simultaneous clients, they naturally expire and rotate over time, and you can easily audit their use (primarily due to the last two principles).
Sigh... I wish this were not true. It's a shame that no alternatives have emerged so far.
When someone does it for the audience I always consider it more of a publication. Maybe that's just semantics, but that's been the distinction for me.
I still see high traffic on a post explaining oddities in some of Route53's unintuitive behaviors and hope I'm making someone's day a little better in giving them a solution.
That drives me to write more.