montroser · 4 years ago
Another common problem in caching is the "thundering herd". You have a bit of data that is expensive to compute, very frequently requested, and fine to be a little bit stale.

The naive approach is to follow the normal pattern: use the cached version if it's there, and otherwise compute it and stick it in the cache with a TTL. The problem arises when you have a fleet of web nodes going at full bore and the cache expires. If the data takes 10 seconds to compute, then for the whole next 10 seconds, every new request will find the data missing from the cache since it has expired and not yet been replaced, and so each of those requests will take on the expensive and redundant work of repopulating that data.
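
A minimal, self-contained sketch of that naive pattern (the in-process dict stands in for memcached or Redis, and the sleep stands in for the expensive computation):

  import time

  CACHE = {}  # stand-in for a shared cache: key -> (value, expires_at)

  def compute_expensive_report():
      time.sleep(10)  # stands in for the 10-second computation
      return "report contents"

  def get_report(key="report", ttl=300):
      entry = CACHE.get(key)
      if entry is not None and entry[1] > time.time():
          return entry[0]  # cache hit
      # Cache miss: every request arriving during the ~10 seconds this
      # takes will also miss and redundantly repeat the same work.
      value = compute_expensive_report()
      CACHE[key] = (value, time.time() + ttl)
      return value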

It can be a performance blip at best, or a total disaster at worst, causing cascading failures and downtime.

One relatively easy solution here is to have a separate worker process that runs on an interval and preemptively freshens the data in the cache. In this case you can set the value with no TTL, and the web front ends just use whatever they find. You also need to deal gracefully with the rare case where they find nothing, like a cold cache boot.
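
A sketch of that worker approach (names and the in-process dict are stand-ins; in practice the cache would be shared, e.g. memcached): the request path only reads, and a separate loop refreshes the value with no TTL.

  import threading
  import time

  CACHE = {}  # stand-in for a shared cache; no TTL on this key

  def compute_expensive_report():
      time.sleep(10)
      return "report contents"

  def refresher(interval=60):
      # Runs in a separate worker; web nodes never recompute inline.
      while True:
          CACHE["report"] = compute_expensive_report()
          time.sleep(interval)

  def handle_request():
      value = CACHE.get("report")
      if value is None:
          # Cold cache: degrade gracefully rather than recomputing in the request path.
          return "report not ready yet"
      return value

  threading.Thread(target=refresher, daemon=True).start()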

revicon · 4 years ago
This runs afoul of the article’s warning to “Never require a cache hit”:

  It's important never to require a cache hit - even as a soft
  requirement. Evictions can happen at inconvenient times - such as
  when some other part of the system is under load - and there
  mustn't be any negative consequences to a cache miss.
If the cache does empty unexpectedly and that is going to cause your infrastructure to die in a cascade, probably some kind of fallback will be needed.

stetrain · 4 years ago
If you have a result that takes 10 seconds to calculate, and you need to read it faster than that, so you make a background process to calculate it and keep it updated somewhere, you could argue that it's no longer caching.

It's an asynchronous read model or a projection and it's now just part of your application.

Thaxll · 4 years ago
If the computation of the data takes 10 seconds and you're hammered with requests, you should definitely require a cache hit and return a 500 if there is no cached value - do not go to the DB or some backup mechanism.
plett · 4 years ago
This is done in the DNS world too. The unbound[1] resolver can pre-fetch a new record half way through the TTL if enough clients have requested it.

That guarantees that the cache remains hot for records that clients are requesting, but lets unused records expire.

1: https://nlnetlabs.nl/projects/unbound/about/

hinkley · 4 years ago
As the TTL shrinks to a smaller multiple of the time needed to generate the response, this problem becomes magnified.

A two second response with a 10 second TTL could see 20% of the queries all trying to refresh the same cache entry at once. With a single server you can use promise caching to dedupe such things; with a cluster that's more difficult to avoid.
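
A rough sketch of promise caching on a single server (assuming a thread-per-request model; cache is any object with get/set methods): concurrent misses share one in-flight computation instead of each starting their own.

  import threading
  from concurrent.futures import Future

  _inflight = {}              # key -> Future for a computation in progress
  _inflight_lock = threading.Lock()

  def get_or_compute(cache, key, compute):
      value = cache.get(key)
      if value is not None:
          return value
      with _inflight_lock:
          fut = _inflight.get(key)
          owner = fut is None
          if owner:
              fut = Future()
              _inflight[key] = fut
      if not owner:
          return fut.result()  # wait on the promise the owner will fulfil
      try:
          result = compute()
          cache.set(key, result)
          fut.set_result(result)
          return result
      except Exception as exc:
          fut.set_exception(exc)  # waiters see the same failure
          raise
      finally:
          with _inflight_lock:
              _inflight.pop(key, None)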

Background processes are good for spiky traffic, and with reddit and HN around some of us at least have been trained that some information on a page is purely advisory. Do I have 22 upvotes or 28? Give it a couple minutes and try again.

But what that work does is make you look at what certain information actually costs, and some people are not comfortable with that. With caching you can delude yourself that you're getting a bargain on all of this information overload, even though the worst-case scenarios and statistical clumping say you're off your gourd.

filleokus · 4 years ago
It can also quite easily be solved by using something like nginx's proxy_cache_background_update and/or proxy_cache_lock, depending on whether it's permissible to send stale responses.

Then you could have the cache serve stale content until it's updated, and/or only allow one upstream connection to refresh the cache.

But yeah, unless you allow stale responses or refresh the cache periodically as you suggest, you would still have requests which take 10 seconds to complete, but you would at least not crush the upstream service.

toast0 · 4 years ago
Squid had a collapsed forwarding option that can help with the thundering herd. If multiple clients request the same uncached url, it can wait for the first origin request to finish and serve the response to all clients (if cachable). Of course, that backfires if the response was uncachable; so careful configuration is required.

Also works well to combine that with stale-while-revalidate. Yahoo had some squid patches with nifty ways to have a response invalidate other cached urls, but I don't think those made it to the outside world (at least I can't find them now).

gonzo41 · 4 years ago
Varnish has a keep time that allows you to hold a 'dead' object for ttl+X for exactly this reason. You serve it for so many seconds whilst under load, while a single origin request goes and grabs the data. It's a really solid way to shed load and avoid dropping so many connections at once and then getting slammed on retry.
hyperpape · 4 years ago
It's a pool, not a cache per se, but the Hikari database connection pool subtracts a random amount, up to a few percent, from each connection's lifetime to avoid a similar problem. If you have a 30 minute lifetime, you spread the expirations (and recreations) over half a minute or so.
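
The same idea applied to cache TTLs - a minimal sketch, with the jitter fraction picked arbitrarily:

  import random

  BASE_TTL = 30 * 60  # 30 minutes, in seconds

  def jittered_ttl(base=BASE_TTL, variance=0.025):
      # Subtract up to a few percent so entries created together don't
      # all expire (and get recreated) at the same instant.
      return int(base - random.uniform(0, base * variance))

  # e.g. cache.set(key, value, ttl=jittered_ttl())
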
jeffbee · 4 years ago
Another probabilistic approach is to have an increasing probability of an artificial cache miss as the TTL approaches zero. This forces some unlucky client to refresh it for everybody, avoiding the thundering herd.

We used that technique in Google's DNS cache to fix what had been a monumental herd effect every 30 seconds when the RRs for outlook-com.olc.protection.outlook.com expired.
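
A sketch of that probabilistic approach (the linear ramp over the last few seconds is an arbitrary choice; exponentially weighted variants exist too):

  import random
  import time

  EARLY_WINDOW = 5.0  # seconds before expiry when artificial misses begin

  def should_refresh(expires_at, now=None):
      # Returns True with increasing probability as expiry approaches,
      # so one unlucky client refreshes the entry for everybody.
      now = time.time() if now is None else now
      remaining = expires_at - now
      if remaining <= 0:
          return True
      if remaining >= EARLY_WINDOW:
          return False
      return random.random() > remaining / EARLY_WINDOW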

nerdponx · 4 years ago
I once had this situation at an old job, and I ended up working around it by putting a lock around the re-computation: all requests had to wait for the 5-ish seconds while the cache was being refreshed.

In this case, per-request latency wasn't as important as total throughput of the system over time. And the individual requests were small, so building up a queue of unhandled requests wasn't a problem.

I have no idea if this is considered a "good" solution, but it worked well enough for me at the time.
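
A minimal sketch of that lock-around-recomputation approach (in-process and double-checked, so only one request recomputes and the rest block and then reuse its result; cache is any object with get/set methods):

  import threading

  _refresh_lock = threading.Lock()

  def get_with_lock(cache, key, compute):
      value = cache.get(key)
      if value is not None:
          return value
      with _refresh_lock:
          # Re-check: another request may have refreshed it while we waited.
          value = cache.get(key)
          if value is None:
              value = compute()  # the ~5 second recomputation
              cache.set(key, value)
          return value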

thaumasiotes · 4 years ago
> Another common problem in caching is the "thundering herd".

> you have a fleet of web nodes going at full bore, and the cache expires. If the data takes 10 seconds to compute, then for the whole next 10 seconds, every new request will find the data missing from the cache since it has expired and not yet been replaced, and so each of those requests will take on the expensive and redundant work of repopulating that data.

Wait, is the problem that as soon as the cache entry expires everyone requests the new data all at once, overwhelming the backend ("thundering herd"), or is it that requests follow whatever the normal pattern is but the backend doesn't know how to queue them?

kjhughes · 4 years ago
TTL will always be Transistor Transistor Logic to me.

That binding has an infinite time to live in my mind.

moefh · 4 years ago
Same here. I clicked on the link fully expecting a discussion about the transition from TTL to CMOS, or something like that.
UncleOxidant · 4 years ago
Same here. I thought this was going to be how you should be using CMOS instead of TTL to avoid overheating.
linker3000 · 4 years ago
Likewise - I thought it might be about how much harder it's getting to source through-hole TTL logic for vintage computer repairs and new hobby builds.

I have just ordered 40 x 74F260 from Poland (I'm in the UK)...for less than some of the people on *bay are charging for 5.

aequitas · 4 years ago
> The simplest strategy is to just never invalidate. Some data doesn't go bad. The contents of CSV upload #42345 won't change. Neither will the HTML conversion of some markdown-formatted product data.

That's a dangerous assumption to make, and while it's simple at the cache's end it might need complicated logic at the frontend. There will always be that use case of a customer that needs the file changed or removed. So instead of invalidating you now need a system to manage references to versioned cached objects.

borplk · 4 years ago
Yes. One danger of infrequently-invalidated or never-invalidated data is that it can turn into a ticking time bomb.

For example you may introduce a bug in the code path for calculating the fresh value that causes an error.

That code may not execute for months and months since the cache is warm and up and running. Until one day it is restarted and you find out the problems it was masking.

Pro tip: don't get greedy with cache TTLs; define and enforce a low value that you can tolerate. In many use cases you can easily afford to re-compute once per 5 or 10 or 30 minutes.

WrtCdEvrydy · 4 years ago
We've given up on this whole idea and basically gone for extremely long TTLs (30 days) on our caching layer. We then built the tooling to dump caches after deploys and manually on request. You can dump caches at the application layer, or even down to the specific request param combination layer. It's worth investing a bit into systems.
ectopod · 4 years ago
If every file upload creates a new upload number (which is a common design) and the upload numbers are not exposed to the client then changes and deletions work without any special effort. I guess this is the case the author is talking about.
jrochkind1 · 4 years ago
In many cases, such as the CSV one, I think the answer is converting the article's Strategy 1 to the article's Strategy 4 -- cache under a key that has a digest hash of the content in it, so when the content changes, the cache key changes.
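
A small sketch of that key scheme (the renderer is a hypothetical stand-in): the key embeds a digest of the content, so changed content simply lands under a new key and the old entry is never read again.

  import hashlib

  def render_csv_to_html(csv_bytes: bytes) -> str:
      # stand-in for the real CSV/markdown-to-HTML conversion
      return "<table>...</table>"

  def get_html(cache, csv_bytes: bytes) -> str:
      key = "csv-html:" + hashlib.sha256(csv_bytes).hexdigest()
      html = cache.get(key)
      if html is None:
          html = render_csv_to_html(csv_bytes)
          cache.set(key, html)  # safe to keep indefinitely
      return html
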
ben509 · 4 years ago
If the requestor only knows the filename, now you have to cache the filename-to-hash lookup.

Then you've just moved the invalidation problem around.

This is a great strategy if the object is controlled by a build process, so your client code already knows what the hash is.

tpetry · 4 years ago
You can just overwrite the cache with new information when the user replaces a file.
aequitas · 4 years ago
You can't. Because you designed to never invalidate, there is no way to ensure if and when this change propagates through your entire cache layer (and local caches like browsers), and thus no way to guarantee to the client that the change has been made.

I'm not saying it's a bad idea. But often a seemingly simple solution to a complex problem doesn't take the entire complexity of the problem into account. Which is fine if you can consciously make that tradeoff but more often than not will turn into a footgun eventually.

_AzMoo · 4 years ago
That's update on write, not never invalidate.
wryun · 4 years ago
I'm probably misunderstanding something here, but it seems like this whole article ignores race conditions. For instance, if you 'update on write', how do you guarantee that the cache writes don't occur in the wrong order?

e.g. write B, write A, update cache with A, update cache with B

You've just replicated all the usual single-machine multi-threading issues when you abandon the ACID happy place by adding a cache...

EDIT: and this is why I would use TTLs even with update on write, folks. And this is why I mostly happily use volatile-lru on multi-purpose redis caches, since everything I'd want to evict has a TTL.

calpaterson · 4 years ago
I don't disagree this is a problem but it is so general that it applies everywhere you use multiple backing services.

A classic is sending an email: do you send the notification email before or after committing to the database? If you send after you run the risk of committing to the database and then failing to send the notification email. If you send before you might send the email and then fail to commit.

I would love to have cross database/mailserver/cache/etc transactions, but it's implemented nowhere as far as I can see, though there seems to have been work towards it in the 90s; I think they ultimately gave up.

wryun · 4 years ago
If you agree it's a problem, it'd be great if you'd at least add this onto the tips and pitfalls section. The main claim of the article is get rid of TTLs and "don't give up correctness to gain speed", which seems incorrect to me.

One of the main reasons that TTLs are good is exactly this; if you remove TTLs as you advise, strategy 2 and 3 can have permanently wrong data in the cache, right? This is ... not good?

EDIT: I don't think the email one is a good comparison, either. (a) you have to do this (it's not an optional bonus, as the cache is), and (b) for critical emails it's perfectly reasonable to record in the db whether the send succeeded, or report it elsewhere - you can have multiple states!

If we're just talking notifications in general, I would prefer sending them after commit (i.e. don't tell people until we're sure the action has happened, and wear the possible not telling), though depending on your priorities you may switch it. But I think of this as an external action, NOT something where you're messing with your system's version of the truth.

oftenwrong · 4 years ago
A bunch of databases support XA transactions [1], or otherwise have generic support for 2PC that can be hooked up to an external transaction manager [2].

In theory, we could build support into most systems. It does seem like this idea has lost momentum, though. I am not sure why.

[1] https://en.wikipedia.org/wiki/X/Open_XA

[2] for example, https://www.postgresql.org/docs/13/sql-prepare-transaction.h...

namibj · 4 years ago
You write to the database, and have a separate notification-sender that feeds on the database change stream, sends out notifications, and keeps track of which notifications it has sent. If that process crashes, it will re-send one or maybe a few notifications.
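
A sketch of that separate-sender idea using a plain outbox table that the sender polls (the schema and helpers are assumptions, not from the thread; a real change stream or CDC feed would work similarly):

  import sqlite3
  import time

  def send_email(address, body):
      print(f"sending to {address}: {body}")  # stand-in for the real mailer

  def notification_sender(db_path="app.db", poll_interval=5):
      # Separate process: picks up committed-but-unsent notifications.
      # If it crashes after sending but before marking, a duplicate may
      # be re-sent - the at-least-once behaviour described above.
      conn = sqlite3.connect(db_path)
      conn.execute("CREATE TABLE IF NOT EXISTS notifications ("
                   "id INTEGER PRIMARY KEY, address TEXT, body TEXT,"
                   " sent INTEGER DEFAULT 0)")
      while True:
          rows = conn.execute(
              "SELECT id, address, body FROM notifications WHERE sent = 0"
          ).fetchall()
          for row_id, address, body in rows:
              send_email(address, body)
              conn.execute("UPDATE notifications SET sent = 1 WHERE id = ?",
                           (row_id,))
              conn.commit()
          time.sleep(poll_interval)
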
MaxBarraclough · 4 years ago
I was thinking about how this relates to write-through and write-back in hardware caches, there's an interesting inversion.

In a CPU, the cache is there between the CPU proper, and the RAM.

In a write-through CPU, the RAM is the single record of truth, and the cache exists only to speed up reads of those addresses which are currently cached.

In a write-back CPU there is no single record of truth, it has to be pieced together from the RAM in combination with the cache.

In this Memcache set-up though, the database is the complete record of truth, and the cache is only ever updated after the database write operation has completed, even though the Memcache cache is faster than the database.

It seems unlikely a CPU would ever update the RAM without touching the cache, but the equivalent is a possibility here.

steventhedev · 4 years ago
It's sad the difference between expiry and eviction isn't explored more fully. There are plenty of posts describing cache eviction strategies that barely touch on expiry issues, and a handful of posts about cache expiry that barely touch on eviction.

Regarding the decorators, there's a reason they're popular: they save boilerplate. The good news is that it's python, so you can do something like this:

  from pyappcache import RedisCache

  @RedisCache
  def get_slow_thing_v3(thing_id):
      thing = get_slow_thing_from_a_database_layer()
      return thing

  def update_something_slow(thing_id, new_thing):
      get_slow_thing_v3.set_cache(thing_id, new_thing)
      set_thing_in_a_database_layer(new_thing)
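
A self-contained sketch of how such a wrapper could be built (hypothetical - this is not pyappcache's actual API; it assumes a cache object with get(key)/set(key, value) methods):

  import functools

  def cached(cache):
      # Decorator that caches by the first argument and exposes an
      # explicit setter for the update-on-write path.
      def decorator(fn):
          @functools.wraps(fn)
          def wrapper(key, *args, **kwargs):
              value = cache.get(key)
              if value is None:
                  value = fn(key, *args, **kwargs)
                  cache.set(key, value)
              return value
          wrapper.set_cache = cache.set  # e.g. get_slow_thing_v3.set_cache(id, thing)
          return wrapper
      return decorator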

CuriousNinja · 4 years ago
One of the examples on invalidating cache on write has the following code, which is buggy. If the DB call fails, then the cache would have data that was never actually committed. Cache coherency is hard.

  def update_something_slow(thing_id, new_thing):
    my_cache.set_by_str(thing_id, new_thing)
    set_thing_in_a_database_layer(new_thing)
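
One way to close that particular window (a sketch reusing the article's names): commit to the database first and only touch the cache once the write has succeeded. Other races remain - see the write-ordering discussion elsewhere in the thread - but a failed DB write can no longer leave uncommitted data in the cache.

  def update_something_slow(thing_id, new_thing):
      # Commit to the source of truth first...
      set_thing_in_a_database_layer(new_thing)
      # ...then update (or simply delete) the cached copy.
      my_cache.set_by_str(thing_id, new_thing)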

quickthrower2 · 4 years ago
I was hoping this would be about DNS, as I never know what a “good” value for those TTLs is. I've read somewhere on HN that higher is better and you should handle changes some other way. I would be interested to learn more about strategies for that.
toast0 · 4 years ago
That could probably go into a whole separate article. But...

The first rule of DNS TTLs is expect some things to not respect them.

If you are changing your records frequently or urgently, set ttl somewhere between 1 minute and 5 minutes. Maybe 15 seconds if it's really a lot, but don't go below that cause things get weird.

Other than that, most things should probably be between an hour to four hours.

If you're paying per query, make ttls longer to reduce your costs. Don't make ttls longer than one or two days.

Also, don't make long chains of cnames and what not, and don't let your responses grow beyond 512 bytes (cause it won't work on some networks), allowing for NAT64 turning your A records into AAAA records. IIRC that means don't return more than 8 A records, but your mileage may vary.

JdeBP · 4 years ago
One strategy is to use DNS server software that handles changes for you. Set a time-to-die, and the software will publish a counting-down TTL that is the number of seconds left until the time-to-die. Set a start time, and the software won't publish the record until that time has been reached. Have two records, one with each, and you get a pre-programmed switchover from one thing to another.

* http://jdebp.uk./Softwares/djbwares/guide/commands/tinydns-d...