Everyone likes cache. How can you not? In countless cases, you can remediate performance problems quickly and cheaply by adding a cache layer to your stack. It’s like bolting a nitrous oxide system onto your tricked-out street racer, or throwing dry leaves on your campfire: an immediate, obvious, big performance difference for a teeny-tiny investment of effort.
So everyone likes cache, and today’s web architectures are chock full of caches, from the distributed caches of CDNs to the filesystem caches on the database servers. But cache has limits; it has costs. Just as the leaves burn brightly but throw off sparks; just as the nitrous boosts an engine’s output, but can destroy the engine; so too can caching solutions do more harm than good. Or some harm, at least.
Cache Only Buys So Much
Cache generally reduces latency at the cost of memory and freshness. That is, we perform an expensive operation once, then we stick the result into RAM for a while. Subsequent accesses use the version in RAM instead of recalculating the result. This works great when the same expensive operation would otherwise be performed many times, and the result won’t change very often.
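As a minimal sketch of that trade, the following keeps a computed result in memory for a fixed time and recomputes it only after it expires. The class name TTLCache, its parameters, and the expensive_report function are illustrative assumptions, not part of any particular caching product.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache: compute once, reuse until the entry expires."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: reuse the copy held in RAM
        value = compute()  # miss: pay for the expensive operation
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


def expensive_report() -> str:
    """Stand-in for a slow query or calculation (assumed for illustration)."""
    time.sleep(0.01)
    return "report body"


report_cache = TTLCache(ttl_seconds=60)
first = report_cache.get_or_compute("daily-report", expensive_report)   # computed
second = report_cache.get_or_compute("daily-report", expensive_report)  # served from RAM
```

The time-to-live is where the freshness cost shows up: for up to 60 seconds, callers may receive a result that no longer matches the underlying data.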
Those conditions are important, though. For example, when you insert a caching layer, there’s an assumption that the assets you are caching will be referenced many times over. But in many cases, we aggressively cache values from expensive operations, even when we’re not sure how soon, or how often, those values will be re-used. In those cases, it can be hard to decide whether caching is helping or hurting you.
Just looking at an aggregate cache hit-to-miss ratio won’t give you all the information you need, and it won’t help you determine which operations are making good use of cache and which are not. Looking at the metrics on a per-operation basis can tell you which bets are paying off. For each operation, look at how often the cache hits versus misses, and at the average latency of a hit versus the average latency of a miss. Consider the memory footprint of caching that operation as well, in case that memory could be put to better use elsewhere.
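One way to collect those per-operation numbers is to have the cache wrapper record them itself. The sketch below is an assumption about how that might look; InstrumentedCache, OpStats, and the operation names are hypothetical, and memory footprint is not tracked here.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Tuple


@dataclass
class OpStats:
    """Counters for one named operation."""
    hits: int = 0
    misses: int = 0
    hit_seconds: float = 0.0
    miss_seconds: float = 0.0

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def avg_hit_seconds(self) -> float:
        return self.hit_seconds / self.hits if self.hits else 0.0

    @property
    def avg_miss_seconds(self) -> float:
        return self.miss_seconds / self.misses if self.misses else 0.0


class InstrumentedCache:
    """Hypothetical cache that keeps separate statistics per operation name."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, Hashable], Any] = {}
        self.stats: Dict[str, OpStats] = defaultdict(OpStats)

    def get_or_compute(self, operation: str, key: Hashable,
                       compute: Callable[[], Any]) -> Any:
        stats = self.stats[operation]
        start = time.perf_counter()
        cache_key = (operation, key)
        if cache_key in self._values:
            value = self._values[cache_key]
            stats.hits += 1
            stats.hit_seconds += time.perf_counter() - start
            return value
        value = compute()  # miss latency includes the compute and the fill
        self._values[cache_key] = value
        stats.misses += 1
        stats.miss_seconds += time.perf_counter() - start
        return value


cache = InstrumentedCache()
cache.get_or_compute("user_profile", 42, lambda: {"name": "example"})
cache.get_or_compute("user_profile", 42, lambda: {"name": "example"})
cache.get_or_compute("search_results", "rare query", lambda: [])

for name, op in cache.stats.items():
    print(name, op.hit_ratio, op.avg_hit_seconds, op.avg_miss_seconds)
```

Keeping the counters keyed by operation is the point of the exercise: the aggregate ratio across both operations would hide the fact that one of them has never produced a hit.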
In some cases, cache hits might be rare enough that probing and filling the cache adds more latency than it saves. This wouldn’t be obvious when looking at the overall performance of the site, because across all operations the cache will still have improved average latency.
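The arithmetic behind that claim can be sketched with a simplified model, assuming every request pays the cost of probing the cache and every miss additionally pays for the computation plus the cost of filling the cache. The function names and the millisecond figures below are made up for illustration.

```python
def expected_latency_ms(hit_ratio: float, probe_ms: float,
                        compute_ms: float, fill_ms: float) -> float:
    """Average latency of a cached operation under the simplified model."""
    return probe_ms + (1.0 - hit_ratio) * (compute_ms + fill_ms)


def break_even_hit_ratio(probe_ms: float, compute_ms: float,
                         fill_ms: float) -> float:
    """Hit ratio at which the cached path costs the same as computing every time.

    Derived by setting probe + (1 - h) * (compute + fill) equal to compute
    and solving for h.
    """
    return (probe_ms + fill_ms) / (compute_ms + fill_ms)


# Hypothetical operation: 2 ms to probe, 10 ms to compute, 3 ms to fill.
uncached_ms = 10.0
threshold = break_even_hit_ratio(probe_ms=2.0, compute_ms=10.0, fill_ms=3.0)
rarely_hit = expected_latency_ms(0.2, probe_ms=2.0, compute_ms=10.0, fill_ms=3.0)
often_hit = expected_latency_ms(0.8, probe_ms=2.0, compute_ms=10.0, fill_ms=3.0)
```

With these assumed numbers the break-even hit ratio is 5/13, or about 0.38. At a 20% hit ratio the cached path averages 12.4 ms, which is slower than the 10 ms it would take to compute the value every time; at 80% it averages 4.6 ms.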
Cold Hard Cache
Having a few underperforming operations in your cache might slow down certain use cases. More likely, it will simply fail to apply the speedup from cache in a uniform fashion — some usages will get immense benefit from cache, and some will get little to none. The decrease in average latency overall would fail to reflect the actual observed performance for some users.
A similar problem can be observed in the case of the first user. The first user to request each cached value encounters a cache miss, and consequently increased latency. When many values are cached, the same user may encounter many cache misses. The cumulative effect can render a website nigh unusable shortly after cache has been cleared.
The impact of cache misses is not exclusively on the user making the request, either. Say you have marked a handful of high-traffic methods to be cached, because those methods take a lot of processing power and/or disk IO to satisfy. Now say the cache is cleared, and those high-traffic methods now need to execute. All at once. Now even simple, straightforward requests will find themselves competing for resources: the whole system slows down.
I have personally seen this kind of problem in several applications: the application uses cache so heavily that correct operation has become dependent on having a lot of data already in cache. These applications could only have cache cleared during low traffic times. Cache clears that needed to happen during prime hours had to be targeted, and required extensive research to identify the specific list of keys that could be cleared.
This “cold cache” problem can be tamed somewhat by “warming” the cache. You warm a cache by pre-computing values for commonly-requested cache keys. This is another place where you need a good understanding of which individual operations see the most traffic, and of the hit/miss ratio of those operations.
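A warming pass might look something like the sketch below, assuming you have some record of which keys were requested recently. The access log, the helper names hottest_keys and warm_cache, and the plain dict standing in for the cache are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, Hashable, Iterable, List


def hottest_keys(access_log: Iterable[Hashable], limit: int) -> List[Hashable]:
    """Pick the most frequently requested keys from a log of past accesses."""
    return [key for key, _count in Counter(access_log).most_common(limit)]


def warm_cache(cache: Dict[Hashable, Any], keys: Iterable[Hashable],
               compute: Callable[[Hashable], Any]) -> List[Hashable]:
    """Pre-compute values for the given keys; return the keys that failed."""
    failed: List[Hashable] = []
    for key in keys:
        if key in cache:
            continue  # already warm, do not spend resources recomputing
        try:
            cache[key] = compute(key)
        except Exception:
            failed.append(key)  # leave it cold; a live request will retry
    return failed


def load_account(account_id: str) -> dict:
    """Stand-in for an expensive lookup (assumed for illustration)."""
    return {"id": account_id}


recent_requests = ["acct-1", "acct-2", "acct-1", "acct-3", "acct-1", "acct-2"]
account_cache: Dict[Hashable, Any] = {}
cold_keys = warm_cache(account_cache, hottest_keys(recent_requests, limit=2), load_account)
```

Warming the keys one at a time, rather than all at once, is a deliberate choice in this sketch: it spreads the load on the backing store over time instead of reproducing the stampede that the cold cache would have caused.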
Cache warming is not a panacea, though. It still takes time and resources to warm the cache, and while you might be able to do it behind the scenes, at some level your database is likely to feel the additional strain.
Cache also has some complexity costs. A caching layer can make your application difficult to debug. Cache can make it very challenging to provide consistent or up-to-date data to your users. And a layer of cache is another layer of infrastructure – another process to monitor, another configuration block to maintain, or even a completely separate machine. These costs aren’t always immediately apparent at design time, but they are important because they add to your technical debt.
Let’s look at them one at a time. The simplest to understand, and the most difficult to put a cost on, is the obfuscation factor of a cache layer. Caching can give rise to bugs similar to threading bugs, where the value you are using was actually calculated in some other process, and you’re not exactly sure where it came from. It can put a great deal of logical distance between the observed bug and the actual defect.
Take for instance a background job that looks up a user’s account data, but dummies up some request values. For most of your users, maybe those dummy values are perfectly valid assumptions, but for some small selection of users, they are not. The account data is put in cache, and so when a live request comes in asking for the account data, it pulls the improperly calculated value out of cache – sometimes. How long is it before you realize where the bad cache data is coming from?
Caches also make it difficult to know exactly when a particular result was prepared. By design, an operation is performed one time, then that result is returned to many subsequent callers. If conditions change such that the operation would yield a different result, it can be difficult to reflect that. Either code must be written to clear the stale result from the cache, or the application must be permitted to return stale results until the cached entry expires on its own. This can be an even more daunting problem if multiple independent caches might each have a copy of the stale result. Data freshness issues can create unanticipated situations, and can be difficult to track down. They can also lead to an inconsistent experience from user to user, which again can be challenging to replicate or explain.
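The difference between the two options can be shown with a small sketch. AccountStore and its method names are hypothetical, and plain dicts stand in for both the database and the cache.

```python
from typing import Any, Dict


class AccountStore:
    """Hypothetical read-through cache in front of a slow source of truth."""

    def __init__(self) -> None:
        self._database: Dict[str, Any] = {}  # stand-in for the real data store
        self._cache: Dict[str, Any] = {}

    def read(self, account_id: str) -> Any:
        if account_id not in self._cache:
            self._cache[account_id] = self._database[account_id]
        return self._cache[account_id]

    def write_without_invalidation(self, account_id: str, value: Any) -> None:
        # The source of truth changes, but readers keep seeing the cached copy.
        self._database[account_id] = value

    def write(self, account_id: str, value: Any) -> None:
        # Explicitly clear the stale entry so the next read recomputes it.
        self._database[account_id] = value
        self._cache.pop(account_id, None)


store = AccountStore()
store.write("acct-1", "basic plan")
before = store.read("acct-1")           # fills the cache
store.write_without_invalidation("acct-1", "premium plan")
stale = store.read("acct-1")            # still the old value
store.write("acct-1", "premium plan")
fresh = store.read("acct-1")            # reflects the change
```

Note that the explicit invalidation here only clears this one cache. If another process or another layer holds its own copy, that copy stays stale until it is cleared or expires separately.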
And caches are separate system components. They require a resource pool of their own, configuration, monitoring, and instrumentation. They require expertise, in the form of someone who knows all about the issues outlined above, as well as issues specific to the caching systems being deployed.
From the tone and topic of this article, you might think I’m advocating against caching solutions, but that’s not really my intent. Cache is an essential tool for performance remediation, but it should be used judiciously and with full understanding of its limitations and dangers.
All too often, developers who are confronted with a performance problem, such as slow Time to First Byte (TTFB) or random performance spikes, don’t know what to do. Rather than look for a better way to measure or analyze their performance, they bolt on a cache and start chucking in anything that looks like it might save them some time. And databases are just slow enough that this approach produces some pretty decent upfront results, at the cost of some pretty significant technical debt that they don’t even realize they’ve signed on to.
So I’ve outlined some of the risks and challenges involved in adding a cache layer to your application. Not because I think caching is the wrong step, but because it should be provably the right step, before you jump in.