The official name of the effect described below is Cache Hammering.
Some background info:
To rebuild a cache, user_a first deletes it. The first user who then asks for the missing cache (not necessarily user_a, who deleted it) will build it.
All seems nice and pretty until user_b visits the website while user_a is still building the cache. Following the logic described above, user_b will also start building the cache, in PARALLEL with user_a. The same happens with every new user who visits the website, until user_a completes the build. But user_a won't be able to build the cache as quickly as planned, since the other users building the same cache in PARALLEL slow him down.
This way, due to the parallel calculations, server load will rise exponentially. For a dedicated server this might not be a big problem, but on shared hosting it could lead to a shutdown of the whole server.
Here are the preconditions that can cause the exponential server load I've explained above:
- In-Commerce module installed
- Unit Config Cache build time - 5 seconds
- 5+ RPS (requests per second to the site)
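To put rough numbers on these preconditions, here is a small sketch. The function names are mine, and the second function is only a toy model: it assumes a single CPU shared equally among all builders and a steady stream of new builders, which is an assumption and not a measurement of a real server.

```python
import math


def min_parallel_builds(rps: float, build_seconds: float) -> float:
    """Requests that arrive while the first build is still running,
    assuming builds do not slow each other down (a lower bound)."""
    return rps * build_seconds


def toy_first_build_time(rps: float, build_seconds: float) -> float:
    """Toy model (assumption): one CPU is shared equally by all builders and
    new builders arrive at a steady rate `rps`. The first builder then
    progresses at rate 1 / (1 + rps * t). Integrating that rate until
    `build_seconds` of work is done gives T = (exp(rps * build) - 1) / rps."""
    return math.expm1(rps * build_seconds) / rps


if __name__ == "__main__":
    print(min_parallel_builds(5, 5))   # at least 25 builds of the same cache
    print(toy_first_build_time(5, 5))  # seconds in the toy model; effectively never
```

Even the lower bound means 25 users building the same cache at once; in the toy model the first build practically never finishes, which matches the exponential growth described above.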
Concept of the fix:
There are 2 states in which each cache can be:
- we have a cache, but it's outdated and needs to be rebuilt
- we don't have any cache and need to build it from scratch
Here is what I propose:
- when we have an outdated cache, let user_a rebuild it, while other users use the outdated version
- when we don't have a cache, let user_a build it, while other users wait (a predefined number of seconds) for him to finish and then use the cache once it's ready
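The proposal above boils down to a small decision table. The sketch below is only an illustration; the `Action` names and the `choose_action` function are hypothetical and not part of any existing API.

```python
from enum import Enum


class Action(Enum):
    USE_CACHE = "use cache"
    REBUILD = "rebuild cache"
    USE_OUTDATED = "use outdated cache"
    WAIT = "wait for the builder"


def choose_action(has_cache: bool, outdated: bool, someone_rebuilding: bool) -> Action:
    """Decide what a single request should do with one cache key."""
    if has_cache and not outdated:
        return Action.USE_CACHE      # nothing to rebuild
    if not someone_rebuilding:
        return Action.REBUILD        # this request plays the role of user_a
    # Somebody else is already building: never start a parallel build.
    return Action.USE_OUTDATED if has_cache else Action.WAIT
```

The important property is that only one request ever gets `REBUILD` for a given key; everybody else either reads the old data or waits.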
To implement the proposed idea, we always need a way to get the outdated cache version to return to other users while user_a is building.
This is not always possible because of the current automatic cache key expiration scheme. For example, cache key "sample_key[%LangSerial%]" (which automatically expires when the LangSerial cache key changes) is stored in the cache under the name "sample_key[%LangSerial:1%]" (":1" added). This way, when the LangSerial cache key changes, the key name (in the cache) becomes different and the cache stored under the previous name sort of expires (since nobody knows how to access it). This works well, but it leaves us no way to get the old cache key name, which we need to serve all users except the one who is building the new cache.
To solve this issue, I'm proposing to store an additional cache key with each cache key stored and not to replace any serial cache keys (the ones between "[%" and "%]") within the cache key name. That additional cache key will hold the variable part of the cache key. This way the original cache key name will always be the same, but expiration can be detected by comparing the cached and the current values of the additional cache key.
Current scheme:
- key: "sample_key[%LangSerial%]" (actually stored key is: "sample_key[%LangSerial:1%]"), value: "some cached data"
Proposed scheme:
- key: "sample_key[%LangSerial%]" (actually stored key is: "sample_key[%LangSerial%]"), value: "some cached data"
- key: "sample_key[%LangSerial%]_serials" ("_serials" added to original cache key name), value: "sample_key[%LangSerial:1%]"
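A minimal sketch of the "_serials" idea, using a plain dict in place of the real cache backend. The helper names (`expand_serials`, `store`, `is_outdated`) are hypothetical and only illustrate how expiration could be detected by comparison.

```python
import re

# Matches serial placeholders such as "[%LangSerial%]".
SERIAL_RE = re.compile(r"\[%(\w+)%\]")


def expand_serials(key: str, serials: dict) -> str:
    """Turn "sample_key[%LangSerial%]" into "sample_key[%LangSerial:1%]"."""
    return SERIAL_RE.sub(lambda m: f"[%{m.group(1)}:{serials[m.group(1)]}%]", key)


def store(cache: dict, key: str, value, serials: dict) -> None:
    """Store the value under the unchanged key, plus the companion "_serials" key."""
    cache[key] = value
    cache[key + "_serials"] = expand_serials(key, serials)


def is_outdated(cache: dict, key: str, serials: dict) -> bool:
    """The cache is outdated when the stored serials differ from the current ones."""
    return cache.get(key + "_serials") != expand_serials(key, serials)


cache = {}
store(cache, "sample_key[%LangSerial%]", "some cached data", {"LangSerial": 1})
# After LangSerial changes to 2, the old data is still reachable under the
# same key, but is_outdated() reports that it needs a rebuild.
```

Because the data key never changes, the outdated value stays readable for everyone except the user who rebuilds it.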
To implement the described scheme we need to:
- make the "getCache" method wait for the cache (if it's totally missing) or return the outdated cache (when the cache is being built by another user)
- make the "setCache" method reset any cache building indicators (set by the rebuildCache method, see below)
- create a "rebuildCache" method that allows indicating that:
  - the cache will be rebuilt right away (e.g. set the "<cache_key>_rebuilding" cache key, so other users know that somebody is rebuilding the cache)
  - the cache must be rebuilt on the next user visit (e.g. set the "<cache_key>_rebuild" cache key, so the next user knows that the cache must be rebuilt)
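How the three methods could fit together is sketched below with an in-memory dict. This is not the actual patch: the class name, the `wait_seconds` and `poll_interval` parameters and the "return None means build it yourself" convention are all my assumptions. A real shared cache would also need an atomic "set only if absent" operation to claim the "_rebuilding" flag, which a plain dict doesn't provide.

```python
import time


class Cache:
    """Illustrative in-memory stand-in for the real cache backend."""

    def __init__(self, wait_seconds: float = 2.0, poll_interval: float = 0.05):
        self.store = {}
        self.wait_seconds = wait_seconds    # how long others wait for the builder
        self.poll_interval = poll_interval  # how often a waiting user re-checks

    def rebuild_cache(self, key: str, now: bool = True) -> None:
        """Flag the key as being rebuilt right now, or as due on the next visit."""
        self.store[key + ("_rebuilding" if now else "_rebuild")] = True

    def set_cache(self, key: str, value) -> None:
        """Store the value and reset both cache building indicators."""
        self.store[key] = value
        self.store.pop(key + "_rebuilding", None)
        self.store.pop(key + "_rebuild", None)

    def get_cache(self, key: str):
        """Return the cached value, or None when the caller must build it."""
        if self.store.pop(key + "_rebuild", None):
            self.rebuild_cache(key)      # this visitor claims the rebuild
            return None
        if key in self.store:
            return self.store[key]       # fresh, or outdated while another user rebuilds
        if key + "_rebuilding" not in self.store:
            self.rebuild_cache(key)      # nothing cached and nobody is building yet
            return None
        deadline = time.monotonic() + self.wait_seconds
        while time.monotonic() < deadline:
            if key in self.store:
                return self.store[key]   # the builder has finished
            time.sleep(self.poll_interval)
        return None                      # gave up waiting; the caller decides what to do
```

Typical use: when `get_cache` returns None, the caller builds the data and calls `set_cache`, which clears the flags so that later users read the cache normally.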
I'll create a task (with a patch for the 5.2.x branch) and attach a block diagram of the new getCache method shortly.