Performance Analysis of the Randomized SIEVE/CLOCK Cache Replacement Algorithm (SIGMETRICS'26)
We introduce a randomized, scan-resistant variant of SIEVE/CLOCK that uses a single circular array and only ⌈log2(K+1)⌉ ≥ 1 access bits per cached item. We also present a heterogeneous mean-field approximation of its performance.
Abstract
SIEVE and CLOCK have attracted significant interest as cache replacement algorithms due to their simplicity in implementation and, in the case of CLOCK, frugality in memory usage. Unfortunately, they are susceptible to pathologies such as \emph{performance cliffs}, wherein increasing cache capacity has little effect over a range of sizes until a workload-dependent threshold is crossed, after which hit rate improves abruptly and disproportionately.
Performance cliffs arise precisely because scan-based eviction paths create a synchronized, deterministic aging frontier in which many objects are admitted and removed in lockstep, causing catastrophic performance degradation under scan-like workloads.
This observation motivates a design principle for cache replacement: \emph{randomize the eviction path to mitigate worst-case scans while keeping the per-object state minimal}.
Intuitively, randomization mitigates performance cliffs by dispersing eviction pressure across objects; from a queueing-theoretic perspective, this dispersion is akin to a per-object birth-death process.
We thus introduce a randomized SIEVE/CLOCK variant that uses $\lceil \log_2 (K+1) \rceil \geq 1$ access bits per cached item; it is as cheap to implement as CLOCK using a single circular array, yet remains robust to scans and amenable to a natural CTMC description, allowing accurate performance approximation via a heterogeneous mean-field model.
Compared with popular baselines and their generalizations, the randomized SIEVE/CLOCK achieves hit rates of $1.5\times$ or more on production workloads with long scan sequences. Furthermore, though the cache hit rate improves with the number of access bits, we show that the majority of the gain is already achieved with as few as $4$ access bits per cached item.
PDF.