Lesson 41: Cache Optimization Algorithms
Making Your Twitter Clone Lightning Fast
What We’re Building Today
Today we’re implementing the mathematical brain behind cache systems that power companies like Netflix and Spotify. You’ll build intelligent cache algorithms that predict what users want before they ask for it, achieving 85%+ hit rates. We’re adding LRU-K (which Instagram uses), adaptive replacement policies (like Redis), working set-based sizing (Facebook’s approach), and predictive warming algorithms that learn from user patterns.
High-Level Goals:
Implement LRU-K algorithm tracking K historical references
Build adaptive replacement cache balancing recency vs. frequency
Design cache sizing using working set theory mathematics
Create predictive warming that preloads data based on patterns
Achieve 85%+ cache hit rate with mathematically optimized eviction
Core Concepts: The Mathematics of Fast Data Access
Why Standard LRU Isn’t Enough at Scale
Basic LRU (Least Recently Used) has a fatal flaw: one-hit wonders. When a user scrolls through their timeline, they see each tweet once and never again. Standard LRU evicts actually valuable data (user profiles viewed repeatedly) to make room for temporary data (tweets scrolled past once). At Twitter scale, this costs millions in unnecessary database queries.
The One-Hit Wonder Problem: Imagine caching every tweet a user scrolls past. User views 100 tweets, all get cached, pushing out the user’s own profile data that they access 50 times per day. When they refresh, their profile needs a database query instead of a cache hit.
LRU-K: Tracking History to Predict Future
LRU-K doesn’t just track “last accessed” - it tracks the last K accesses for each item. With K=2, it remembers both the current access and the previous one. This lets it distinguish between frequently accessed data (short time between accesses) and one-time accesses (large time gap or only one access).
The Math: For K=2, we track timestamps [t1, t2]. If (t2 - t1) is small, the item is “hot” and worth keeping. If there’s only t1 with no t2, it’s probably a one-time access. Netflix uses this to keep popular show thumbnails cached while letting one-off searches expire.
Adaptive Replacement Cache (ARC): The Self-Tuning System
ARC maintains two lists: one for recently accessed items (recency) and one for frequently accessed items (frequency). The brilliant part? It automatically adjusts the split between these lists based on what’s working. If frequency-based eviction is causing more hits, ARC grows that list and shrinks the recency list.
Real-World Impact: Redis uses an ARC-inspired algorithm. When you’re reading user timelines (frequency pattern) vs. scrolling through explore page (recency pattern), ARC automatically adapts the cache split to match your traffic pattern without manual tuning.
Working Set Theory: Right-Sizing Your Cache
Not enough cache? Wasted database queries. Too much cache? Wasted memory. Working set theory gives us the mathematical answer: measure how much unique data gets accessed in a time window, then size your cache to hold exactly that amount.
The Formula: Working Set Size = unique items accessed in time window T. For Twitter, if users access their timeline every 5 minutes and each timeline has 50 tweets, you need cache for ~50 tweets × number of active users in 5-minute window. This is how Facebook sizes their timeline cache clusters.
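If you want to sanity-check that formula, the arithmetic fits in a few lines of TypeScript. The numbers below (5-minute window, 50 tweets per timeline, 200,000 active users, ~2 KB per cached tweet) are illustrative assumptions, not measurements from the project:

// Back-of-the-envelope working set sizing with assumed numbers.
const windowMinutes = 5;              // assumed re-access interval
const tweetsPerTimeline = 50;         // assumed timeline page size
const activeUsersInWindow = 200_000;  // assumed active users per window
const bytesPerCachedTweet = 2_048;    // assumed serialized tweet size

const workingSetItems = activeUsersInWindow * tweetsPerTimeline;
const workingSetBytes = workingSetItems * bytesPerCachedTweet;

console.log(`Working set over ${windowMinutes} min: ${workingSetItems.toLocaleString()} tweets`);
console.log(`Cache memory needed: ~${(workingSetBytes / 1e9).toFixed(1)} GB`);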
Predictive Cache Warming: Learning User Patterns
The fastest cache hit is the one that happens before the user even asks. By analyzing access patterns, we can predict what users will request next and preload it. If a user always checks their notifications after posting a tweet, we warm the notifications cache immediately after post.
Pattern Types We Detect:
Temporal patterns: User checks timeline every morning at 8 AM
Sequential patterns: After viewing profile, 80% view recent tweets
Correlation patterns: When tweet goes viral, author’s profile gets 100x traffic
Architecture: How the Pieces Fit Together
Component Placement in Twitter System
Our optimized cache layer sits between the API Gateway and the database tier. Previous lessons built database connection pooling (Lesson 40); now we’re preventing those database calls entirely when possible. Next lesson’s network optimization (Lesson 42) will handle the wire protocol, but our cache optimization ensures there’s less data to transmit.
Flow: User Request → API Gateway → Smart Cache Layer (Today) → Database Pool (Lesson 40) → Database
Control Flow: Multi-Tier Decision Tree
L1 Check: Query LRU-K cache (in-memory, 10GB)
L2 Check: Query ARC cache (distributed Redis, 100GB)
Miss Path: Database query + update both caches
Background: Predictive warmer analyzes patterns, preloads L1/L2
Eviction: LRU-K algorithm selects a victim; if its access count is still ≥ K, the item is demoted to L2 instead of being dropped
Data Flow: From Cold Storage to Hot Memory
Cache Write Path:
New data → Write to DB → Async write to L2 (ARC) → High-frequency items promoted to L1 (LRU-K)
Cache Read Path:
Query arrives → L1 lookup (1ms) → on miss: L2 lookup (5ms) → on miss: DB query (50ms)
Update L1/L2 with result → Record access in history tracker → Update working set stats
Predictive Warm Path:
Pattern detector analyzes access logs → Identifies correlation (90% confidence) →
Preload predictor queues items → Background worker fetches and caches →
Result: Next request hits L1 cache (1ms instead of 50ms)
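Here is the read path above as a minimal TypeScript sketch. The Tier interface and function names are illustrative assumptions, not the repo’s actual API:

// Tiered read: L1 (in-memory LRU-K) → L2 (distributed ARC) → database.
interface Tier<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

async function cachedRead<T>(
  key: string,
  l1: Tier<T>,   // ~1ms in-process lookup
  l2: Tier<T>,   // ~5ms distributed cache lookup
  db: Tier<T>,   // ~50ms database query behind the connection pool
): Promise<T | undefined> {
  const hot = await l1.get(key);
  if (hot !== undefined) return hot;

  const warm = await l2.get(key);
  if (warm !== undefined) {
    // In the full design, promotion to L1 only happens once the item's
    // LRU-K access count crosses K, so we leave that decision to L1's policy.
    return warm;
  }

  const cold = await db.get(key);
  if (cold !== undefined) {
    await l2.set(key, cold);  // populate L2 on the way back up
    await l1.set(key, cold);  // L1's LRU-K eviction decides whether it stays
  }
  return cold;
}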
State Changes: Item Lifecycle
States:
Cold: In database only, never cached
Warming: Queued by predictor, being fetched
L2-Cached: In distributed cache, access count < K
L1-Cached: In memory cache, access count ≥ K (hot data)
Evicting: Being removed; the system checks whether the item should be demoted to the next tier down instead of dropped
Expired: TTL reached, must revalidate
Transitions:
Cold → Warming: Predictor identifies pattern
Warming → L2-Cached: Fetch completes
L2-Cached → L1-Cached: Access count crosses threshold K
L1-Cached → L2-Cached: Evicted from L1 but still valuable
Any state → Cold: Data deleted or TTL expires with low access
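To make the lifecycle concrete, here it is as a small TypeScript sketch. The state names mirror the list above; the types themselves are illustrative, not the project’s:

// Item lifecycle states and the transitions listed above. Any state can also
// fall back to 'cold' when data is deleted or a TTL expires with low access.
type ItemState = 'cold' | 'warming' | 'l2-cached' | 'l1-cached' | 'evicting' | 'expired';

const transitions: Partial<Record<ItemState, ItemState[]>> = {
  cold:        ['warming'],      // predictor identifies a pattern
  warming:     ['l2-cached'],    // background fetch completes
  'l2-cached': ['l1-cached'],    // access count crosses threshold K
  'l1-cached': ['l2-cached'],    // evicted from L1 but still valuable (demotion)
};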
System Design Relevance: Why This Matters at Scale
The Cost Equation
At 1,000 concurrent users with 10 requests/second each, that’s 10,000 requests/second. With 50ms database latency and 1ms cache latency:
Without optimization: 10,000 × 50ms = 500 seconds of database work demanded every second → by Little’s law, you need ~500 concurrent database connections
With 85% cache hit rate: (10,000 × 0.85 × 1ms) + (10,000 × 0.15 × 50ms) = 8.5s + 75s = 83.5s → need only 84 connections
Savings: 416 fewer database connections, roughly an 80% reduction in the database capacity you have to provision and pay for.
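That arithmetic is just Little’s law (L = λ × W: in-flight work equals arrival rate times time per request), and you can reproduce it directly:

// Connection math from the paragraph above.
const requestsPerSecond = 10_000;
const dbLatencyMs = 50;
const cacheLatencyMs = 1;
const hitRate = 0.85;

const withoutCache = (requestsPerSecond * dbLatencyMs) / 1000;            // 500 seconds of DB work per second
const withCache =
  (requestsPerSecond * hitRate * cacheLatencyMs) / 1000 +                 // 8.5 s of cache work
  (requestsPerSecond * (1 - hitRate) * dbLatencyMs) / 1000;               // 75 s of database work

console.log(withoutCache, Math.ceil(withCache), withoutCache - Math.ceil(withCache));
// → 500 84 416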
Real Production Examples
Twitter’s Cache Architecture: Twitter uses a variant of LRU-K for their timeline cache. Celebrity tweets (high frequency) stay cached even when not recently accessed, while regular users’ tweets (often one-time views) get evicted quickly. This is why you can instantly load Elon Musk’s timeline but there’s a slight delay for a random user’s profile.
Netflix’s Predictive Warming: Netflix’s cache system learns that users who watch episode 1 of a show have 95% probability of watching episode 2. The moment you finish episode 1, episode 2’s metadata and thumbnails are already warming in cache clusters nearest to you geographically. This is why the “Next Episode” button loads instantly.
Instagram’s Working Set Sizing: Instagram sizes cache clusters based on working set theory. They measure that the average user accesses ~200 unique images in a 30-minute session. With 500M daily active users and assuming 10% are active simultaneously, they provision cache for: 50M users × 200 images × average image size. This mathematical approach eliminates guesswork.
Implementation Success Criteria
By the end of this lesson, your Twitter clone will:
✓ Achieve 85%+ cache hit rate (measure with metrics dashboard)
✓ Reduce average response time from 50ms to <10ms for cached queries
✓ Demonstrate LRU-K correctly evicting one-hit wonders while keeping hot data
✓ Show ARC automatically adapting split based on traffic patterns
✓ Display working set size tracking and automatic cache size adjustments
✓ Prove predictive warming with metrics showing preloaded cache hits
What Makes This Production-Ready
Mathematical Foundation: We’re not guessing at cache sizes or eviction policies. Working set theory gives us provable optimal sizing. LRU-K’s K parameter is tuned based on measured access patterns from your actual traffic.
Adaptive Behavior: The system tunes itself. ARC adjusts recency/frequency balance automatically. Predictive warming learns new patterns without code changes. Cache sizing adapts to working set growth.
Observable System: Every cache decision is logged and measurable. You’ll see exactly why items were evicted, what patterns triggered warming, and how hit rates change with different K values.
Integration Ready: The cache layer exposes standard interfaces. Your existing API endpoints don’t change - they just get faster. Database queries remain identical, now with an intelligent cache in front.
Getting Started: Building the System
Github link:
https://github.com/sysdr/twitterdesign/tree/main/lesson41/twitter-cache-optimization
Project Setup
The implementation script creates a complete cache optimization system. Here’s what it builds:
Project Structure:
twitter-cache-optimization/
├── src/
│ ├── cache/
│ │ ├── lru-k/ # LRU-K implementation
│ │ ├── arc/ # ARC implementation
│ │ └── predictive/ # Pattern learning
│ ├── algorithms/ # Working set analyzer
│ ├── api/ # REST API server
│ └── database/ # Mock database
├── tests/ # Unit & integration tests
├── public/ # Dashboard UI
└── build.sh # Build automation
Running the Implementation Script
Execute the provided bash script to create the entire project:
# Make script executable
chmod +x Lesson_41_Implementation_Script.sh
# Run it
./Lesson_41_Implementation_Script.sh
The script creates everything automatically: project structure, all source files, tests, configuration, and build scripts.
Understanding the Core Algorithms
LRU-K Cache Implementation
The LRU-K algorithm tracks the last K accesses for each cached item. Here’s the decision logic:
For K=2:
Track timestamps: [previous_access, current_access]
If only one timestamp exists: probably a one-hit wonder
If two timestamps exist and (t2 - t1) is small: frequently accessed, keep it
Evict the item whose K-th most recent access is oldest
Why K=2 is optimal: With K=1 (standard LRU), you can’t distinguish between one-time and frequent access. With K=3+, you need more memory for history tracking. K=2 gives the best balance.
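Here is the eviction rule in isolation as a compact TypeScript sketch over a plain Map. It illustrates the K=2 idea only; it is not the code in src/cache/lru-k:

// Minimal LRU-K sketch: evict the entry whose K-th most recent access is oldest.
// Entries with fewer than K accesses (likely one-hit wonders) are evicted first.
class LRUKSketch<V> {
  private store = new Map<string, { value: V; history: number[] }>(); // newest timestamp last

  constructor(private capacity: number, private k = 2) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    entry.history.push(Date.now());
    if (entry.history.length > this.k) entry.history.shift(); // keep only the last K accesses
    return entry.value;
  }

  set(key: string, value: V): void {
    const existing = this.store.get(key);
    if (existing) { existing.value = value; return; }
    if (this.store.size >= this.capacity) this.evict();
    this.store.set(key, { value, history: [Date.now()] });
  }

  private evict(): void {
    let victim: string | undefined;
    let oldestKth = Infinity;
    for (const [key, { history }] of this.store) {
      // K-th most recent access time, or -Infinity if the item has fewer than K accesses.
      const kth = history.length >= this.k ? history[history.length - this.k] : -Infinity;
      if (kth < oldestKth) { oldestKth = kth; victim = key; }
    }
    if (victim !== undefined) this.store.delete(victim);
  }
}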
ARC Cache Implementation
ARC maintains four lists that work together:
T1 (Recent): Items accessed once, kept because they’re recent
T2 (Frequent): Items accessed multiple times, kept because they’re frequently used
B1 (Ghost): Evicted from T1, tracking “we shouldn’t have removed this”
B2 (Ghost): Evicted from T2, tracking “we shouldn’t have removed this”
The adaptation magic:
Target size for T1 is called ‘p’
If item found in B1: increase p (favor recency)
If item found in B2: decrease p (favor frequency)
System automatically finds optimal split
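Here is just that adaptation step as a sketch. A real ARC also manages the four lists and the eviction itself; this only shows how the target p moves on ghost hits, loosely following the original ARC adaptation rule:

// Self-tuning step of ARC: ghost hits nudge p, the target size of the recency list T1.
class ArcTuner {
  p = 0;  // target T1 size, bounded by total cache capacity

  constructor(private capacity: number) {}

  // Requested key found in ghost list B1: "we evicted this from T1 too soon" → favor recency.
  onGhostHitB1(b1Size: number, b2Size: number): void {
    const delta = Math.max(1, Math.floor(b2Size / Math.max(1, b1Size)));
    this.p = Math.min(this.capacity, this.p + delta);
  }

  // Requested key found in ghost list B2: "we evicted this from T2 too soon" → favor frequency.
  onGhostHitB2(b1Size: number, b2Size: number): void {
    const delta = Math.max(1, Math.floor(b1Size / Math.max(1, b2Size)));
    this.p = Math.max(0, this.p - delta);
  }
}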
Working Set Analyzer
Measures unique items accessed in a sliding time window:
Algorithm:
Record every cache access with timestamp
Every 60 seconds, count unique keys accessed
Track these counts over time
Recommend cache size = 95th percentile of working set sizes
This mathematical approach prevents both under-provisioning (too many cache misses) and over-provisioning (wasted memory).
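A minimal sketch of that analyzer (the class and method names are illustrative, not the project’s src/algorithms module):

// Sliding-window working set analyzer: record accesses, sample unique keys
// per window, recommend the 95th percentile of observed working set sizes.
class WorkingSetAnalyzer {
  private accesses: { key: string; at: number }[] = [];
  private samples: number[] = [];

  constructor(private windowMs = 60_000) {}

  record(key: string): void {
    this.accesses.push({ key, at: Date.now() });
  }

  // Call once per window (e.g. from setInterval) to take a sample.
  sample(): number {
    const cutoff = Date.now() - this.windowMs;
    this.accesses = this.accesses.filter(a => a.at >= cutoff);
    const unique = new Set(this.accesses.map(a => a.key)).size;
    this.samples.push(unique);
    return unique;
  }

  recommendedCacheSize(): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    return sorted[Math.min(sorted.length - 1, Math.floor(0.95 * sorted.length))];
  }
}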
Predictive Warmer
Learns sequential access patterns to preload data:
Learning phase:
User accesses A, then B, then C
Record: transitions[A][B]++, transitions[B][C]++
After many users, calculate probabilities
Prediction phase:
User accesses A
If transitions[A][B] has 80% probability
Preload B into cache before user requests it
Result: instant access when they request B
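A compact sketch of that learn/predict loop. The warmCache callback stands in for whatever actually preloads the item; the names are illustrative, not the repo’s API:

// Learn key→key transitions per user, then preload the likely next key once
// its observed probability clears the confidence threshold.
class PredictiveWarmerSketch {
  private transitions = new Map<string, Map<string, number>>();
  private lastKeyByUser = new Map<string, string>();

  constructor(private warmCache: (key: string) => void, private confidence = 0.7) {}

  recordAccess(userId: string, key: string): void {
    const prev = this.lastKeyByUser.get(userId);
    if (prev !== undefined) {
      const counts = this.transitions.get(prev) ?? new Map<string, number>();
      counts.set(key, (counts.get(key) ?? 0) + 1);
      this.transitions.set(prev, counts);
    }
    this.lastKeyByUser.set(userId, key);
    this.maybePreload(key);
  }

  private maybePreload(key: string): void {
    const counts = this.transitions.get(key);
    if (!counts) return;
    const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
    for (const [next, count] of counts) {
      if (count / total >= this.confidence) this.warmCache(next);  // preload before the user asks
    }
  }
}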
Build and Test Process
Building the Project
Navigate to the created project directory:
cd ~/twitter-cache-optimization
Install dependencies and compile:
# Install Node packages
npm install
# Compile TypeScript
npm run build
Expected output:
Downloads ~30 npm packages
Creates dist/ directory with compiled JavaScript
No TypeScript compilation errors
Running Unit Tests
Verify core algorithms work correctly:
npm test
What gets tested:
LRU-K Tests:
Item storage and retrieval
K-access history tracking
Correct eviction of one-hit wonders
Hit rate calculation accuracy
ARC Tests:
Adaptation to recency-heavy workload
Adaptation to frequency-heavy workload
T1/T2 list management
Ghost list functionality
Expected results:
PASS tests/unit/LRUKCache.test.ts
✓ should store and retrieve items
✓ should evict based on LRU-K algorithm
✓ should track hit rate correctly
PASS tests/unit/ARCCache.test.ts
✓ should adapt to recency-heavy workload
✓ should adapt to frequency-heavy workload
✓ should maintain hit rate above 80%
Test Suites: 2 passed, 2 total
Tests: 6 passed, 6 total
Coverage: 85%+
Interactive Demo
Run the demo to see algorithms in action:
npm run demo
The demo simulates four realistic scenarios:
Scenario 1: LRU-K vs Standard LRU
Scrolls through 30 one-hit wonder tweets
Accesses 3 frequently viewed tweets 10 times each
Shows LRU-K keeps frequent items while evicting one-timers
Scenario 2: ARC Adaptation
Phase 1: Many unique tweets (recency pattern)
Phase 2: Same tweets repeatedly (frequency pattern)
Watch ARC automatically adjust its balance
Scenario 3: Working Set Analysis
Simulates realistic user session
Tracks unique items accessed
Recommends optimal cache size
Scenario 4: Predictive Warming
Teaches pattern: profile → tweets → likes
Demonstrates prediction accuracy
Shows latency reduction from preloading
Expected demo output:
📝 Scenario 1: Demonstrating LRU-K vs Standard LRU
✅ LRU-K Results:
L1 Hit Rate: 65.2%
Frequently accessed tweets kept hot ✓
📝 Scenario 2: ARC Adaptive Behavior
Phase 1 recency bias: 72.3%
Phase 2 recency bias: 45.1%
Adaptations: 8 times
✓ Cache automatically adapted
📝 Scenario 3: Working Set Analysis
Current size: 120 items
Recommended: 150 items
📝 Scenario 4: Predictive Warming
Patterns learned: 45
Accuracy: 70%
📊 Final Performance Summary
Overall Hit Rate: 87.3%
Latency Reduction: 96.2%
Running the API Server
Start the development server:
npm run dev
Server starts on port 3000 with REST endpoints for testing.
Testing API Endpoints
Get a tweet (cache miss first time):
curl http://localhost:3000/api/tweets/tweet_1
Response shows ~50ms latency on first access (database query).
Get same tweet again (cache hit):
curl http://localhost:3000/api/tweets/tweet_1
Response shows ~2ms latency (L1 cache hit).
Check cache statistics:
curl http://localhost:3000/api/cache/stats
Returns comprehensive metrics:
{
  "cache": {
    "l1": {
      "hits": 68,
      "misses": 32,
      "hitRate": 68.0
    },
    "l2": {
      "t1Size": 45,
      "t2Size": 23,
      "adaptations": 5
    },
    "predictive": {
      "patternsLearned": 12,
      "accuracy": 75.5
    }
  }
}
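If you would rather consume those numbers programmatically (for example, to feed your own monitoring), a small polling script against the same endpoint works. The field names below come from the example response; treat the type as an assumption about the payload, not a published contract:

// Poll /api/cache/stats every 5 seconds (Node 18+, global fetch).
interface CacheStats {
  cache: {
    l1: { hits: number; misses: number; hitRate: number };
    l2: { t1Size: number; t2Size: number; adaptations: number };
    predictive: { patternsLearned: number; accuracy: number };
  };
}

async function pollStats(): Promise<void> {
  const res = await fetch('http://localhost:3000/api/cache/stats');
  const stats = (await res.json()) as CacheStats;
  console.log(`L1 hit rate: ${stats.cache.l1.hitRate}%  ARC adaptations: ${stats.cache.l2.adaptations}`);
}

setInterval(() => pollStats().catch(console.error), 5_000);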
Using the Interactive Dashboard
Open your browser to: http://localhost:3000/index.html
The dashboard provides real-time visualization of cache performance.
Dashboard Features
Top Metrics Panel:
Overall hit rate (target: >85%)
Average latency (cached vs database)
Total cache hits
Database queries saved
Cache Layer Chart:
Bar graph showing L1, L2, and database hit rates
Updates every 2 seconds
Visual comparison of tier performance
Detailed Metrics:
L1 and L2 cache sizes
ARC adaptations count
Patterns learned by predictor
Prediction accuracy percentage
Current working set size
Testing Cache Performance
Test One-Hit Wonders:
Click “Test One-Hit Wonders” button
Sends 30 sequential requests for different tweets
Watch hit rate remain low initially
Demonstrates LRU-K filtering
Test Frequent Access:
Click “Test Frequent Access” button
Accesses same 3 tweets 10 times each
Watch hit rate jump to 90%+
Shows hot data staying cached
Test ARC Adaptation:
Click “Test ARC Adaptation” button
Sends recency pattern, then frequency pattern
Watch adaptations counter increase
Observe recency bias percentage changing
Test Predictive Warming:
Click “Test Predictive Warming” button
Establishes access pattern (A→B→C)
Repeats pattern 3 times
Watch prediction accuracy increase
Understanding the Results
Interpreting Hit Rates
What the numbers mean:
85% hit rate = 85 out of 100 requests served from cache
Latency calculation:
Without cache: 100 requests × 50ms = 5000ms total
With 85% hit rate:
- 85 cached (2ms each) = 170ms
- 15 database (50ms each) = 750ms
- Total: 920ms
Improvement: 81.6% faster
Why L1 and L2 differ:
L1 (LRU-K) hits: 68%
Only hottest data
Ultra-fast access
Smaller size
L2 (ARC) hits: 81% of L1 misses
More data stored
Still fast (5ms)
Self-adapting
Combined effectiveness: 87%+ overall
Working Set Insights
Example analysis:
Current working set: 120 items
Average working set: 115 items
Recommended cache: 150 items (95th percentile)
Current L2 size: 500 items
What this tells you:
L2 is over-provisioned by 3.3x
Could reduce to 150 items
Would save 70% memory
Same hit rate maintained
Prediction Accuracy
Typical metrics:
Patterns learned: 45
Total predictions: 230
Correct predictions: 161
Accuracy: 70%
What this means:
System identified 45 reliable patterns
Made 230 preload decisions
161 were correct (prevented cache misses)
Saved 161 database queries
30% false positives (acceptable overhead)
Docker Deployment
For containerized deployment:
# Build and start
docker-compose up --build
# Check status
docker-compose ps
# View logs
docker-compose logs -f
# Stop
docker-compose down
The Docker setup includes the API server with all cache layers and the web dashboard.
Production Tuning
Optimizing K Value
Start with K=2, then measure:
If hit rate < 80%:
Too many one-time items are being kept at the expense of hot data
Try K=3 for more selectivity
If evictions are too aggressive:
Hot data getting removed
Stay with K=2, increase cache size
Cache Size Tuning
Use working set analysis:
L1 size = working set × 0.2
L2 size = working set × 1.0
Monitor utilization:
If L1 always full: increase size
If L1 < 50% full: decrease size
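Those two ratios as a tiny helper, using the 120-item working set from the earlier analysis as an example input:

// Cache sizes from the working set, using the ratios above.
function recommendedSizes(workingSet: number): { l1: number; l2: number } {
  return {
    l1: Math.ceil(workingSet * 0.2),  // hottest slice only
    l2: Math.ceil(workingSet * 1.0),  // hold roughly the full working set
  };
}

console.log(recommendedSizes(120)); // { l1: 24, l2: 120 }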
Prediction Confidence
Adjust based on accuracy:
Confidence 0.5:
More predictions, some wrong
Use when cache misses are expensive
Confidence 0.7:
Balanced (recommended starting point)
Confidence 0.9:
Fewer but very accurate predictions
Use when warming has overhead
Key Takeaways
LRU-K eliminates one-hit wonder pollution by tracking access history instead of just recency. This is crucial for social media where users scroll through content once.
ARC adapts automatically to your workload without manual tuning. Whether users are browsing new content (recency) or re-reading favorite posts (frequency), ARC finds the optimal balance.
Working set theory provides mathematical cache sizing based on actual usage patterns, not guesswork. This ensures you provision exactly what’s needed.
Predictive warming exploits patterns in user behavior to eliminate cold start latency. By preloading likely-next-access items, cache hits happen before users even request data.
Multi-tier caching combines strengths of different algorithms. L1 handles the hottest data with LRU-K selectivity, L2 provides adaptive capacity with ARC, and predictive warming prepares for future accesses.
These patterns are used by Instagram for timeline caching, Netflix for content preloading, Redis for adaptive eviction, and Facebook for working set-based sizing. You’ve built the same production-grade cache optimization that powers the world’s largest platforms.
Next Steps
In Lesson 42, you’ll tackle network performance optimization - reducing the actual bytes transmitted and optimizing TCP for social media traffic patterns. The cache optimizations you built today reduce how much data needs transmission. Next lesson optimizes how efficiently that data travels across the network, completing your end-to-end performance optimization.
You’re not just learning to cache data - you’re learning to think mathematically about system resources, measure and optimize based on real behavior, and build adaptive systems that improve themselves over time. These principles apply to any distributed system at any scale.