Lesson 11: The Consistency Window — Handling Eventual Updates Gracefully

May 09, 2026


What We’re Building Today

A React useOptimisticUpdate hook that immediately reflects user actions (likes, follows, retweets) in the UI before the command acknowledgment arrives from the CQRS pipeline
A reconciliation handler that detects command failures and rolls the UI back to the last confirmed projection state without visible jank
A SyncIndicator component that surfaces projection lag when the delta between optimistic state and confirmed state exceeds a configurable threshold

Why This Matters

When Slack rebuilt their message delivery infrastructure around 2017 to handle multi-workspace fan-out, they discovered a structural problem: the read model always lagged the write model by a window that varied with load. During high-traffic bursts, that window stretched from milliseconds to seconds. Their clients had to choose between blocking the UI (unacceptable) or showing stale state (confusing). Their solution — optimistic local state with server reconciliation — became the canonical approach for CQRS-backed interfaces.

NEXUS has the same exposure. Every like, follow, and repost hits a Redpanda partition, passes through a consumer that updates the SurrealDB projection, and only then becomes visible to readers. Without optimistic updates, the NEXUS UI feels broken under any real load. With them, the interface becomes self-consistent — and honest about when it isn’t.

Core Concepts

1. The Projection Lag Window

A command travels a fixed path: client → API → Redpanda partition → projection consumer → SurrealDB read model. Each hop adds latency. Under normal load, the full path completes in 80–150ms. Under write bursts, the projection consumer can fall behind, pushing that window to 1–3 seconds.

The UI should not poll for confirmation, since polling defeats the purpose of an event-driven architecture. Instead, it must maintain its own local state that assumes the command will succeed, while leaving a reconciliation path open for when it doesn’t. This is not an approximation; it is a deliberate first-class design primitive.

In NEXUS, the projection consumer emits a projection:settled event on the SSE channel with the command ID and final state. The hook subscribes to that channel and uses it as the reconciliation signal. What you give up: you now have two sources of truth during the lag window, and your UI logic must handle them as a first-class concern. What you gain: perceived latency drops to near zero.
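A minimal sketch of the subscription side, assuming the projection:settled payload is JSON carrying a command ID, an entity ID, and the final counter state. The field names (commandId, entityId, state.likeCount), the EventSourceLike interface, and the onProjectionSettled helper are hypothetical; the lesson names the event but not its shape.

```typescript
// Hypothetical payload shape: the lesson names the event, not its fields.
interface SettledPayload {
  commandId: string;
  entityId: string;
  state: { likeCount: number };
}

// Small interface modeled on the addEventListener/removeEventListener pair
// of the browser EventSource, so the sketch can run against a fake source.
interface EventSourceLike {
  addEventListener(type: string, listener: (event: { data: string }) => void): void;
  removeEventListener(type: string, listener: (event: { data: string }) => void): void;
}

// Runtime check so a malformed frame never reaches the reconciliation logic.
function isSettledPayload(value: unknown): value is SettledPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const state = v.state as Record<string, unknown> | null | undefined;
  return (
    typeof v.commandId === "string" &&
    typeof v.entityId === "string" &&
    typeof state === "object" &&
    state !== null &&
    typeof state.likeCount === "number"
  );
}

// Subscribes to projection:settled and returns an unsubscribe function,
// which is the shape a React effect cleanup expects.
function onProjectionSettled(
  source: EventSourceLike,
  handler: (payload: SettledPayload) => void,
): () => void {
  const listener = (event: { data: string }): void => {
    let parsed: unknown;
    try {
      parsed = JSON.parse(event.data);
    } catch {
      return; // Ignore frames that are not valid JSON.
    }
    if (isSettledPayload(parsed)) handler(parsed);
  };
  source.addEventListener("projection:settled", listener);
  return () => source.removeEventListener("projection:settled", listener);
}
```

In a browser, source would wrap the SSE connection (a thin adapter or cast may be needed to satisfy the interface), and the returned function would be called when the hook unmounts.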

2. Optimistic State as a Stack

The naive implementation stores a single optimistic value per entity. This breaks under rapid input — a user who likes and unlikes the same tweet three times in 500ms will create a race condition if each action overwrites the previous optimistic value.

The correct model is a pending operations stack: each fired command pushes an intent record (command ID, entity ID, delta, timestamp) onto a per-entity list. The UI renders from the composed result of the confirmed base state plus all pending deltas. When an acknowledgment arrives, the matching record is removed by command ID, wherever it sits in the list, so despite the name the structure is not strictly last-in, first-out. If the stack drains completely, the UI switches back to the confirmed projection, which by that point should match what was optimistically shown.

This design tolerates out-of-order acknowledgments and gives you a clean rollback target: remove the failed command’s delta from the stack and recompose.

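// Compose the displayed count from the confirmed base plus pending deltas.
// Assumes baseState, pendingOps, and tweetId are in scope, and that an unlike
// is recorded as a 'like' op with delta -1 so it nets out in the sum.
const optimisticCount = baseState.likeCount +
  pendingOps
    .filter(op => op.entityId === tweetId && op.type === 'like')
    .reduce((acc, op) => acc + op.delta, 0);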

The tradeoff: stack management adds ~2KB of runtime state per active entity. At 500 concurrently visible tweets this is about 1MB, which is negligible. At 50,000 it reaches roughly 100MB, which becomes a concern.
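The push, acknowledge, and rollback steps described in this section can be sketched as a small per-entity store. This is an illustration under assumptions, not the NEXUS implementation: the PendingOps class and its method names are hypothetical, and the delta is taken to be +1 for a like and -1 for an unlike.

```typescript
// Intent record with the four fields named in the text.
interface PendingOp {
  commandId: string;
  entityId: string;
  delta: number;     // assumed: +1 for a like, -1 for an unlike
  timestamp: number; // ms since epoch, when the command was fired
}

// Hypothetical store: one list of pending ops per entity.
class PendingOps {
  private ops = new Map<string, PendingOp[]>();

  // Record a newly fired command.
  push(op: PendingOp): void {
    const list = this.ops.get(op.entityId) ?? [];
    list.push(op);
    this.ops.set(op.entityId, list);
  }

  // Remove one command by ID, regardless of position. The same call serves
  // acknowledgments and failures, which is what tolerates out-of-order events.
  remove(entityId: string, commandId: string): boolean {
    const list = this.ops.get(entityId);
    if (!list) return false;
    const index = list.findIndex((op) => op.commandId === commandId);
    if (index === -1) return false;
    list.splice(index, 1);
    if (list.length === 0) this.ops.delete(entityId); // stack drained
    return true;
  }

  // Confirmed base count plus every delta still pending for the entity.
  compose(entityId: string, confirmedCount: number): number {
    const list = this.ops.get(entityId) ?? [];
    return list.reduce((acc, op) => acc + op.delta, confirmedCount);
  }

  // True while at least one command for the entity is unconfirmed.
  hasPending(entityId: string): boolean {
    return this.ops.has(entityId);
  }
}
```

On acknowledgment, the caller would also replace the confirmed base with the state carried by the settled event, so removing the delta does not move the displayed value. On failure, only the delta is removed and the count recomposes to the lower value.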

3. Reconciliation on Failure

Command failures arrive as command:failed events on the same SSE channel. Reconciliation is not a rollback in the database sense — no transaction is unwound. Reconciliation is a visual correction: the failed command’s delta is removed from the pending stack, the base projection is re-fetched, and the UI re-renders with the authoritative state.

The re-fetch must be targeted. Fetching the entire tweet object for a failed like would work but is wasteful. NEXUS uses a projection fragment endpoint, GET /projection/tweet/:id/counts, that returns only the mutable counter fields. The response replaces the stale base state, the failed command’s entry is dropped from that entity’s pending stack, and the displayed count snaps to ground truth plus any deltas that are still in flight.
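A minimal sketch of the failure path, assuming the counts endpoint returns a JSON object with a likeCount field. The EntityState shape, the function names, and the injected fetchCounts parameter are hypothetical. The sketch follows the remove-and-recompose model from section 2: it drops only the failed command's entry and leaves other in-flight commands in place.

```typescript
// Assumed response shape of GET /projection/tweet/:id/counts.
interface Counts {
  likeCount: number;
}

interface PendingOp {
  commandId: string;
  entityId: string;
  delta: number;
}

// Hypothetical per-entity state: confirmed base plus unconfirmed commands.
interface EntityState {
  base: Counts;
  pending: PendingOp[];
}

// Injected so the caller decides how the fragment endpoint is reached.
type FetchCounts = (tweetId: string) => Promise<Counts>;

// Pure step: drop the failed command and swap in the fresh base.
// Returns a new object so React-style state comparison sees the change.
function applyFailure(
  state: EntityState,
  failedCommandId: string,
  freshBase: Counts,
): EntityState {
  return {
    base: freshBase,
    pending: state.pending.filter((op) => op.commandId !== failedCommandId),
  };
}

// Full reconciliation: targeted re-fetch, then the pure step above.
async function reconcileFailure(
  state: EntityState,
  tweetId: string,
  failedCommandId: string,
  fetchCounts: FetchCounts,
): Promise<EntityState> {
  const freshBase = await fetchCounts(tweetId);
  return applyFailure(state, failedCommandId, freshBase);
}

// What the UI renders: base count plus the deltas still in flight.
function displayedLikeCount(state: EntityState): number {
  return state.pending.reduce((acc, op) => acc + op.delta, state.base.likeCount);
}
```

In the application, fetchCounts would wrap a request to the counts endpoint. Keeping it as a parameter lets the rollback logic be exercised without a network.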

Critically, reconciliation should not be animated. A count that jumps from 142 back to 141 on failure communicates something real to the user. Smoothing that transition would obscure the failure signal.

4. The Sync Indicator

When the pending stack for any visible entity has been non-empty for more than a configurable threshold (default: 800ms), NEXUS surfaces a sync indicator — a minimal pulsing dot in the UI chrome, not inline with the count. This is the same pattern used in collaborative document editors to signal that local changes are not yet confirmed.

The component subscribes to a syncState context that aggregates pending stack depth and age across all visible entities. It renders only when pendingAge, the time the longest-waiting stack has been non-empty, exceeds the threshold; otherwise it is not in the DOM at all. A sync indicator that is permanently lit means the projection consumer is unhealthy, making it an operational signal as much as a UX one.
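The visibility rule can be sketched as a pure function that a SyncIndicator component could call on each render. This is a sketch under assumptions: the function names are hypothetical, and it approximates "non-empty for longer than the threshold" with the age of the oldest pending operation among visible entities.

```typescript
// Only the fields the rule needs from a pending intent record.
interface PendingOp {
  entityId: string;
  timestamp: number; // ms since epoch, when the command was fired
}

// Age in ms of the oldest pending op on a visible entity, or 0 if none.
function oldestPendingAge(
  ops: readonly PendingOp[],
  visibleEntityIds: ReadonlySet<string>,
  now: number,
): number {
  let oldest = Infinity;
  for (const op of ops) {
    if (visibleEntityIds.has(op.entityId) && op.timestamp < oldest) {
      oldest = op.timestamp;
    }
  }
  // Clamp at 0 in case a timestamp is slightly ahead of `now`.
  return oldest === Infinity ? 0 : Math.max(0, now - oldest);
}

// True when the indicator should be rendered; 800ms is the stated default.
function shouldShowSyncIndicator(
  ops: readonly PendingOp[],
  visibleEntityIds: ReadonlySet<string>,
  now: number,
  thresholdMs: number = 800,
): boolean {
  return oldestPendingAge(ops, visibleEntityIds, now) > thresholdMs;
}
```

A component would return null when the function yields false, which keeps the dot out of the DOM entirely rather than hiding it with CSS.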

