Stream Processing Systems: Flink, Kafka Streams, Windows, and Exactly-Once

A system design deep dive on real-time stream processing architecture. Learn how to choose Flink vs Kafka Streams, design windowed aggregations, handle late data, and implement exactly-once semantics with production-grade failure recovery.

Stream Processing, Apache Flink, Kafka Streams, Spark Streaming, Windowing, Exactly Once, Event Time, Stateful Operators, Watermarks, Checkpointing

Why Stream Processing Is a Distinct HLD Skill

Queueing systems move events; stream processors compute on infinite event flows while preserving business correctness. That distinction is why this topic appears in senior and staff HLD interviews. If a candidate treats stream processing as "just consumers reading Kafka," they miss the core challenge.

The hard part is not parsing messages. It is answering: what is the correct count or balance when events arrive late, out-of-order, or duplicated? A production system that is fast but wrong is usually worse than a slightly slower system that is auditable and correct, especially for billing, fraud, and financial reconciliation paths.
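To make the "correct count" question concrete, here is a minimal sketch in plain Python (not Flink or Kafka Streams code). It assumes each event is a hypothetical `(event_id, event_time, arrival_time)` tuple with times in seconds and a 60-second tumbling window; all names are illustrative. It counts the same five events two ways: by arrival time with no deduplication, and by event time with deduplication on `event_id`.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling window size for this sketch


def window_start(ts: int) -> int:
    """Return the start of the tumbling window that contains ts."""
    return ts - (ts % WINDOW_SECONDS)


def count_by_arrival_time(events):
    """Naive count: bucket by when the event arrived, keep duplicates."""
    counts = defaultdict(int)
    for _event_id, _event_time, arrival_time in events:
        counts[window_start(arrival_time)] += 1
    return dict(counts)


def count_by_event_time(events):
    """Bucket by when the event happened, dropping redelivered event ids."""
    seen_ids = set()
    counts = defaultdict(int)
    for event_id, event_time, _arrival_time in events:
        if event_id in seen_ids:
            continue  # duplicate delivery: already counted
        seen_ids.add(event_id)
        counts[window_start(event_time)] += 1
    return dict(counts)


# (event_id, event_time, arrival_time), listed in arrival order
EVENTS = [
    ("a", 10, 12),
    ("b", 50, 55),
    ("c", 70, 71),
    ("d", 40, 75),  # late: happened in window 0, arrived in window 60
    ("b", 50, 80),  # duplicate redelivery of event "b"
]

if __name__ == "__main__":
    print("arrival-time counts:", count_by_arrival_time(EVENTS))
    print("event-time counts:  ", count_by_event_time(EVENTS))
```

The two functions disagree on both windows for the same input: the arrival-time count puts the late event in the wrong window and counts the redelivery twice. A real processor also has to decide how long to keep a window open for late events and how to bound the deduplication state, which this sketch ignores.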

Interviewers are looking for explicit reasoning about event time vs processing time, watermark policy, checkpoint cadence, and sink commit semantics. These are not implementation details; they are correctness contracts. Saying "exactly-once" without explaining state recovery and transactional or idempotent sinks is treated as a red flag.
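The link between state recovery and sink semantics can be shown with a toy model. The sketch below is plain Python under stated assumptions, not any framework's API: the input is an in-memory list, a "checkpoint" is a hypothetical `(next_offset, running_total)` pair taken every `CHECKPOINT_EVERY` records, and a crash discards everything since the last checkpoint. After recovery the processor replays records it had already emitted, so the output is correct only if the sink tolerates repeated writes.

```python
CHECKPOINT_EVERY = 2  # assumed checkpoint cadence, in records


class AppendSink:
    """Non-idempotent sink: every write adds a row, including replays."""

    def __init__(self):
        self.rows = []

    def write(self, key, value):
        self.rows.append((key, value))


class IdempotentSink:
    """Keyed upsert: rewriting the same key leaves exactly one row."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value


def run(log, sink, checkpoint, crash_at=None):
    """Process log from a checkpoint and return the latest checkpoint.

    If crash_at is set, stop before processing that offset and return the
    last completed checkpoint, which models losing uncheckpointed progress.
    """
    offset, total = checkpoint
    for i in range(offset, len(log)):
        if crash_at is not None and i == crash_at:
            return checkpoint  # crash: state since last checkpoint is lost
        total += log[i]
        sink.write(i, total)  # the input offset serves as the sink key
        if (i + 1) % CHECKPOINT_EVERY == 0:
            checkpoint = (i + 1, total)
    return (len(log), total)


def run_with_one_crash(log, sink, crash_at):
    """Run until the simulated crash, then restart from the checkpoint."""
    checkpoint = run(log, sink, (0, 0), crash_at=crash_at)
    return run(log, sink, checkpoint)


LOG = [5, 3, 7, 2, 4]

if __name__ == "__main__":
    append, idem = AppendSink(), IdempotentSink()
    run_with_one_crash(LOG, append, crash_at=3)
    run_with_one_crash(LOG, idem, crash_at=3)
    print("append sink rows:    ", append.rows)  # offset 2 appears twice
    print("idempotent sink rows:", idem.rows)    # one row per offset
```

In this run the state recovers correctly in both cases (the final total is the same), but only the keyed sink ends with one row per input offset. That is the sense in which "exactly-once" is an end-to-end property: checkpointed state plus either an idempotent sink, as sketched here, or a transactional sink whose commits are tied to checkpoints.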

Staff-level depth adds evolution strategy: how you move from v1 near-real-time dashboards to v2 correctness-critical pipelines, and how you recover safely when schema changes or backfills invalidate prior assumptions.
