
Message Queues & Streaming: Kafka, Delivery Semantics, and Consumer Groups

Async messaging is the backbone of most scalable architectures. Master the queue-vs-log distinction, Kafka partitioning and consumer groups, delivery guarantees, ordering semantics, and the production failure modes that separate staff engineers from the crowd.

55 min read · 3 sections · 1 interview question

Message Queues · Kafka · RabbitMQ · AWS SQS · AWS Kinesis · Pulsar · Redis Streams · Consumer Groups · Delivery Semantics · Partitioning · Dead Letter Queue · Consumer Lag · Idempotency · Event Streaming

Why Async Messaging Exists and When NOT to Use It

Synchronous request/response couples the caller's fate to the callee's: if the callee is down, slow, or overloaded, the caller pays the price. Async messaging decouples producers from consumers so that short spikes, slow consumers, and transient failures don't cascade into a caller-side outage. Three concrete benefits:

  1. Decoupling. Producer writes to a broker; it doesn't know or care which consumers exist. Add a new consumer (analytics, search indexer, fraud detector) without touching the producer.
  2. Buffering bursts. Black Friday traffic is 10x normal. Sync backends melt; a queue absorbs the burst and drains it at the consumer's steady rate.
  3. Fan-out. One event becomes many downstream reactions. A user_signed_up event drives welcome email, CRM sync, recommendation cold-start, and anti-abuse scoring — all independently, all in parallel.
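The fan-out pattern can be sketched in plain Python. This is an in-memory toy with hypothetical names (`Broker`, `subscribe`, `publish`) — a real broker (a Kafka topic with multiple consumer groups, an SNS-to-SQS fan-out, a RabbitMQ fanout exchange) delivers each published event to every subscriber independently and durably:

```python
# Minimal in-memory sketch of pub/sub fan-out (hypothetical names, not a
# real broker client). One publish drives several independent reactions.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> [handler, ...]

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Each subscriber sees the same event; the producer neither knows
        # nor cares how many subscribers exist.
        for handler in self.subscribers[topic]:
            handler(event)

broker = Broker()
log = []
broker.subscribe("user_signed_up", lambda e: log.append(("email", e["user_id"])))
broker.subscribe("user_signed_up", lambda e: log.append(("crm", e["user_id"])))
broker.subscribe("user_signed_up", lambda e: log.append(("fraud", e["user_id"])))

broker.publish("user_signed_up", {"user_id": 42})
# One producer-side publish, three independent downstream reactions.
```

Adding a fourth consumer here is one more `subscribe` call; the producer code does not change — that is the decoupling benefit from point 1 restated in code.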

When NOT to use a queue: synchronous read-your-writes paths, low-latency serving paths where the user is waiting (<100ms budget), and anywhere the caller needs the callee's return value. If your checkout API needs the payment authorization result before responding, you cannot put the payment call through a queue. Async works when the caller can fire-and-forget.

The mental model staff engineers use: queues are not a substitute for reliable sync calls; they are a different tool for a different problem. Use queues for work that can be done eventually, use sync for work the caller needs done now. Mixing the two incorrectly — e.g., making a synchronous API hang on a queue result — is the #1 architecture smell in this space.

IMPORTANT

What Interviewers Are Actually Testing

Queue and streaming questions probe four competencies:

  1. Can you explain the queue-vs-log distinction cleanly? The single biggest conceptual gap in candidates. SQS and RabbitMQ are queues; Kafka and Kinesis are logs. If you do not understand why a log is fundamentally different — messages are retained and replayable, multiple independent consumer groups read the same data — you cannot reason about modern event-driven architecture.
  2. Do you understand partitioning and ordering? Kafka ordering is per-partition, not global. Poor partitioning (one hot key, unbounded keys) is the #1 production failure mode.
  3. Delivery semantics reasoning. "Exactly-once" is a system property built from idempotent producer + transactions + idempotent consumer. Candidates who say "just use exactly-once mode" without explaining the primitives are not ready for staff.
  4. Can you diagnose consumer lag? Lag is the canonical SRE metric for streaming systems. If you cannot reason about why lag grows (slow consumer, rebalance storm, partition hotspot) you cannot operate these systems.
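Points 2 and 4 reduce to two small formulas, sketched below. The partitioner is illustrative (a CRC32 hash; Kafka's default partitioner actually uses murmur2), but the invariant is the same: a given key always lands on the same partition, so per-key ordering holds within that partition and nowhere else. Consumer lag per partition is simply the log end offset minus the consumer group's committed offset:

```python
# Illustrative sketch of key-based partitioning and consumer lag.
# (Not Kafka's actual murmur2 partitioner -- the invariant, not the hash,
# is what matters.)
import zlib

NUM_PARTITIONS = 6

def partition_for(key: bytes) -> int:
    # Same key -> same partition, so all events for that key are totally
    # ordered relative to each other. Ordering across partitions is undefined.
    return zlib.crc32(key) % NUM_PARTITIONS

def lag(log_end_offset: int, committed_offset: int) -> int:
    # How far the consumer group's committed position trails the head of
    # the partition. Growing lag = consumer falling behind producers.
    return log_end_offset - committed_offset

# A key is sticky: every event for user-42 goes to one partition.
p = partition_for(b"user-42")

# If producers have appended through offset 10_500 but the group has only
# committed 9_800, this partition is 700 messages behind.
partition_lag = lag(10_500, 9_800)
```

The diagnostic questions from point 4 map onto this directly: lag growing on all partitions suggests a slow consumer or rebalance storm; lag growing on one partition suggests a hot key overloading that partition's single assigned consumer.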

The anti-signal: saying "use Kafka" as a default without knowing when SQS is the better choice, or conflating "topic" with "partition," or confusing consumer group offsets with ACKs.
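On the delivery-semantics point specifically: under at-least-once delivery the consumer must tolerate redelivery, and the standard answer is deduplication on a message ID — the "idempotent consumer" leg of the exactly-once property. A minimal sketch with hypothetical names, using an in-memory set where a production system would use a durable store (a database unique constraint, or Redis with a TTL):

```python
# Idempotent consumer sketch: at-least-once delivery means duplicates
# happen, so track processed message IDs and turn redeliveries into no-ops.
# (In production, processed_ids would be durable -- e.g. a DB unique
# constraint committed in the same transaction as the side effect.)
processed_ids = set()
side_effects = []

def handle(message: dict) -> bool:
    """Apply a message's effect at most once per message_id.

    Returns True if the effect was applied, False on a duplicate delivery.
    """
    msg_id = message["message_id"]
    if msg_id in processed_ids:
        return False  # already processed: safe no-op
    side_effects.append(message["payload"])  # the real side effect
    processed_ids.add(msg_id)
    return True

handle({"message_id": "m1", "payload": "charge $10"})
handle({"message_id": "m1", "payload": "charge $10"})  # broker redelivered
# The charge was applied once despite two deliveries.
```

This is the kind of primitive-level answer the interviewer is probing for in point 3: exactly-once is at-least-once delivery plus idempotent processing, not a broker checkbox.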
