How to Approach a GenAI / LLM System Interview
The mindset and signal-management playbook for generative AI and large language model interviews. Covers the research-track vs production-track split, time budgeting for LLM design, recovery when math comes up, and the trap of pattern-matching RAG or fine-tuning without failure modes.
What This Page Is (and Isn't)
This is the pre-game for a generative AI / LLM interview: how to think about the conversation before you write a single prompt or sketch a retrieval pipeline. The companion page, How to Design GenAI Systems, is the execution playbook — turning a blank whiteboard into a defensible LLM application.
The reason for the split: strong engineers fail GenAI interviews not because they don't know how transformers work, but because they make the wrong meta moves — they answer "design a coding assistant" by jumping straight to "use GPT-4 with RAG" without naming retrieval failure modes, or they reach for fine-tuning when the actual constraint is latency, or they recite the attention formula without connecting it to the KV cache memory bottleneck the interviewer cares about.
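To make that last point concrete, here is the back-of-envelope KV cache arithmetic an interviewer is usually fishing for. The config numbers (32 layers, 32 KV heads, head dimension 128, fp16) are illustrative assumptions in the ballpark of a 7B-parameter dense model, not figures taken from any specific loop:

```python
# Back-of-envelope KV cache sizing. All config values below are
# assumptions for illustration, roughly matching a 7B-class model.
def kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                   seq_len=4096, batch=1, bytes_per_elem=2):
    # 2x for keys and values: one K and one V vector per head,
    # per layer, per token, per sequence in the batch (fp16 = 2 bytes).
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

per_token = kv_cache_bytes(seq_len=1)
print(f"{per_token / 1024:.0f} KiB per token")          # ~512 KiB
print(f"{kv_cache_bytes() / 2**30:.1f} GiB at 4k ctx")  # ~2.0 GiB per sequence
```

At roughly half a megabyte per token, a single 4k-context sequence eats about 2 GiB of accelerator memory before weights are even counted, which is why reciting the attention formula without this arithmetic reads as a miss.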
GenAI interviews are also bimodal in a way that pure HLD interviews are not. Research-track loops (foundation model teams, applied research, frontier labs) want you to derive scaled dot-product attention from QKV, explain why RoPE generalises better than sinusoidal encodings, and reason about why Chinchilla's 20-tokens-per-parameter ratio held until the inference-cost economy made over-training rational. Production-track loops (LLM platform, AI infra, applied ML at non-frontier shops) want you to size a vLLM cluster, name the latency budget for TTFT vs ITL, and explain why PagedAttention beats a vanilla KV cache by up to 24x on concurrent request capacity. The single biggest tactical error candidates make is misreading which loop they are in and giving the other loop's answer.
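For the research-track derivation, the target is the scaled dot-product formula itself. Below is a minimal, self-contained NumPy sketch of single-head attention; the toy shapes and the use of the input as Q, K, and V directly (rather than learned projections) are simplifications for illustration only:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). Scores measure each query's similarity
    # to every key; dividing by sqrt(d_k) keeps the logits in a range
    # where softmax does not saturate as d_k grows.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # (seq_len, d_k)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # 4 tokens, d_model = 8 (toy values)
# A real transformer applies learned W_q, W_k, W_v projections to x;
# identity projections keep this sketch self-contained.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                  # (4, 8)
```

Being able to write this in under a minute, and then connect the K and V tensors it produces to the cache arithmetic above, is exactly the bridge between the two loop types.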