DDPM Foundations: ELBO, Score Matching, DDIM, and CFG
ML interview theory: Ho et al. 2020 DDPM, variational lower bound, ε-prediction and L_simple, denoising score matching, DDIM fast sampling, classifier-free guidance training. Complements `genai-diffusion-models` (serving); this is the derivation-first track.
What 'Diffusion Model' Means in the 2020+ Literature
Denoising diffusion probabilistic models (DDPM) (Ho, Jain, Abbeel, NeurIPS 2020, arXiv:2006.11239) learn a generative Markov chain: a forward process adds Gaussian noise over T steps until the data distribution is approximately isotropic Gaussian, and a learned reverse process denoises step by step from noise back to data. This is not the same interview object as latent-diffusion product serving (VAE latents, U-Net FLOPs, NSFW filters); those sit in genai-diffusion-models. Here you must be able to state the ELBO, explain why ε-prediction is used, and describe how DDIM and CFG change the sampling procedure, not just the quality headlines.
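To make the two processes concrete, here is a minimal PyTorch sketch of the closed-form forward noising q(x_t | x_0) and the L_simple training step. The ε-predictor `model(x_t, t)` is an assumed interface, not a reference implementation; the linear β schedule values are the ones reported in Ho et al.

```python
import torch
import torch.nn.functional as F

# Linear beta schedule from Ho et al. 2020 (T = 1000, beta_1 = 1e-4, beta_T = 0.02).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # abar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, eps):
    """Closed-form forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    ab = alpha_bar[t].view(-1, 1, 1, 1)   # broadcast over (B, C, H, W)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

def l_simple(model, x0):
    """L_simple: MSE between the true noise eps and the prediction eps_theta(x_t, t)."""
    t = torch.randint(0, T, (x0.shape[0],))  # uniform random timestep per sample
    eps = torch.randn_like(x0)
    x_t = q_sample(x0, t, eps)
    return F.mse_loss(model(x_t, t), eps)
```

The key interview point here: training never runs the chain step by step. Because q(x_t | x_0) is Gaussian in closed form, any timestep can be sampled directly.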
Competing resources often stop at the unconditional image story. In ML-fundamentals interviews (Google / research-style), you also need the bridge to denoising score matching and an explanation of why the simplified objective is a reweighted form of the bound (Ho et al. Sec. 3.4).
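For reference, a compact statement of the per-step ELBO term and the simplified objective, following Ho et al. Secs. 3.2 and 3.4 (notation: α_t = 1 − β_t, ᾱ_t = ∏_{s≤t} α_s, σ_t² the reverse-step variance):

```latex
\begin{align}
L_{t-1} &= \mathbb{E}_{x_0,\epsilon}\!\left[
  \frac{\beta_t^2}{2\sigma_t^2\,\alpha_t\,(1-\bar\alpha_t)}
  \bigl\|\epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\,x_0
    + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\bigr)\bigr\|^2\right] + C \\
L_{\text{simple}} &= \mathbb{E}_{t,x_0,\epsilon}\!\left[
  \bigl\|\epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\,x_0
    + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\bigr)\bigr\|^2\right]
\end{align}
```

Dropping the t-dependent weight is the "reweighting": it down-weights the small-t (low-noise) terms relative to the true bound, which Ho et al. found improves sample quality. The same MSE is, up to scaling, a denoising score-matching objective, since ε_θ estimates −√(1−ᾱ_t) ∇_{x_t} log q(x_t | x_0).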
What Interviewers Evaluate
Mid: forward = add noise, reverse = learn p_θ(x_{t-1}|x_t), sample by iterating the reverse chain (see the ancestral-sampler sketch after this list).
Senior: ELBO decomposition, ε vs x0 parameterization, connection to score matching and annealed Langevin.
Staff: why L_simple drops the per-step ELBO weighting, DDIM as a non-Markovian process that enables accelerated sampling, and CFG as conditioning dropout at training time plus the guidance-weight w tradeoff between mode coverage and conditioning adherence, with the caveat that w > 1 does not admit a calibrated likelihood interpretation (see the DDIM + CFG sketch after this list).
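For the mid-level answer, a sketch of the ancestral sampler (Ho et al. Algorithm 2) with the σ_t² = β_t variance choice; `model` is the same assumed ε-predictor interface as above.

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, betas):
    """Iterate the learned reverse chain p_theta(x_{t-1} | x_t) from t = T-1 down to 0."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # x_T ~ N(0, I)
    for t in range(betas.shape[0] - 1, -1, -1):
        eps = model(x, torch.full((shape[0],), t, dtype=torch.long))
        # Posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * z  # sigma_t^2 = beta_t; no noise at the last step
    return x
```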
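And for the staff-level points, a sketch of deterministic DDIM (η = 0) over a strided timestep subsequence, combined with classifier-free guidance. The conditional ε-predictor `model(x, t, c)`, with c = None standing for the learned null conditioning from training-time dropout, is an assumption of this sketch.

```python
import torch

@torch.no_grad()
def ddim_cfg_sample(model, cond, shape, alpha_bar, steps=50, w=3.0):
    """Deterministic DDIM with CFG: eps_tilde = (1 + w)*eps_cond - w*eps_uncond."""
    T = alpha_bar.shape[0]
    ts = torch.linspace(T - 1, 0, steps).long()  # strided subsequence of timesteps
    x = torch.randn(shape)
    for i, t in enumerate(ts):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[ts[i + 1]] if i + 1 < steps else torch.tensor(1.0)
        # Guided noise estimate: extrapolate past the conditional prediction.
        eps = (1 + w) * model(x, t, cond) - w * model(x, t, None)
        x0_hat = (x - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()  # predicted x_0
        x = ab_prev.sqrt() * x0_hat + (1.0 - ab_prev).sqrt() * eps  # eta = 0 update
    return x
```

Because the update only needs ᾱ at the visited timesteps, the same trained ε-predictor supports 50-step sampling with no retraining; w = 0 recovers the unguided conditional model.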