
GenAI & Agents

The stack reshaping interviews in 2026: LLMs, RAG, fine-tuning, inference optimization, agent frameworks, and evaluation — built for candidates targeting AI-first roles.

33 guides
GenAI & Agents · 45 min

Agent Memory Systems: In-Context, Semantic, Episodic, and Procedural

LLM context windows are finite and expensive. Learn how production agents implement hierarchical memory — in-context buffers, vector DB semantic retrieval, episodic event logs, and procedural fine-tuning — and when each layer is worth the engineering investment.
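
A minimal sketch of the first two layers: a fixed-size in-context buffer that evicts old turns into a semantic store queried by similarity. The embed() here is a toy stand-in; production agents use a real embedding model and a vector database.

```python
import math
from collections import deque

def embed(text: str) -> dict[str, float]:
    # Toy bag-of-words "embedding", a stand-in for a real embedding model.
    vec: dict[str, float] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class AgentMemory:
    def __init__(self, buffer_turns: int = 8):
        self.buffer = deque(maxlen=buffer_turns)   # in-context layer (recency)
        self.store: list[tuple[str, dict]] = []    # semantic layer (similarity)

    def add(self, turn: str) -> None:
        if len(self.buffer) == self.buffer.maxlen:
            oldest = self.buffer[0]                       # about to fall out of context
            self.store.append((oldest, embed(oldest)))    # archive it semantically
        self.buffer.append(turn)

    def context(self, query: str, k: int = 2) -> list[str]:
        qv = embed(query)
        hits = sorted(self.store, key=lambda it: cosine(qv, it[1]), reverse=True)[:k]
        return [t for t, _ in hits] + list(self.buffer)   # recalled + recent turns
```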

Agent Memory · Memory Systems · Vector Database · Mem0 +8
Intermediate
6 questions
GenAI & Agents · 45 min

Instruction Tuning: Teaching LLMs to Follow Instructions

How SFT on (instruction, response) pairs transforms a base LLM into an instruction-following assistant. Covers FLAN's multi-task discovery, Alpaca self-instruct, the LIMA quality-over-quantity finding, catastrophic forgetting mitigations, and when instruction tuning beats RAG.

Instruction Tuning · SFT · FLAN · Alpaca +8
Intermediate
7 questions
GenAI & Agents · 50 min

Embeddings — From word2vec to Instruction-Tuned Vectors & Production RAG

Trace the evolution from word2vec through BERT to modern instruction-tuned embeddings (text-embedding-3, E5-mistral, BGE-M3), understand Matryoshka Representation Learning for cost-latency tradeoffs, and master the production decisions that determine retrieval quality in RAG systems.
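
The Matryoshka tradeoff in miniature (a sketch with random vectors standing in for real embeddings; the 3072-dim width matches text-embedding-3-large): truncate to a prefix of dimensions and re-normalize.

```python
import numpy as np

def truncate_mrl(emb: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` coordinates of an MRL-trained embedding, then L2-normalize."""
    prefix = emb[..., :dims]
    return prefix / np.linalg.norm(prefix, axis=-1, keepdims=True)

full = np.random.randn(4, 3072)                      # stand-in for real embeddings
full /= np.linalg.norm(full, axis=-1, keepdims=True)
small = truncate_mrl(full, 256)                      # 12x smaller index, some recall loss
print(small.shape, np.linalg.norm(small, axis=-1))   # (4, 256), all ~1.0
```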

Embeddings · word2vec · BERT · Sentence-BERT +9
Intermediate
7 questions
GenAI & Agents · 45 min

Tokenization — BPE, WordPiece, SentencePiece & Production Artifacts

Master the three dominant tokenization algorithms (BPE, WordPiece, SentencePiece) used in GPT-4, BERT, and LLaMA, understand why tokenization causes subtle failures like the r-counting problem, and learn how vocabulary design directly impacts context window costs and multilingual quality.
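
The core BPE loop is small enough to sketch: count adjacent symbol pairs, merge the most frequent, repeat. Real tokenizers add byte-level fallback, pre-tokenization, and special tokens on top of this.

```python
from collections import Counter

def merge_pair(word, pair, merged):
    # Replace every occurrence of `pair` in `word` with the merged symbol.
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(merged); i += 2
        else:
            out.append(word[i]); i += 1
    return out

def bpe_merges(words: dict, num_merges: int) -> list:
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)             # most frequent adjacent pair
        merges.append(best)
        merged = "".join(best)
        words = {tuple(merge_pair(w, best, merged)): f for w, f in words.items()}
    return merges

corpus = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6}
print(bpe_merges(corpus, 4))   # first merges: ('w', 'e'), ('we', 'r'), ('l', 'o'), ...
```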

Tokenization · BPE · WordPiece · SentencePiece +9
Intermediate
7 questions
GenAI & Agents · 40 min

Decoding Strategies: Temperature, Sampling, and Constrained Generation

Master every LLM decoding parameter: greedy vs beam search vs sampling, temperature scaling, top-k and nucleus sampling, repetition penalties, and the 2024 min-p sampler. Understand when each strategy is optimal and why temperature doesn't change what the model 'knows'.
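
To make the pipeline concrete, a minimal sampler combining temperature scaling with a nucleus (top-p) filter, on illustrative logits:

```python
import numpy as np

def sample(logits: np.ndarray, temperature: float = 0.8, top_p: float = 0.9) -> int:
    scaled = logits / temperature                      # temperature reshapes, never reorders
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                    # most probable first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]    # smallest set with mass >= top_p
    kept = probs[keep] / probs[keep].sum()             # renormalize the nucleus
    return int(np.random.choice(keep, p=kept))

logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])
print(sample(logits))   # top_p=1.0 disables the filter; temperature -> 0 approaches greedy
```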

Decoding Strategies · Temperature Sampling · Top-P Sampling · Nucleus Sampling +8
Intermediate
6 questions
GenAI & Agents · 45 min

Prompt Engineering: From Zero-Shot to Production Systems

Master the techniques that separate good prompts from great ones: few-shot examples, Chain-of-Thought reasoning, system prompt design, structured output, and token budget management. Understand when prompting beats fine-tuning and how to debug bad outputs systematically.

Prompt Engineering · Chain-of-Thought · Few-Shot Learning · System Prompts +8
Intermediate
6 questions
GenAI & Agents · 80 min

AI Agents & Agentic Systems Framework

Comprehensive guide to building production agentic AI systems — from ReAct patterns and tool design to multi-agent orchestration, memory, and evaluation. The fastest-growing area in AI engineering.

agents · agentic systems · LLM · tool use +4
Advanced
10 questions
GenAI & Agents · 55 min

LLM & Agent Evaluation: Trajectories, RAGAS, LLM-as-Judge, and Hallucination Mitigation

Evaluating agents and LLM systems requires evaluating trajectories, not just outputs. Learn trajectory vs outcome evaluation, tool call accuracy, the GAIA benchmark gap (humans: 92%, best agents: ~50%), LLM-as-judge biases, RAGAS metrics for grounded generation, layered hallucination defense, non-deterministic CI gates, and how production teams run shadow evaluation and regression suites.

Agent Evaluation · LLM Evaluation · GAIA Benchmark · LLM-as-Judge +11
Advanced
7 questions
GenAI & Agents · 55 min

Multi-Agent Systems: Orchestration, LangGraph, and Production Patterns

Single agents hit context limits and accumulate errors on complex tasks. Learn orchestrator-worker architectures, LangGraph state machines, AutoGen debate patterns, parallelization, and why most multi-agent demos break beyond 5 steps in production.

Multi-Agent · LangGraph · AutoGen · Orchestrator +8
Advanced
1 question
GenAI & Agents · 50 min

Agentic RAG: ReAct, Self-RAG, and Multi-Step Retrieval

Single-shot RAG fails on multi-hop questions and cannot self-correct. Master agentic RAG patterns — ReAct loops, Self-RAG reflection tokens, FLARE, and tool-augmented retrieval — with latency budgets and failure modes every FAANG candidate must know.
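
The ReAct skeleton stripped to its loop, as a sketch: llm() and the tools registry stand in for a real model call and real retrievers, and the model is assumed to emit one "Action: tool[input]" or "Final Answer: ..." line per call.

```python
def react(question: str, tools: dict, llm, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                         # model emits its next line
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition("[")
            observation = tools[name](arg.rstrip("]")) # e.g. a retriever or search call
            transcript += f"Observation: {observation}\n"  # fed back for the next thought
    return "Stopped: step budget exhausted"            # the latency budget as a hard cap
```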

RAG · Agentic RAG · ReAct · Self-RAG +8
Advanced
1 question
GenAI & Agents · 55 min

RLHF and DPO: Aligning LLMs with Human Preferences

How RLHF transforms a base LLM into a helpful, harmless assistant — and why DPO has largely replaced PPO for this task. Covers reward model training, PPO instability and reward hacking, Constitutional AI, and when PPO still wins.
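
The DPO objective itself fits in a few lines; a sketch assuming you already have the summed log-probabilities of each chosen and rejected response under the policy and the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    chosen_ratio = policy_chosen - ref_chosen          # log pi/pi_ref on preferred y_w
    rejected_ratio = policy_rejected - ref_rejected    # log pi/pi_ref on dispreferred y_l
    margin = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(margin).mean()                # classification-style loss

# Toy batch of 3 preference pairs (log-probs would come from real forward passes):
loss = dpo_loss(torch.tensor([-12.0, -9.5, -20.0]), torch.tensor([-14.0, -11.0, -19.0]),
                torch.tensor([-12.5, -10.0, -19.5]), torch.tensor([-13.5, -10.5, -19.5]))
print(loss)   # no reward model, no PPO rollouts
```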

RLHF · DPO · PPO · Reward Model +8
Advanced
1 question
GenAI & Agents · 55 min

LLM Fine-Tuning: LoRA, QLoRA, PEFT & RLHF

How to adapt pre-trained LLMs for specific tasks without catastrophic forgetting. Covers full fine-tuning vs PEFT, LoRA math and implementation, QLoRA for consumer hardware, instruction tuning, RLHF with PPO, DPO as the modern alternative, and when fine-tuning actually helps vs. when RAG or prompting is better.
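
The LoRA math in code form, as a sketch (libraries like PEFT wrap this with target-module selection and adapter merging): freeze the pretrained weight and learn a rank-r update scaled by alpha/r.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + (alpha/r) * B @ A, applied without materializing it.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"{trainable:,} trainable of {sum(p.numel() for p in layer.parameters()):,} total")
# 65,536 of ~16.8M: roughly 0.4% of the layer's parameters.
```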

LoRA · QLoRA · PEFT · RLHF +5
Advanced
1 question
GenAI & Agents · 46 min

Knowledge Distillation for LLMs: Logit KD, Context Distillation, and Speculative Decoding Pairing

Classical KD minimized KL between student and teacher logits on a fixed dataset; LLM-era variants distill **reasoning traces**, **tool-use format**, or **chain-of-thought** into smaller models — or pair student draft models with teacher verification in speculative decoding. Covers when offline KD beats RLHF, sequence-level distillation pitfalls, and latency-quality tradeoffs for on-device assistants.

Knowledge Distillation · LLM Compression · Logit Distillation · Context Distillation +8
Advanced
1 question
GenAI & Agents · 40 min

RAFT — Retrieval-Augmented Fine-Tuning: When RAFT Beats RAG and When It Does Not

RAFT (Stanford / industry follow-ons) trains models to ignore distractor documents and cite the right passages — closing the gap where vanilla RAG retrieves noise and the model hedges. This guide covers the distractor-augmented training recipe, comparison to supervised fine-tuning without retrieval, evaluation on open-book QA, and failure modes when your doc corpus drifts faster than your retrain cadence.

RAFT · Retrieval-Augmented Fine-Tuning · RAG · Fine-Tuning +8
Advanced
1 question
GenAI & Agents · 40 min

How to Approach a GenAI / LLM System Interview

The mindset and signal-management playbook for generative AI and large language model interviews. Covers the research-track vs production-track split, time budgeting for LLM design, recovery when math comes up, and the trap of pattern-matching RAG or fine-tuning without failure modes.

LLM Interview · GenAI Framework · RAG · Fine-Tuning +10
Advanced
1 question
GenAI & Agents · 45 min

How to Design GenAI Systems: From Blank Whiteboard to Production

The mechanical playbook for designing LLM applications in interviews. Covers the prompt-vs-RAG-vs-fine-tune decision tree, RAG pipeline (chunking, embeddings, FAISS, reranking), LoRA, the vLLM serving stack (KV cache, PagedAttention), and reference architectures for code assistants and document QA.

GenAI Design · RAG · FAISS · HNSW +10
Advanced
1 question
GenAI & Agents · 90 min

Chain-of-Thought, Test-Time Compute & Multi-Step Reasoning

From CoT and self-consistency to tree search over verbal states and modern reasoning-tuned models. When step-by-step prompts help, when they add latency and variance, and how to evaluate and ship reasoning in production (verifiers, judges, routing) without leaking competitive or customer detail.

Chain of Thought · CoT · Self-Consistency · Tree of Thoughts +10
Advanced
8 questions
GenAI & Agents · 95 min

Diffusion Models for Images — DDPM, Latent Diffusion, CFG, Stable Training

How denoising diffusion and latent diffusion power modern image gen (DALL·E, Stable Diffusion class systems): forward noise, score matching, DDIM-style fast sampling, classifier-free guidance, and production concerns — VRAM, latency, safety filters, and eval (FID, CLIP score, red-team). Connects the five GenAI planes for *generation-first* (non-LLM) stacks.

DDPM · DDIM · Score Matching · Classifier-Free Guidance +11
Advanced
1 question
GenAI & Agents · 90 min

Structured Output, Function & Tool Calling — JSON Schema, Strict Mode, Agent Safety

How tool calling actually works in production — OpenAI-style function tools, JSON Schema constraints, strict structured outputs, parallel vs. sequential tools, and the auth/idempotency story interviewers expect. Connects the five GenAI planes from gateway design to eval and incident response.
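
For reference, the shape of an OpenAI-style function tool with strict structured output (field names follow the public Chat Completions format; the weather tool is a made-up example):

```python
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "strict": True,                          # request schema-exact arguments
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],        # strict mode: every property listed
            "additionalProperties": False,       # strict mode: no extra keys allowed
        },
    },
}
```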

Function Calling · Tool Use · JSON Schema · Structured Output +10
Advanced
1 question
GenAI & Agents · 95 min

LLM Evaluation & Benchmarking — HELM, MMLU, MT-Bench, Arena, LLM-as-Judge

How to evaluate foundation and chat models without fooling yourself — HELM’s multi-metric design, instruction-following suites, chat leaderboards, RAGAS for grounded systems, contamination, and when human eval still wins. Connects public benchmarks to a private canary stack you can ship against.

HELM · MMLU · MT-Bench · Chatbot Arena +10
Advanced
1 question
GenAI & Agents · 100 min

LLM Fundamentals — Transformers, Attention & Architecture

Deep understanding of how large language models work — from self-attention and the transformer architecture to modern optimizations (KV cache, Flash Attention, RoPE, GQA). Essential for senior AI/ML engineer interviews.

Transformers · Attention Mechanism · LLM · Architecture +8
Advanced
1 question
GenAI & Agents · 90 min

Long-Context LLMs — Lost in the Middle, RAG vs. Natively Long, KV Cache & Packing

What 32K–1M+ token support really means: attention and KV cache economics, the lost-in-the-middle result (Liu et al., TACL 2024, arXiv:2307.03172), needle-in-haystack and chunk reordering for RAG, and when retrieval still wins on cost and proof. Ties the five GenAI planes for staff interviews.

Long Context · Lost in the Middle · RAG · KV Cache +11
Advanced
1 question
GenAI & Agents · 95 min

Multimodal LLMs — CLIP, Vision-Language Models & Production Vision APIs

How image+text models work at scale — contrastive pretraining, projection layers, and LLaVA-style instruction tuning. Covers evaluation (MMMU, VQA, retrieval), latency and token economics, and failure modes interviewers expect you to name (hallucinated objects, OCR brittleness, eval contamination).

CLIP · SigLIP · Vision Encoder · LLaVA +9
Advanced
1 question
GenAI & Agents · 45 min

Positional Encoding — Sinusoidal, RoPE, ALiBi & Context Length Extrapolation

Deep-dive into why self-attention is permutation-invariant and how sinusoidal, learned, RoPE, and ALiBi positional encodings solve this — with production guidance on context length extrapolation, YaRN scaling, and why RoPE is now the default for new LLM architectures.
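
For quick reference, the sinusoidal baseline the guide starts from: PE[pos, 2i] = sin(pos / 10000^(2i/d)) and PE[pos, 2i+1] = cos of the same angle, added to token embeddings (RoPE instead rotates query/key pairs by position-dependent angles).

```python
import numpy as np

def sinusoidal_pe(max_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(max_len)[:, None]              # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)  # geometric frequency ladder
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_pe(2048, 512).shape)   # (2048, 512)
```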

Positional Encoding · RoPE · ALiBi · Sinusoidal +8
Advanced
1 question
GenAI & Agents · 60 min

Advanced RAG: Hybrid Retrieval, Reranking, and Production Architecture

Go beyond naive vector search: master hybrid retrieval with BM25 and dense embeddings fused via Reciprocal Rank Fusion, two-stage reranking with cross-encoders, HyDE, RAPTOR, and the RAG vs fine-tuning decision framework. Includes production failure modes and RAGAS evaluation.
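
Reciprocal Rank Fusion is small enough to show whole: each retriever contributes 1/(k + rank) per document, with k = 60 as the conventional constant, so no score calibration between BM25 scores and cosine similarities is needed.

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)          # rank-based, score-free
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25 = ["doc7", "doc2", "doc9"]    # lexical ranking
dense = ["doc2", "doc7", "doc4"]   # embedding ranking
print(rrf([bm25, dense]))          # doc2/doc7 lead: ranked highly by both retrievers
```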

RAG · Hybrid Retrieval · BM25 · Reranking +10
Advanced
6 questions
GenAI & Agents · 55 min

RAG Architecture: From Basics to Production

Retrieval-Augmented Generation is the most common GenAI system design topic. Master chunking strategies, embedding models, vector databases, hybrid search, reranking, advanced retrieval patterns (HyDE, RAPTOR), agentic RAG, guardrails, and production evaluation with RAGAS.

RAG · Vector Database · Embeddings · Chunking +6
Advanced
1 question
GenAI & Agents · 42 min

Vector Search for GenAI: HNSW, IVF-PQ, FAISS, and ScaNN in Production

Standalone deep dive on vector search systems for GenAI workloads. Learn how HNSW, IVF, IVF-PQ, and ScaNN trade off recall, latency, and cost, how to tune parameters like efSearch and nprobe, and how to choose the right index for million-to-billion scale retrieval.
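
A sketch of that tuning loop with FAISS's IVF index (random vectors as stand-in data): nlist fixes the partition count at build time, nprobe sets how many partitions each query scans.

```python
import faiss
import numpy as np

d, nb = 128, 100_000
xb = np.random.rand(nb, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer over centroids
index = faiss.IndexIVFFlat(quantizer, d, 1024)   # nlist=1024 Voronoi cells
index.train(xb)                                  # learn the centroids
index.add(xb)

index.nprobe = 16                                # scan 16/1024 cells: fast, approximate
D, I = index.search(xb[:5], 10)
index.nprobe = 128                               # more cells: higher recall, higher latency
D2, I2 = index.search(xb[:5], 10)
```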

Vector Search · HNSW · IVF-PQ · FAISS +6
Advanced
1 question
GenAI & Agents · 38 min

LLM Guardrails and Safety: Input/Output Filters, Red-Teaming, and Constitutional AI

Production LLM systems require multi-layer safety mechanisms: prompt injection defenses, content classifiers, PII detection, output moderation, and red-teaming pipelines. This guide covers the defense-in-depth safety architecture used at OpenAI, Anthropic, Meta, and Google — the techniques increasingly tested in AI engineer interviews at companies building LLM-powered products.

LLM Safety · Guardrails · Prompt Injection · Content Moderation +8
Advanced
1 question
GenAI & Agents · 42 min

LLM Gateway: Routing, Guardrails, Quotas, and Observability for Production GenAI

An LLM gateway is the control plane between applications and model providers. Learn architecture patterns for model routing, rate limiting, policy enforcement, prompt/response filtering, caching, fallback handling, and cost governance at scale.

LLM Gateway · Model Routing · Prompt Firewall · Rate Limiting +6
Advanced
1 question
GenAI & Agents · 44 min

LLM Observability & Monitoring: Traces, Cost and Latency SLOs, Eval Harnesses, and Alerting

Traditional APM misses token streams, tool loops, and judge drift. This guide covers OpenTelemetry-style traces for LLM chains, LangSmith / Phoenix-style eval sessions, cost attribution per tenant, latency SLOs by model tier, golden-set regression, and production alerts for schema drift in structured outputs.

LLM Observability · OpenTelemetry · LangSmith · Arize Phoenix +8
Advanced
1 question
GenAI & Agents · 60 min

LLM Serving at Scale: vLLM, KV Cache, Batching, and LLMOps

The engineering behind serving large language models at high throughput and low latency. Covers prefill/decode distinction, KV cache memory math, PagedAttention, continuous batching, speculative decoding, Flash Attention, MQA/GQA, quantization, and the LLMOps discipline (versioning, release gates, canary, rollback) needed to deploy them safely.
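
The KV cache arithmetic is worth having at your fingertips; a worked example for a Llama-2-7B-like configuration (32 layers, 32 KV heads, head dim 128, fp16):

```python
def kv_bytes_per_token(layers=32, kv_heads=32, head_dim=128, dtype_bytes=2):
    # The leading 2 accounts for storing both K and V at every layer.
    return 2 * layers * kv_heads * head_dim * dtype_bytes

per_tok = kv_bytes_per_token()          # 524,288 B = 0.5 MB per token
seq, batch = 4096, 8
total_gb = per_tok * seq * batch / 1024**3
print(f"{per_tok} B/token -> {total_gb:.0f} GB for batch=8 at 4K tokens")
# ~16 GB of cache alone: why PagedAttention and GQA (fewer kv_heads) matter.
```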

vLLM · KV Cache · PagedAttention · Batching +10
Advanced
1 question
GenAI & Agents · 50 min

LLM Quantization: INT4/INT8, GPTQ, AWQ, and bitsandbytes

How to compress LLMs from 140GB to 35GB without destroying quality. Covers PTQ vs QAT, INT8 absmax/zero-point methods, GPTQ Hessian-based INT4, AWQ salient-weight protection, bitsandbytes mixed-precision, and the calibration dataset trap most engineers miss.
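
The absmax method in miniature (random weights as stand-ins): pick the scale so the largest-magnitude weight maps to 127, round, and measure the error.

```python
import numpy as np

def absmax_quantize(w: np.ndarray):
    scale = 127.0 / np.max(np.abs(w))       # one outlier stretches this scale
    q = np.round(w * scale).astype(np.int8)
    return q, scale

w = np.random.randn(4096).astype(np.float32) * 0.02
q, scale = absmax_quantize(w)
w_hat = q.astype(np.float32) / scale        # dequantize
print(np.max(np.abs(w - w_hat)))            # rounding error bounded by 0.5/scale
# A single outlier crushes everyone else's precision: the motivation for
# per-channel scales and AWQ-style salient-weight protection.
```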

Quantization · GPTQ · AWQ · bitsandbytes +10
Advanced
1 question
GenAI & Agents · 40 min

Speculative Decoding: 2-4x LLM Inference Speedup Without Quality Loss

How speculative decoding exploits GPU underutilization in the decode phase to achieve 2-4x speedup with mathematically guaranteed output distribution equivalence. Covers draft-verify mechanics, acceptance probability, Medusa heads, Lookahead decoding, and the non-obvious constraints around tokenizer matching.
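
The accept/reject rule behind that guarantee, sketched with per-token probability vectors rather than real models: accept a drafted token with probability min(1, p_target/p_draft), otherwise resample from the normalized residual.

```python
import numpy as np

def verify(draft_token: int, p_draft: np.ndarray, p_target: np.ndarray, rng) -> int:
    accept = min(1.0, p_target[draft_token] / p_draft[draft_token])
    if rng.random() < accept:
        return draft_token                       # accepted: output still matches p_target
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()                   # rejection: resample the residual mass
    return int(rng.choice(len(residual), p=residual))

rng = np.random.default_rng(0)
p_draft = np.array([0.6, 0.3, 0.1])              # small draft model's distribution
p_target = np.array([0.5, 0.4, 0.1])             # large target model's distribution
print(verify(0, p_draft, p_target, rng))         # token 0 accepted with prob 0.5/0.6
```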

Speculative Decoding · LLM Inference · Autoregressive Decoding · Draft Model +10
Advanced
1 question