GenAI & Agents
The stack reshaping interviews in 2026: LLMs, RAG, fine-tuning, inference optimization, agent frameworks, and evaluation — built for candidates targeting AI-first roles.
Guides
Agent Memory Systems: In-Context, Semantic, Episodic, and Procedural
LLM context windows are finite and expensive. Learn how production agents implement hierarchical memory — in-context buffers, vector DB semantic retrieval, episodic event logs, and procedural fine-tuning — and when each layer is worth the engineering investment.
Instruction Tuning: Teaching LLMs to Follow Instructions
How SFT on (instruction, response) pairs transforms a base LLM into an instruction-following assistant. Covers FLAN's multi-task discovery, Alpaca self-instruct, the LIMA quality-over-quantity finding, catastrophic forgetting mitigations, and when instruction tuning beats RAG.
Embeddings — From word2vec to Instruction-Tuned Vectors & Production RAG
Trace the evolution from word2vec through BERT to modern instruction-tuned embeddings (text-embedding-3, E5-mistral, BGE-M3), understand Matryoshka Representation Learning for cost-latency tradeoffs, and master the production decisions that determine retrieval quality in RAG systems.
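A quick sketch of the Matryoshka idea: an MRL-trained embedding can be truncated to a prefix of its dimensions and re-normalized, trading a little recall for much cheaper storage. The 3072-dim size and random vector below are illustrative stand-ins, not a specific model's output.

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` coordinates of a Matryoshka-trained
    embedding and re-normalize to unit length for cosine search."""
    v = embedding[:dims]
    return v / np.linalg.norm(v)

full = np.random.randn(3072)      # stand-in for a real embedding
small = truncate_mrl(full, 256)   # ~12x less index storage per vector
```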
Tokenization — BPE, WordPiece, SentencePiece & Production Artifacts
Master the three dominant tokenization algorithms (BPE, WordPiece, SentencePiece) used in GPT-4, BERT, and LLaMA, understand why tokenization causes subtle failures like the r-counting problem, and learn how vocabulary design directly impacts context window costs and multilingual quality.
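To see the failure mode concretely, here is a minimal sketch using the open-source tiktoken library (assumed installed); the exact split depends on the vocabulary, but "strawberry" never tokenizes into single letters:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # GPT-4-era BPE vocabulary
tokens = enc.encode("strawberry")
print([enc.decode_single_token_bytes(t) for t in tokens])
# Multi-character chunks, not letters: the model never "sees"
# individual r's it could count.
```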
Decoding Strategies: Temperature, Sampling, and Constrained Generation
Master every LLM decoding parameter: greedy vs beam search vs sampling, temperature scaling, top-k and nucleus sampling, repetition penalties, and the 2024 min-p sampler. Understand when each strategy is optimal and why temperature doesn't change what the model 'knows'.
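As a minimal sketch of how temperature and nucleus sampling compose (assuming raw next-token logits as a NumPy array):

```python
import numpy as np

def sample(logits: np.ndarray, temperature: float = 0.8, top_p: float = 0.95) -> int:
    """Temperature-scaled nucleus (top-p) sampling over raw logits."""
    z = logits / temperature                      # reshapes, never reorders, the distribution
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # most likely first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest nucleus covering top_p
    p = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=p))
```

Note that temperature rescales the distribution but never reorders it; the greedy argmax is identical at any temperature.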
Prompt Engineering: From Zero-Shot to Production Systems
Master the techniques that separate good prompts from great ones: few-shot examples, Chain-of-Thought reasoning, system prompt design, structured output, and token budget management. Understand when prompting beats fine-tuning and how to debug bad outputs systematically.
AI Agents & Agentic Systems Framework
Comprehensive guide to building production agentic AI systems — from ReAct patterns and tool design to multi-agent orchestration, memory, and evaluation. The fastest-growing area in AI engineering.
LLM & Agent Evaluation: Trajectories, RAGAS, LLM-as-Judge, and Hallucination Mitigation
Evaluating agents and LLM systems requires evaluating trajectories, not just outputs. Learn trajectory vs outcome evaluation, tool call accuracy, the GAIA benchmark gap (humans: 92%, best agents: ~50%), LLM-as-judge biases, RAGAS metrics for grounded generation, layered hallucination defense, non-deterministic CI gates, and how production teams run shadow evaluation and regression suites.
Multi-Agent Systems: Orchestration, LangGraph, and Production Patterns
Single agents hit context limits and accumulate errors on complex tasks. Learn orchestrator-worker architectures, LangGraph state machines, AutoGen debate patterns, parallelization, and why most multi-agent demos break beyond 5 steps in production.
Agentic RAG: ReAct, Self-RAG, and Multi-Step Retrieval
Single-shot RAG fails on multi-hop questions and self-correction. Master agentic RAG patterns — ReAct loops, Self-RAG reflection tokens, FLARE, and tool-augmented retrieval — with latency budgets and failure modes every FAANG candidate must know.
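The core of every agentic RAG system is a bounded ReAct loop. A minimal sketch, where `llm` and the `tools` dict are hypothetical stand-ins for your model client and retrievers:

```python
def react(question: str, llm, tools: dict, max_steps: int = 6) -> str:
    """Minimal ReAct loop: Thought -> Action -> Observation until Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):                    # hard cap bounds latency and cost
        step = llm(transcript + "Thought:")       # model emits a thought + optional action
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:                     # e.g. "Action: search[multi-hop query]"
            name, _, arg = step.split("Action:", 1)[1].strip().partition("[")
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"  # feed evidence back in
    return "ESCALATE: step budget exhausted"      # a real system falls back, not loops
```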
RLHF and DPO: Aligning LLMs with Human Preferences
How RLHF transforms a base LLM into a helpful, harmless assistant — and why DPO has largely replaced PPO for this task. Covers reward model training, PPO instability and reward hacking, Constitutional AI, and when PPO still wins.
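The reason DPO displaced PPO is visible in its loss: no reward model, no rollouts, just a classification-style objective over preference pairs. A sketch in PyTorch, where the arguments are summed log-probabilities of each response under the policy and the frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: push the policy's margin on (chosen - rejected) above the
    reference model's margin; beta controls the implicit KL penalty."""
    margins = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margins).mean()
```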
LLM Fine-Tuning: LoRA, QLoRA, PEFT & RLHF
How to adapt pre-trained LLMs for specific tasks without catastrophic forgetting. Covers full fine-tuning vs PEFT, LoRA math and implementation, QLoRA for consumer hardware, instruction tuning, RLHF with PPO, DPO as the modern alternative, and when fine-tuning actually helps vs. when RAG or prompting is better.
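The LoRA math fits in a few lines: freeze W, learn a low-rank update BA scaled by alpha/r. The sizes below are illustrative; a 4096-wide layer at rank 16 trains under 1% of the layer's parameters:

```python
import torch

d, r, alpha = 4096, 16, 32
W = torch.randn(d, d)                 # frozen pretrained weight
A = torch.randn(r, d) * 0.01          # trainable, small random init
B = torch.zeros(d, r)                 # trainable, zero init => update starts as a no-op

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    # Only A and B train: 2*d*r = 131K params vs. 16.8M in W (~0.8%)
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```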
Knowledge Distillation for LLMs: Logit KD, Context Distillation, and Speculative Decoding Pairing
Classical KD minimizes the KL divergence between teacher and student output distributions on a fixed dataset; LLM-era variants distill **reasoning traces**, **tool-use formats**, or **chain-of-thought** into smaller models — or pair a student draft model with teacher verification in speculative decoding. Covers when offline KD beats RLHF, sequence-level distillation pitfalls, and latency-quality tradeoffs for on-device assistants.
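The classical objective is itself only a few lines; LLM-era variants mostly change what data the teacher labels, not the loss. A sketch assuming per-example logits in PyTorch:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    """Hinton-style logit distillation: KL between temperature-softened
    teacher and student distributions, scaled by T^2 to keep gradients stable."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * T * T
```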
RAFT — Retrieval-Augmented Fine-Tuning: When RAFT Beats RAG and When It Does Not
RAFT (UC Berkeley, with industry follow-ons) trains models to ignore distractor documents and cite the right passages — closing the gap where vanilla RAG retrieves noise and the model hedges. This guide covers the distractor-augmented training recipe, comparison to supervised fine-tuning without retrieval, evaluation on open-book QA, and failure modes when your doc corpus drifts faster than your retraining cadence.
How to Approach a GenAI / LLM System Interview
The mindset and signal-management playbook for generative AI and large language model interviews. Covers the research-track vs production-track split, time budgeting for LLM design, recovery when math comes up, and the trap of pattern-matching RAG or fine-tuning without failure modes.
How to Design GenAI Systems: From Blank Whiteboard to Production
The mechanical playbook for designing LLM applications in interviews. Covers the prompt-vs-RAG-vs-fine-tune decision tree, RAG pipeline (chunking, embeddings, FAISS, reranking), LoRA, the vLLM serving stack (KV cache, PagedAttention), and reference architectures for code assistants and document QA.
Chain-of-Thought, Test-Time Compute & Multi-Step Reasoning
From CoT and self-consistency to tree search over verbal states and modern reasoning-tuned models. When step-by-step prompts help, when they add latency and variance, and how to evaluate and ship reasoning in production (verifiers, judges, routing) without leaking competitive or customer detail.
Diffusion Models for Images — DDPM, Latent Diffusion, CFG, Stable Training
How denoising diffusion and latent diffusion power modern image gen (DALL·E, Stable Diffusion class systems): forward noise, score matching, DDIM-style fast sampling, classifier-free guidance, and production concerns — VRAM, latency, safety filters, and eval (FID, CLIP score, red-team). Connects the five GenAI planes for *generation-first* (non-LLM) stacks.
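The whole forward process collapses to one closed-form line, which is what makes training tractable: you can sample any timestep directly instead of iterating. A sketch in PyTorch, where `alphas_cumprod` is the precomputed noise schedule:

```python
import torch

def q_sample(x0: torch.Tensor, t: torch.Tensor, alphas_cumprod: torch.Tensor):
    """Forward diffusion: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)    # per-sample noise level
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    return x_t, eps                               # the UNet is trained to predict eps
```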
Structured Output, Function & Tool Calling — JSON Schema, Strict Mode, Agent Safety
How tool calling actually works in production — OpenAI-style function tools, JSON Schema constraints, strict structured outputs, parallel vs. sequential tools, and the auth/idempotency story interviewers expect. Connects the five GenAI planes from gateway design to eval and incident response.
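For orientation, here is the shape of an OpenAI-style function tool with strict structured outputs; `get_weather` and its fields are illustrative, not a real API:

```python
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "strict": True,                    # constrain the call to this exact schema
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],  # strict mode: every key listed...
            "additionalProperties": False, # ...and no extras allowed
        },
    },
}
```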
LLM Evaluation & Benchmarking — HELM, MMLU, MT-Bench, Arena, LLM-as-Judge
How to evaluate foundation and chat models without fooling yourself — HELM’s multi-metric design, instruction-following suites, chat leaderboards, RAGAS for grounded systems, contamination, and when human eval still wins. Connects public benchmarks to a private canary stack you can ship against.
LLM Fundamentals — Transformers, Attention & Architecture
Deep understanding of how large language models work — from self-attention and the transformer architecture to modern optimizations (KV cache, Flash Attention, RoPE, GQA). Essential for senior AI/ML engineer interviews.
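The attention core is small enough to write from memory, and interviewers often ask for exactly this. A NumPy sketch without masking or multi-head plumbing:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # each row: weighted mix of values
```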
Long-Context LLMs — Lost in the Middle, RAG vs. Natively Long, KV Cache & Packing
What 32K–1M+ token support really means: attention and KV cache economics, the lost-in-the-middle result (Liu et al., TACL 2024, arXiv:2307.03172), needle-in-haystack and chunk reordering for RAG, and when retrieval still wins on cost and proof. Connects the five GenAI planes for staff-level interviews.
Multimodal LLMs — CLIP, Vision-Language Models & Production Vision APIs
How image+text models work at scale — contrastive pretraining, projection layers, and LLaVA-style instruction tuning. Covers evaluation (MMMU, VQA, retrieval), latency and token economics, and failure modes interviewers expect you to name (hallucinated objects, OCR brittleness, eval contamination).
Positional Encoding — Sinusoidal, RoPE, ALiBi & Context Length Extrapolation
Deep-dive into why self-attention is permutation-invariant and how sinusoidal, learned, RoPE, and ALiBi positional encodings solve this — with production guidance on context length extrapolation, YaRN scaling, and why RoPE is now the default for new LLM architectures.
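A sketch of RoPE on a single query/key vector: rotate consecutive dimension pairs by position-dependent angles, so dot products between queries and keys depend only on their relative offset:

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to one d-dim query/key vector."""
    half = x.shape[-1] // 2
    theta = pos * base ** (-np.arange(half) / half)   # one frequency per pair
    x1, x2 = x[0::2], x[1::2]                         # interleaved dimension pairs
    rotated = np.stack([x1 * np.cos(theta) - x2 * np.sin(theta),
                        x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)
    return rotated.reshape(x.shape)
```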
Advanced RAG: Hybrid Retrieval, Reranking, and Production Architecture
Go beyond naive vector search: master hybrid retrieval with BM25 and dense embeddings fused via Reciprocal Rank Fusion, two-stage reranking with cross-encoders, HyDE, RAPTOR, and the RAG vs fine-tuning decision framework. Includes production failure modes and RAGAS evaluation.
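Reciprocal Rank Fusion itself is almost trivially small, which is part of why it is the default fusion choice: no score normalization across BM25 and cosine scales is needed.

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranked in rankings:                           # e.g. [bm25_ids, dense_ids]
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([["d3", "d1", "d7"], ["d1", "d9", "d3"]])  # ['d1', 'd3', 'd9', 'd7']
```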
RAG Architecture: From Basics to Production
Retrieval-Augmented Generation is the most common GenAI system design topic. Master chunking strategies, embedding models, vector databases, hybrid search, reranking, advanced retrieval patterns (HyDE, RAPTOR), agentic RAG, guardrails, and production evaluation with RAGAS.
Vector Search for GenAI: HNSW, IVF-PQ, FAISS, and ScaNN in Production
Standalone deep dive on vector search systems for GenAI workloads. Learn how HNSW, IVF, IVF-PQ, and ScaNN differ on recall-latency-cost, how to tune parameters like efSearch and nprobe, and how to choose the right index for million-to-billion scale retrieval.
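A minimal FAISS sketch of the knobs the guide covers; the corpus here is random stand-in data, and the index string and sizes are illustrative:

```python
import faiss
import numpy as np

d = 768
xb = np.random.rand(100_000, d).astype("float32")   # stand-in corpus embeddings
xq = np.random.rand(10, d).astype("float32")

index = faiss.index_factory(d, "IVF1024,PQ64")      # 1024 coarse clusters, 64-byte codes
index.train(xb)                                     # learn centroids + PQ codebooks
index.add(xb)
index.nprobe = 32                                   # clusters scanned per query:
D, I = index.search(xq, 10)                         # raise for recall, lower for latency
```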
LLM Guardrails and Safety: Input/Output Filters, Red-Teaming, and Constitutional AI
Production LLM systems require multi-layer safety mechanisms: prompt injection defenses, content classifiers, PII detection, output moderation, and red-teaming pipelines. This guide covers the defense-in-depth safety architecture used at OpenAI, Anthropic, Meta, and Google — the techniques increasingly tested in AI engineer interviews at companies building LLM-powered products.
LLM Gateway: Routing, Guardrails, Quotas, and Observability for Production GenAI
An LLM gateway is the control plane between applications and model providers. Learn architecture patterns for model routing, rate limiting, policy enforcement, prompt/response filtering, caching, fallback handling, and cost governance at scale.
LLM Observability & Monitoring: Traces, Cost and Latency SLOs, Eval Harnesses, and Alerting
Traditional APM misses token streams, tool loops, and judge drift. This guide covers OpenTelemetry-style traces for LLM chains, LangSmith / Phoenix-style eval sessions, cost attribution per tenant, latency SLOs by model tier, golden-set regression, and production alerts for schema drift in structured outputs.
LLM Serving at Scale: vLLM, KV Cache, Batching, and LLMOps
The engineering behind serving large language models at high throughput and low latency. Covers prefill/decode distinction, KV cache memory math, PagedAttention, continuous batching, speculative decoding, Flash Attention, MQA/GQA, quantization, and the LLMOps discipline (versioning, release gates, canary, rollback) needed to deploy them safely.
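The KV cache memory math is worth having at your fingertips: 2 tensors (K and V) x layers x KV heads x head dim x tokens x bytes per element. The defaults below sketch a Llama-3-8B-like config at fp16, as an assumption for illustration:

```python
def kv_cache_gb(layers=32, kv_heads=8, head_dim=128,
                seq_len=8192, batch=16, bytes_per_el=2):
    """KV cache size in GB for one model instance."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_el / 1e9

print(f"{kv_cache_gb():.1f} GB")   # ~17 GB of cache alone, which is why PagedAttention exists
```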
LLM Quantization: INT4/INT8, GPTQ, AWQ, and bitsandbytes
How to compress LLMs from 140GB to 35GB without destroying quality. Covers PTQ vs QAT, INT8 absmax/zero-point methods, GPTQ Hessian-based INT4, AWQ salient-weight protection, bitsandbytes mixed-precision, and the calibration dataset trap most engineers miss.
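The absmax method in miniature, plus the outlier problem in one line: a single large weight stretches the scale for the whole tensor.

```python
import numpy as np

def quantize_absmax(w: np.ndarray):
    """Symmetric INT8: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = 127.0 / np.max(np.abs(w))
    return np.round(w * scale).astype(np.int8), scale

w = np.random.randn(4096).astype(np.float32)
w[0] = 40.0                               # one outlier...
q, scale = quantize_absmax(w)
print(np.abs(w - q / scale).max())        # ...degrades precision for all 4096 weights
```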
Speculative Decoding: 2-4x LLM Inference Speedup Without Quality Loss
How speculative decoding exploits GPU underutilization in the decode phase to achieve 2-4x speedup with mathematically guaranteed output distribution equivalence. Covers draft-verify mechanics, acceptance probability, Medusa heads, Lookahead decoding, and the non-obvious constraints around tokenizer matching.
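The distribution guarantee comes from a rejection step: accept each drafted token with probability min(1, p_target/p_draft), otherwise resample from the normalized residual. A sketch over full next-token distributions:

```python
import numpy as np

def verify_token(token: int, p_target: np.ndarray, p_draft: np.ndarray) -> int:
    """Accept the draft's token or resample so output matches the target
    model's distribution exactly (Leviathan et al., 2023)."""
    if np.random.rand() < min(1.0, p_target[token] / p_draft[token]):
        return token                                  # accepted: free speedup
    residual = np.maximum(p_target - p_draft, 0.0)    # rejected: sample the
    residual /= residual.sum()                        # leftover probability mass
    return int(np.random.choice(len(residual), p=residual))
```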