
Neural Networks: Backpropagation, Activations & Training

Deep neural network fundamentals for FAANG ML interviews. Covers the backpropagation derivation via the chain rule, activation functions and their gradients, Batch/Layer Normalization, vanishing/exploding gradients, weight initialization (He/Xavier), and practical debugging. 9 hard interview questions with answers.

Tags: Backpropagation, Activation Functions, Batch Normalization, Vanishing Gradients, Weight Initialization, Neural Networks, Deep Learning, ReLU, Dropout, Gradient Flow, MLP, Forward Pass

Why Neural Networks Work — The Universal Approximation Theorem

A feedforward neural network with a single hidden layer and a non-polynomial activation function can approximate any continuous function on a compact domain to arbitrary accuracy, given enough neurons. (The non-linearity must be non-polynomial: a polynomial activation, while non-linear, only ever produces polynomials of bounded degree.) This theorem (Cybenko 1989 for sigmoidal activations, Hornik 1991 in more general form) tells us why neural networks are expressive enough to represent essentially any continuous mapping. What it doesn't tell us: how to find the right weights (that's gradient descent) or how many neurons you need in practice (that's architecture design).
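In symbols (standard notation for the theorem, not taken from this guide's text): for any continuous f on a compact set K and any ε > 0, there exist N, weights w_i, biases b_i, and output coefficients α_i such that

```latex
\hat{f}(x) \;=\; \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \bigl| f(x) - \hat{f}(x) \bigr| \;<\; \varepsilon .
```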

The core computation: a neural network is a composition of affine transformations (linear layers) interleaved with non-linear activation functions. Without the non-linearity, any depth of linear layers collapses to a single linear transformation — a 10-layer network becomes equivalent to a single layer. The non-linearities allow the network to learn feature hierarchies: low layers learn edges/simple patterns, high layers learn complex abstractions.
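A quick numerical check of the collapse claim (a minimal numpy sketch; matrices and shapes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # input vector
W1 = rng.standard_normal((8, 4))    # layer 1 weights
W2 = rng.standard_normal((3, 8))    # layer 2 weights

# Two stacked linear layers...
two_linear = W2 @ (W1 @ x)
# ...are exactly one linear layer with combined weights W2 @ W1.
one_linear = (W2 @ W1) @ x
print(np.allclose(two_linear, one_linear))  # True

# Insert a ReLU between the layers and the equivalence breaks:
relu = lambda z: np.maximum(z, 0.0)
nonlinear = W2 @ relu(W1 @ x)
print(np.allclose(nonlinear, one_linear))   # False (in general)
```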

Forward Pass — Data Flow Through a Network

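A minimal sketch of the data flow: each layer applies an affine map followed by an activation, and intermediate values are cached because the backward pass will need them. The layer sizes, ReLU hidden units, and softmax output here are illustrative assumptions, not the guide's specific network:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, params):
    """Forward pass through a 2-layer MLP; caches the pre-activations
    (z1, z2) and hidden activations (h1) that backprop will reuse."""
    W1, b1, W2, b2 = params
    z1 = x @ W1 + b1        # affine: (batch, d_in) -> (batch, d_hidden)
    h1 = relu(z1)           # non-linearity
    z2 = h1 @ W2 + b2       # affine: (batch, d_hidden) -> (batch, d_out)
    y = softmax(z2)         # class probabilities
    cache = (x, z1, h1, z2)
    return y, cache

# Illustrative shapes: 4 input features, 16 hidden units, 3 classes.
rng = np.random.default_rng(0)
params = (rng.standard_normal((4, 16)) * 0.1, np.zeros(16),
          rng.standard_normal((16, 3)) * 0.1, np.zeros(3))
probs, cache = forward(rng.standard_normal((2, 4)), params)
print(probs.shape)  # (2, 3)
```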