Neural Networks: Backpropagation, Activations & Training
Deep neural network fundamentals for FAANG ML interviews. Covers backpropagation derivation with chain rule, activation functions and their gradients, Batch/Layer Normalization, vanishing/exploding gradients, weight initialization (He/Xavier), and practical debugging. 9 hard interview questions with answers.
Why Neural Networks Work — The Universal Approximation Theorem
A feedforward neural network with a single hidden layer and a non-polynomial activation function (e.g. sigmoid or ReLU) can approximate any continuous function on a compact domain to arbitrary accuracy, given enough neurons. This theorem (Cybenko 1989 for sigmoidal activations; Hornik 1991 for broader classes) tells us WHY neural networks are expressive enough to represent essentially any continuous mapping. What it doesn't tell us: how to find the right weights (that's gradient descent) or how many neurons you need in practice (that's architecture design).
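The theorem can be seen in miniature with a one-hidden-layer ReLU network. The sketch below uses random hidden weights and fits only the output layer by least squares (a "random features" shortcut standing in for gradient descent); the target function, layer width, and input range are illustrative choices, not from the source.

```python
import numpy as np

# Approximate f(x) = sin(x) on [0, 2π] with one hidden ReLU layer.
# Hidden weights/biases are random; only the output weights are fit,
# via least squares, as a quick stand-in for full training.
rng = np.random.default_rng(0)
n_hidden = 200
x = np.linspace(0, 2 * np.pi, 500)[:, None]       # (500, 1) inputs
y = np.sin(x).ravel()                             # target values

W = rng.normal(size=(1, n_hidden))                # hidden weights
b = rng.uniform(-2 * np.pi, 2 * np.pi, n_hidden)  # hidden biases
H = np.maximum(0.0, x @ W + b)                    # ReLU hidden activations

# Solve min ||H v - y||^2 for the output layer.
v, *_ = np.linalg.lstsq(H, y, rcond=None)
max_err = np.max(np.abs(H @ v - y))
print(max_err)  # shrinks as n_hidden grows, as the theorem predicts
```

Increasing `n_hidden` drives the worst-case error down, which is exactly the "given enough neurons" clause of the theorem.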
The core computation: a neural network is a composition of affine transformations (linear layers) interleaved with non-linear activation functions. Without the non-linearity, any stack of linear layers collapses to a single linear transformation — a 10-layer network becomes equivalent to a one-layer network. The non-linearities are what let the network learn feature hierarchies: early layers learn edges and simple patterns, later layers learn complex abstractions.
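The collapse of activation-free depth is easy to verify numerically. In this sketch (layer sizes are arbitrary illustrations, and biases are omitted for brevity — they fold into the single layer the same way), multiplying all the weight matrices together gives the same output as passing data through the "deep" stack:

```python
import numpy as np

# Without non-linearities, stacking linear layers is just matrix
# multiplication, so the whole stack equals one linear layer.
rng = np.random.default_rng(0)
sizes = [8, 16, 32, 16, 4]                        # arbitrary layer widths
Ws = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, sizes[0]))                # one input example

# Forward pass through the "deep" linear stack.
h = x
for W in Ws:
    h = h @ W

# Pre-multiply all weights into a single matrix: identical result.
W_single = Ws[0]
for W in Ws[1:]:
    W_single = W_single @ W

assert np.allclose(h, x @ W_single)
```

Inserting even a simple ReLU between the layers breaks this equivalence, which is precisely why activations make depth meaningful.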