
Graph Neural Networks: Message Passing, GCN, GAT, GraphSAGE & Production GNNs

Deep dive into Graph Neural Networks for FAANG ML interviews. Covers message passing (MPNN, Gilmer 2017), GCN (Kipf 2017), GraphSAGE (Hamilton 2017), GAT (Velickovic 2018), GIN, Graphormer, neighbor sampling for scale, 1-WL expressiveness limits, oversmoothing and oversquashing, and production systems (Pinterest PinSage, Google Maps ETA, Uber fraud). 7 hard interview questions with answers.

Tags: Graph Neural Networks, GNN, GCN, GraphSAGE, GAT, Message Passing, MPNN, PinSage, Oversmoothing, Oversquashing, Neighbor Sampling, Weisfeiler-Leman, Graphormer, Link Prediction, Node Classification

Why Graph Neural Networks — Structure as Inductive Bias

Most real-world ML data is not a tidy table of rows. It's a graph: a social network where users connect to users, a transaction graph where accounts send money to accounts, a molecule where atoms bond to atoms, a knowledge graph where entities link through typed edges, a road network, a code AST, a scene graph. Pinterest's pin graph has ~3B nodes and ~18B edges (Ying 2018). Uber's payment graph has millions of accounts per city. In all of these, a node's neighbors carry as much signal as its own features — fraud rings, collaborative-filtering signals, and protein folds all emerge from structure.

A standard MLP applied to a node's feature vector ignores the neighborhood entirely. You can hand-engineer neighbor aggregates (neighbor count, average neighbor degree), but this is brittle and can't capture multi-hop patterns. Convolutional networks exploit grid structure via translation equivariance; graphs have no notion of translation, so you need a different inductive bias.

GNNs are permutation-equivariant message-passing models: relabeling the nodes permutes node-level outputs in the same way and leaves graph-level predictions unchanged. This is exactly the right symmetry for graphs, the way translation equivariance is right for images. The most important mental model: a GNN is a CNN whose neighborhoods are defined by edges instead of pixel adjacency, and whose aggregation must be a permutation-invariant function (sum, mean, max, attention) because neighbors have no canonical order.
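
To make the mental model concrete, here is a minimal sketch of one message-passing layer in PyTorch. It is illustrative, not any library's API: the dense adjacency matrix and the class and variable names are assumptions chosen for readability, and mean aggregation stands in for any permutation-invariant choice.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of h_v^(l+1) = UPDATE(h_v^(l), AGGREGATE({h_u^(l) : u in N(v)})).

    Mean aggregation here; sum, max, or attention would work too,
    as long as the reduction ignores neighbor order.
    """
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # UPDATE: a linear map over [own features || aggregated neighbors]
        self.update = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h:   (N, in_dim) node features
        # adj: (N, N) dense 0/1 adjacency -- fine for a sketch, not at scale
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # guard isolated nodes
        agg = (adj @ h) / deg                            # mean over neighbors
        return torch.relu(self.update(torch.cat([h, agg], dim=1)))

# Permutation equivariance: for any permutation matrix P,
# layer(P @ h, P @ adj @ P.T) == P @ layer(h, adj).
```

GCN replaces the mean with the symmetric normalization D^(-1/2) (A + I) D^(-1/2) H W (D the degree matrix of A + I), and GAT swaps the uniform neighbor weights for learned attention scores; the skeleton above stays the same.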

IMPORTANT

What Interviewers Evaluate on GNN Questions

A 6/10 answer describes a GNN as 'a neural network that runs on graphs' and mentions GCN and message passing. A 9/10 answer does five things:

1. States the message-passing update precisely — h_v^(l+1) = UPDATE(h_v^(l), AGGREGATE({h_u^(l) : u in N(v)})) — with sum/mean/attention as concrete AGGREGATE choices, and explains why permutation invariance is non-negotiable.

2. Compares architectures with design rationale — GCN is a symmetric-normalized first-order spectral convolution; GraphSAGE samples neighbors for inductive generalization; GAT learns attention weights over neighbors; GIN uses sum + MLP to match 1-WL expressiveness. You pick one based on scale and task, not by feel.

3. Names the two fundamental failure modes — oversmoothing (node representations collapse toward each other as depth grows, which is why most GNNs use 2-3 layers) and oversquashing (an exponentially expanding neighborhood must squeeze through a fixed-dim vector, bottlenecking long-range signal). These are distinct problems with different fixes: residual connections and normalization help oversmoothing; graph rewiring or global attention help oversquashing.

4. Knows production scaling — full-batch GCN is infeasible at Pinterest scale (~3B nodes). You need neighbor sampling (GraphSAGE; see the sketch after this list), subgraph sampling (GraphSAINT, Cluster-GCN), or precomputed embeddings served from Redis.

5. Cites the 1-WL expressiveness limit — vanilla message passing cannot distinguish some non-isomorphic graphs (regular graphs, certain molecular structures); a runnable demo appears below. Fixes are positional encodings, higher-order (k-WL) GNNs, or graph transformers.
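
Here is a minimal sketch of the fixed-fanout neighbor sampling from point 4, in plain Python. The function name, the adjacency-list format, and the fanouts are illustrative assumptions; production systems (PinSage, DGL, PyG loaders) do this with compact tensor blocks and importance sampling, not Python loops.

```python
import random

def sample_neighborhood(adj_list, seeds, fanouts):
    """GraphSAGE-style sampling: cap each hop at a fixed fanout.

    adj_list: dict node_id -> list of neighbor ids
    seeds:    the mini-batch of target nodes
    fanouts:  neighbors to draw per hop, e.g. [25, 10] as in GraphSAGE
    Returns one frontier per hop; a real loader would also record the
    sampled edges to build per-layer message-passing blocks.
    """
    frontiers = [list(seeds)]
    for fanout in fanouts:
        nxt = []
        for v in frontiers[-1]:
            nbrs = adj_list.get(v, [])
            nxt.extend(random.sample(nbrs, min(fanout, len(nbrs))))
        frontiers.append(nxt)
    return frontiers

# With fanouts [25, 10], each seed touches at most 1 + 25 + 250 nodes
# regardless of graph density -- versus the full 2-hop neighborhood,
# which for a high-degree node can be millions.
```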

Candidates who mention PinSage, Google Maps ETA (Derrow-Pinion 2021), or AlphaFold 2's residue graph signal that they've read production papers, not just the 2017 GCN paper.
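
And a tiny runnable demonstration of the 1-WL limit from point 5, assuming constant initial node features (the regime where message passing is weakest). C6 (a hexagon) and 2xC3 (two disjoint triangles) are non-isomorphic but both 2-regular, so sum aggregation produces identical embeddings at every layer; the helper and graph choice are ours, not from any particular paper.

```python
import numpy as np

def cycle_adj(cycles, n=6):
    """Adjacency matrix for a union of cycles given as node-id lists."""
    A = np.zeros((n, n))
    for cyc in cycles:
        for i, a in enumerate(cyc):
            b = cyc[(i + 1) % len(cyc)]
            A[a, b] = A[b, a] = 1.0
    return A

A_hex = cycle_adj([[0, 1, 2, 3, 4, 5]])    # C6: one 6-cycle
A_tri = cycle_adj([[0, 1, 2], [3, 4, 5]])  # 2xC3: two triangles

rng = np.random.default_rng(0)
h_hex = h_tri = np.ones((6, 4))            # identical constant features
for _ in range(3):                         # 3 rounds of sum aggregation
    W = rng.standard_normal((4, 4))        # shared layer weights
    h_hex = np.tanh(A_hex @ h_hex @ W)
    h_tri = np.tanh(A_tri @ h_tri @ W)

# Sum-pooled graph embeddings come out identical, so no downstream
# classifier can tell these two non-isomorphic graphs apart:
print(np.allclose(h_hex.sum(axis=0), h_tri.sum(axis=0)))  # True
```

Positional encodings (e.g., Laplacian eigenvectors) break the tie by giving nodes distinguishable initial features, which is exactly one of the fixes named in point 5.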
