
Machine Learning·Intermediate

Loss Functions: Choosing the Right Objective for Every ML Problem

The most underestimated ML interview topic. Covers regression losses (MSE, MAE, Huber), classification losses (cross-entropy, focal loss), ranking and embedding losses (triplet, InfoNCE), and the exact decision framework for choosing which loss to use — and why the wrong choice silently destroys model quality.

35 min read · 2 sections · 1 interview question

Loss Functions · Cross-Entropy · Mean Squared Error · Focal Loss · Triplet Loss · InfoNCE · Contrastive Loss · Huber Loss · Ranking Loss · KL Divergence · ELBO · Calibration

Why Loss Function Choice Is a Design Decision, Not a Default

Most practitioners accept the loss function as a default: cross-entropy for classification, MSE for regression. This is a mistake. The loss function is literally what your model optimizes — it is the most direct expression of the business objective in mathematical form. Using the wrong loss produces a model that faithfully minimizes the objective you specified, not the outcome you wanted.

Three examples where defaults fail:

MSE for skewed targets (revenue prediction): MSE penalizes errors proportionally to their square, so a $100 prediction error receives 10,000× the weight of a $1 error. On heavy-tailed targets like transaction revenue, the loss is dominated by the few high-value examples, and the model spends its capacity fitting that tail. Use MAE or Huber loss for heavy-tailed distributions.
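To make the robustness argument concrete, here is a minimal NumPy sketch of the Huber loss (the function name and the toy arrays are illustrative, not from the guide). It is quadratic for errors within `delta` and linear beyond, so a single outlier contributes linearly rather than quadratically:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Quadratic near zero, linear in the tails -> robust to outliers."""
    err = y_true - y_pred
    small = np.abs(err) <= delta
    return np.where(small,
                    0.5 * err**2,
                    delta * (np.abs(err) - 0.5 * delta))

# Toy heavy-tailed target: one huge value dominates MSE,
# but contributes only linearly to Huber.
y_true = np.array([1.0, 2.0, 1000.0])
y_pred = np.array([1.1, 2.1, 5.0])
mse_terms = 0.5 * (y_true - y_pred) ** 2   # last term ~495,012
huber_terms = huber(y_true, y_pred)        # last term 994.5
```

The gradient tells the same story: beyond `delta`, the Huber gradient is capped at `delta`, so a single bad example cannot swamp the update the way it does under MSE.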

Cross-entropy for extreme class imbalance (fraud detection, 1% positive rate): Standard cross-entropy assigns equal weight to each example. With 99:1 imbalance, the model can achieve 99% accuracy by predicting all negatives. Use focal loss, which down-weights easy negatives and forces the model to learn from hard positives.
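A minimal sketch of binary focal loss in NumPy may help here (the function name, `gamma=2.0`, and `alpha=0.25` follow common convention from the focal loss literature; the probe values are illustrative). The `(1 - p_t)^gamma` factor crushes the contribution of confident, easy examples:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss.

    p: predicted probability of the positive class.
    y: true label (0 or 1).
    (1 - p_t)^gamma down-weights easy examples; alpha rebalances classes.
    """
    p_t = np.where(y == 1, p, 1 - p)            # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy negative (p=0.01, y=0) contributes almost nothing,
# while a hard positive (p=0.1, y=1) keeps most of its weight.
easy_neg = focal_loss(np.array([0.01]), np.array([0]))[0]
hard_pos = focal_loss(np.array([0.1]), np.array([1]))[0]
```

With `gamma=0` and `alpha=0.5` this reduces to (scaled) standard cross-entropy, which is a useful sanity check when implementing it.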

MSE for generative models (image reconstruction): MSE averages pixel errors independently, which produces blurry images — when several outputs are plausible, the model predicts their pixel-wise mean, because the mean minimizes expected squared error even though it matches no single plausible output. Use perceptual loss or adversarial loss (GAN) for sharp generation.
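The "regression to the mean" failure can be shown in a few lines of NumPy (a toy construction, not from the guide): treat two 3-pixel vectors as equally likely ground truths for the same input, and compare candidate predictions under expected MSE.

```python
import numpy as np

# Two equally likely "ground truth" outputs for the same input
# (toy 3-pixel "images" standing in for two sharp modes).
a = np.array([0.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])

def expected_mse(pred):
    """Expected squared error when a and b each occur with prob. 0.5."""
    return 0.5 * np.mean((pred - a) ** 2) + 0.5 * np.mean((pred - b) ** 2)

mean_pred = (a + b) / 2        # [0.5, 0.5, 0.5] -- the "blurry" prediction
# The blur beats either sharp mode under expected MSE (0.25 vs 0.5),
# even though it looks like neither plausible output.
```

This is exactly why MSE-trained autoencoders blur: the optimizer is doing its job, and the fix is to change the objective, not the architecture.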
