Multi-Task & Transfer Learning: Shared Representations, Negative Transfer, and Fine-Tuning Strategy

Sharing encoders across tasks can improve data efficiency, or hurt when tasks conflict. This guide covers hard parameter sharing, soft sharing (cross-stitch and sluice networks), adapter layers (LoRA-style intuition), negative transfer diagnostics, and when Google-style pretrain-then-finetune beats training from scratch on tabular, vision, and NLP data.

Tags: Multi-Task Learning, Transfer Learning, Fine-Tuning, Negative Transfer, Adapter Layers, LoRA, Hard Parameter Sharing, Representation Learning, Domain Adaptation, Pretraining, Hugging Face, Task Weighting

Why Share a Model Across Tasks?

Multi-task learning (MTL) trains a single joint model on multiple objectives. Shared low-level representations (edges in vision, subword syntax in NLP, behavioral embeddings in recommender systems) can transfer statistical strength across tasks when those tasks share structure; a minimal setup is sketched below.

Negative transfer occurs when joint training hurts a task relative to training it alone. Common causes are conflicting gradients, mismatched label noise, and capacity starvation when one task dominates the loss. One early-warning diagnostic is sketched below.

Interviews test whether you know when sharing wins and how to detect negative transfer early.
