
Multi-Task Learning: Shared-Bottom, MMOE, and Negative Transfer

Every production ML ranker optimizes multiple objectives simultaneously — CTR, conversion, watch time, shares. This guide covers shared-bottom vs MMOE, negative transfer, task weighting, and the production patterns at YouTube, TikTok, Airbnb, and Meta.

25 min read · 2 sections · 1 interview question

Multi-Task Learning · MMOE · Mixture of Experts · Shared-Bottom · Negative Transfer · Task Weighting · Feed Ranking · Recommendation Systems · Multi-Objective Optimization · Gradient Surgery · YouTube DNN · TikTok Ranking · Production ML

Why Every Production Ranker Uses Multiple Objectives

No real product optimizes a single metric. A video recommendation ranker that optimizes click-through rate alone learns to recommend clickbait — videos with sensational thumbnails that cause users to abandon after 5 seconds. A ranker that optimizes watch time alone learns to recommend long, boring background-play content with no actual engagement.

Production ranking systems at YouTube, TikTok, Instagram, LinkedIn, and Airbnb all predict multiple outcome signals simultaneously — then combine them into a single score that reflects the full business objective. This is multi-task learning (MTL): a single model trained to predict multiple related outputs, sharing information across tasks to improve each.
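The simplest MTL architecture is the shared-bottom model: one shared representation feeding a small tower per task, with the per-task predictions blended into a single ranking score. Below is a minimal NumPy sketch of that forward pass; the dimensions, weight matrices, task names (CTR and watch time), and blend weights are all illustrative assumptions, not any company's production configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 16 input features, 8 shared hidden units, 2 tasks.
D_IN, D_SHARED = 16, 8
W_shared = rng.normal(size=(D_IN, D_SHARED))   # shared bottom, used by every task
W_ctr    = rng.normal(size=(D_SHARED, 1))      # tower for P(click)
W_watch  = rng.normal(size=(D_SHARED, 1))      # tower for P(long watch)

def shared_bottom_forward(x):
    h = relu(x @ W_shared)           # one shared representation for all tasks
    p_ctr = sigmoid(h @ W_ctr)       # task head 1
    p_watch = sigmoid(h @ W_watch)   # task head 2
    return p_ctr, p_watch

def combined_score(p_ctr, p_watch, w_ctr=0.4, w_watch=0.6):
    # Illustrative weighted blend; real systems tune these weights
    # offline and via online experiments.
    return w_ctr * p_ctr + w_watch * p_watch

x = rng.normal(size=(4, D_IN))       # batch of 4 candidate items
p_ctr, p_watch = shared_bottom_forward(x)
score = combined_score(p_ctr, p_watch)
```

Because both tasks read from the same hidden layer `h`, any gradient conflict between them plays out directly in the shared weights, which is exactly where negative transfer originates.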

The business case for MTL over separate per-task models:

  1. Data efficiency: related tasks share signal. A model learning CTR and conversion simultaneously lets the sparse conversion task benefit from the representation learned on dense click labels, and vice versa.
  2. Regularization: tasks act as regularizers for each other, preventing overfit to noise in any single task's labels.
  3. Consistency: a single model scores items on all objectives simultaneously — no synchronization issues between N independent models serving at different latencies.
  4. Negative transfer awareness: multi-task architectures explicitly manage task conflict; separate models have no mechanism to reason about competing objectives.

The challenge: tasks conflict. Adding watch time as a second objective improves ranking quality for long-form content but can hurt ranking for short-form (TikTok-style) where watch-through percentage matters more than absolute duration. Task conflict causes negative transfer — adding a poorly-aligned task degrades performance on the primary task. MMOE is the architectural solution.
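MMOE replaces the single shared bottom with several expert networks plus one softmax gate per task, so conflicting tasks can weight different experts instead of fighting over one representation. Here is a minimal NumPy sketch of the MMOE forward pass; the expert count, dimensions, and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: 16 features, 3 experts of width 8, 2 tasks.
D_IN, D_EXP, N_EXPERTS, N_TASKS = 16, 8, 3, 2
W_experts = rng.normal(size=(N_EXPERTS, D_IN, D_EXP))   # one net per expert
W_gates   = rng.normal(size=(N_TASKS, D_IN, N_EXPERTS)) # one gate per task
W_towers  = rng.normal(size=(N_TASKS, D_EXP, 1))        # one tower per task

def mmoe_forward(x):
    # Every expert sees the same input and produces its own representation.
    expert_out = np.stack(
        [relu(x @ W_experts[e]) for e in range(N_EXPERTS)], axis=1
    )  # shape (batch, experts, D_EXP)
    outputs = []
    for t in range(N_TASKS):
        gate = softmax(x @ W_gates[t], axis=-1)            # (batch, experts)
        mixed = np.einsum("be,bed->bd", gate, expert_out)  # task-specific mix
        outputs.append(mixed @ W_towers[t])                # task tower
    return outputs
```

The per-task gates are the point: if watch time and CTR pull in different directions, each gate can concentrate its weight on different experts, so a poorly aligned task degrades the shared parameters far less than in a shared-bottom model.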
