Skip to main content
DSA·Intermediate

Heaps & Priority Queues: Top-K, Merge K Lists, and Two-Heap Patterns

Master the three heap interview patterns: top-K elements in O(n log k), merging K sorted streams in O(N log k), and the two-heap median trick. Includes Python heapq API and lazy deletion for complex problems.

35 min read 12 sections 6 interview questions
HeapPriority QueueMin HeapMax HeapTop K ElementsMerge K ListsMedian StreamheapqLazy DeletionK ClosestKth LargestTwo HeapsFrequency Map

Heap Fundamentals — Structure, Representation, and Python API

A heap is a complete binary tree satisfying the heap property: in a min-heap, every parent is ≤ its children; in a max-heap, every parent is ≥ its children. 'Complete binary tree' means all levels are fully filled except possibly the last, which fills left-to-right. This structural constraint enables O(1) array storage without pointers.

Array representation (0-indexed): for node at index i, parent is at (i-1)//2, left child at 2i+1, right child at 2i+2. Heap property: arr[(i-1)//2] <= arr[i] for all i > 0 (min-heap).

Operations: push (heappush) inserts at end and bubbles up — O(log n). pop (heappop) removes root, places last element at root, sifts down — O(log n). heapify converts an arbitrary array to a heap in O(n) — not O(n log n) as you might expect. This is because most sift-down operations work on small subtrees near the leaves.

Python's heapq is a min-heap only. To simulate a max-heap, negate all values: push -val, pop and negate the result. For tuples, Python compares element by element — (priority, item) is the canonical form for priority queues. If items are not directly comparable (e.g., objects), use (priority, counter, item) where counter is a monotonically increasing tie-breaker.

heapify runs in O(n): this is non-obvious. If you have a list of n elements and want a heap, call heapq.heapify(lst) — O(n), not O(n log n). This matters when initializing from a known collection.

How to Identify and Solve Heap Problems

01

Spot the heap signal

Heap problems involve: 'K largest / K smallest', 'K most frequent', 'find median dynamically', 'merge K sorted streams', or 'schedule tasks with priorities'. If you see K and a ranking criterion, reach for a heap.

02

Choose min-heap vs max-heap

K largest → min-heap of size K (evict smallest, keep largest). K smallest → max-heap of size K (evict largest, keep smallest). Median stream → both. In Python, heapq is always min-heap — negate values for max-heap.

03

Decide heap contents

Push tuples: (priority, tie_breaker, item). Always include a monotonic counter as tie-breaker if items are not directly comparable (ListNodes, objects). This prevents TypeError when priorities are equal.

04

Maintain size invariant

For top-K patterns: after every push, if heap size exceeds K, pop once. This keeps the heap bounded at K elements — O(log k) per operation instead of O(log n).

05

Handle stale entries with lazy deletion

If you need to 'update' a heap entry, push the new entry and mark the old one stale in a dict. When popping, skip entries that are marked stale. Never modify heap entries in-place — the heap invariant breaks.

IMPORTANT

The Counter-Intuitive Rule: Min-Heap for K Largest

Use a min-heap of size K to find K largest elements — this feels backwards but is correct.

Logic: maintain a window of the K largest elements seen so far. The smallest of these K elements sits at the heap root. When you see a new element, compare it with the root (current K-th largest). If the new element is larger, pop the root and push the new element. Otherwise discard it.

Why not a max-heap? A max-heap of all n elements gives O(n) to build, then O(k log n) to extract K elements — total O(n + k log n). A min-heap of size K gives O(n log k) — strictly better when k << n.

Rule of thumb: K largest → min-heap of size K. K smallest → max-heap of size K (negate for Python). The heap always evicts the element that would NOT be in your K-element answer.

Python heapq API — Min-Heap, Max-Heap Simulation, and Top-K Pattern

pythonheapq_patterns.py
import heapq
from typing import Any

# ── Basic heapq API ────────────────────────────────────────────────────
min_heap: list[int] = []
heapq.heappush(min_heap, 5)
heapq.heappush(min_heap, 1)
heapq.heappush(min_heap, 3)
smallest = heapq.heappop(min_heap)          # 1 — O(log n)
peek = min_heap[0]                           # 3 — O(1), no pop
heapq.heapify(min_heap)                      # in-place heapify — O(n)

# ── Max-heap via negation ──────────────────────────────────────────────
max_heap: list[int] = []
for val in [3, 1, 4, 1, 5]:
    heapq.heappush(max_heap, -val)
largest = -heapq.heappop(max_heap)           # 5

# ── Max-heap with tuples (negation trick) ─────────────────────────────
# Push (-priority, tie_breaker, item) to get max-priority first
task_heap: list[tuple[int, int, str]] = []
counter = 0
for priority, task in [(3, "low"), (10, "high"), (5, "medium")]:
    heapq.heappush(task_heap, (-priority, counter, task))
    counter += 1
_, _, next_task = heapq.heappop(task_heap)  # "high"

# ── Top K pattern: K largest elements — O(n log k) ───────────────────
def top_k_largest(nums: list[int], k: int) -> list[int]:
    """Min-heap of size k. Root = current k-th largest."""
    heap: list[int] = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)    # evict smallest — it can't be top-k
    return sorted(heap, reverse=True)

# ── K most frequent elements (LC 347) ─────────────────────────────────
from collections import Counter

def top_k_frequent(nums: list[int], k: int) -> list[int]:
    """O(n log k) — count frequencies, then min-heap on frequency."""
    freq = Counter(nums)
    # min-heap of (frequency, element)
    heap: list[tuple[int, int]] = []
    for num, cnt in freq.items():
        heapq.heappush(heap, (cnt, num))
        if len(heap) > k:
            heapq.heappop(heap)
    return [num for _, num in heap]

# ── heapq.nlargest / nsmallest (simpler but same complexity) ──────────
result = heapq.nlargest(3, [1, 8, 2, 5, 3])    # [8, 5, 3]
# Internally: nlargest uses a min-heap of size k — same as above

Merge K Sorted Lists — Canonical Heap Problem

Merge K sorted linked lists (LC 23) is the canonical O(N log k) heap problem and appears in real systems: merge sorted runs from K worker nodes, merge K sorted log streams, merge K sorted database partitions.

Algorithm: push the first node from each of the K lists into a min-heap as (node.val, list_index, node). Then repeatedly: pop the minimum, append it to the result, push that node's .next (if it exists). The heap always has at most K elements — one from each active list.

Time complexity: each of the N total nodes is pushed and popped once, each push/pop costs O(log k). Total: O(N log k) where N is the total number of elements across all lists and k is the number of lists.

Why this beats alternatives: merging lists two at a time sequentially costs O(N·k). Divide-and-conquer merge (pair up lists, merge pairwise, repeat) costs O(N log k) — same as the heap approach but more complex to implement under time pressure. Use the heap approach in interviews.

Critical implementation detail: Python's heap compares tuples element-by-element. If two nodes have the same .val, comparing ListNode objects raises a TypeError. Fix: use a monotonic counter as the second tuple element: (val, counter, node). The counter breaks ties and prevents comparison of ListNode objects.

Merge K Sorted Lists (LC 23)

pythonmerge_k_lists.py
import heapq
from typing import Optional

class ListNode:
    def __init__(self, val: int = 0, next: Optional['ListNode'] = None):
        self.val = val
        self.next = next

def merge_k_sorted_lists(lists: list[Optional[ListNode]]) -> Optional[ListNode]:
    """
    Merge k sorted linked lists.
    Time: O(N log k)  — N total nodes, k lists
    Space: O(k)       — heap holds at most one node per list
    """
    dummy = ListNode(0)
    tail = dummy
    heap: list[tuple[int, int, ListNode]] = []
    counter = 0   # monotonic counter to break val ties — avoids ListNode comparison

    # Seed heap with head of each non-empty list
    for node in lists:
        if node:
            heapq.heappush(heap, (node.val, counter, node))
            counter += 1

    while heap:
        val, _, node = heapq.heappop(heap)
        tail.next = node
        tail = tail.next
        if node.next:
            heapq.heappush(heap, (node.next.val, counter, node.next))
            counter += 1

    return dummy.next

Merge K Sorted Lists — Heap State Walkthrough

Rendering diagram...

Find Median from Data Stream — The Two-Heap Technique

Find Median from Data Stream (LC 295) is the hardest common heap problem. The key insight: maintain two heaps that partition the sorted data stream into a lower half and upper half. The median is derived from the roots of these heaps.

Structure:

  • lo = max-heap (Python: negate values) — stores lower half of numbers
  • hi = min-heap — stores upper half of numbers
  • Invariant: len(lo) == len(hi) or len(lo) == len(hi) + 1 (lower half has equal or one extra element)
  • lo_max ≤ hi_min always — every element in the lower half is smaller than every element in the upper half

addNum algorithm: always push to lo first (to check ordering), then rebalance. Push num to lo. Then push lo's max to hi (ensures lo_max ≤ hi_min). If len(hi) > len(lo), push hi's min back to lo.

findMedian: if heaps are equal size → (lo[0] + hi[0]) / 2. If lo is larger → lo[0] (both negated in Python: -lo_heap[0]).

The subtle bug candidates miss: you must push to lo and then balance to hi (not push directly to the 'correct' half based on value alone). Pushing directly can violate the lo_max ≤ hi_min invariant when the pushed value is between lo_max and hi_min.

Median from Data Stream (LC 295)

pythonmedian_stream.py
import heapq

class MedianFinder:
    """
    Two-heap solution.
    addNum: O(log n)
    findMedian: O(1)
    Space: O(n)
    """
    def __init__(self):
        self.lo: list[int] = []   # max-heap (negated) — lower half
        self.hi: list[int] = []   # min-heap — upper half

    def add_num(self, num: int) -> None:
        # Step 1: Always push to lo first
        heapq.heappush(self.lo, -num)

        # Step 2: Ensure lo_max <= hi_min
        # Pop max of lo, push to hi (this enforces ordering)
        heapq.heappush(self.hi, -heapq.heappop(self.lo))

        # Step 3: Rebalance sizes — lo should have equal or +1
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def find_median(self) -> float:
        if len(self.lo) > len(self.hi):
            return float(-self.lo[0])   # odd count: lo has the middle
        return (-self.lo[0] + self.hi[0]) / 2.0   # even: average of two middles

# ── Example ──────────────────────────────────────────────────────────
# addNum(1): lo=[-1], hi=[]       → median=1.0
# addNum(2): lo=[-1], hi=[2]      → median=1.5
# addNum(3): lo=[-2,-1], hi=[3]   → median=2.0

Lazy Deletion Pattern — K Closest Points, Task Scheduler

pythonlazy_deletion.py
import heapq
from collections import defaultdict

# ── Why lazy deletion? ─────────────────────────────────────────────────
# Heaps don't support O(log n) arbitrary deletion (no find by value).
# When an element becomes invalid (e.g., already used), mark it as stale
# and skip it when popping. This avoids rebuilding the entire heap.

def k_closest_points(points: list[list[int]], k: int) -> list[list[int]]:
    """LC 973 — K closest points to origin. Max-heap of size k."""
    heap: list[tuple[float, int, int]] = []
    for x, y in points:
        dist = x * x + y * y          # squared distance (no sqrt needed for comparison)
        heapq.heappush(heap, (-dist, x, y))   # max-heap via negation
        if len(heap) > k:
            heapq.heappop(heap)
    return [[x, y] for _, x, y in heap]

def reorganize_string(s: str) -> str:
    """LC 767 — Interleave so no two adjacent chars are the same.
    Greedy: always place the most frequent remaining character."""
    from collections import Counter
    freq = Counter(s)
    max_heap = [(-cnt, ch) for ch, cnt in freq.items()]
    heapq.heapify(max_heap)

    result = []
    while len(max_heap) >= 2:
        cnt1, ch1 = heapq.heappop(max_heap)
        cnt2, ch2 = heapq.heappop(max_heap)
        result.extend([ch1, ch2])
        if cnt1 + 1 < 0:   # still has remaining count
            heapq.heappush(max_heap, (cnt1 + 1, ch1))
        if cnt2 + 1 < 0:
            heapq.heappush(max_heap, (cnt2 + 1, ch2))
    if max_heap:
        cnt, ch = max_heap[0]
        if -cnt > 1:
            return ""   # impossible: one char dominates
        result.append(ch)
    return "".join(result)

Classic Heap Problems — 10 Must-Know LeetCode Problems

LC#TitleHeap TypePatternTime ComplexityKey Insight
215Kth Largest ElementMin-heap size kTop-K largestO(n log k)Min-heap evicts elements smaller than k-th largest
347Top K FrequentMin-heap size kTop-K by frequencyO(n log k)Heap on (freq, element) pairs after counting
23Merge K Sorted ListsMin-heap size kK-way mergeO(N log k)Counter tie-breaker prevents ListNode comparison error
295Find Median from StreamTwo heapsLo/hi partitionO(log n) add, O(1) medianAlways insert to lo first, then rebalance to hi
973K Closest PointsMax-heap size kTop-K by distanceO(n log k)Use squared distance; max-heap evicts farthest
767Reorganize StringMax-heapGreedy frequencyO(n log n)Always pick two most frequent to avoid adjacency
1046Last Stone WeightMax-heapSimulationO(n log n)Repeat: smash two heaviest, push difference
621Task SchedulerMax-heap + counterGreedy cooldownO(n)Fill cooldown with next-most-frequent tasks
358Rearrange String k Distance ApartMax-heapGreedy windowedO(n log n)Variant of 767 with window size k instead of 2
632Smallest Range Covering K ListsMin-heapSliding window + heapO(N log k)Track current max; advance min-list pointer each step
⚠ WARNING

Common Heap Mistakes in Interviews

1. Using max-heap for K largest (backwards). The counter-intuitive truth: use a min-heap of size K to find K largest. The min-heap root is the K-th largest — anything smaller gets evicted.

2. Forgetting to negate for max-heap in Python. Push -val and negate on pop. For tuples: push (-priority, counter, item). Forgetting negation in a complex solution under time pressure is a common source of wrong answers.

3. Comparing non-comparable objects in tuples. (val, ListNode) will crash when two vals are equal. Always include a monotonic counter: (val, counter, node).

4. Modifying heap entries. You cannot update a value already in the heap in O(log n). If an entry becomes stale (e.g., in Dijkstra's algorithm), use lazy deletion: push the updated entry, and skip stale entries when popping (check if the popped distance > known shortest distance).

5. O(n log n) when O(n log k) is expected. Sorting all n elements and taking K is O(n log n). Maintaining a heap of size K is O(n log k). Interviewers expect the heap solution when k << n.

Interview Questions

Click to reveal answers
Test your knowledge

Sign in to take the Quiz

This topic has 20 quiz questions with instant feedback and detailed explanations. Sign in to unlock quizzes.

Sign in to take quiz →