Heaps & Priority Queues: Top-K, Merge K Lists, and Two-Heap Patterns
Master the three heap interview patterns: top-K elements in O(n log k), merging K sorted streams in O(N log k), and the two-heap median trick. Includes Python heapq API and lazy deletion for complex problems.
Heap Fundamentals — Structure, Representation, and Python API
A heap is a complete binary tree satisfying the heap property: in a min-heap, every parent is ≤ its children; in a max-heap, every parent is ≥ its children. 'Complete binary tree' means all levels are fully filled except possibly the last, which fills left-to-right. This structural constraint enables O(1) array storage without pointers.
Array representation (0-indexed): for node at index i, parent is at (i-1)//2, left child at 2i+1, right child at 2i+2. Heap property: arr[(i-1)//2] <= arr[i] for all i > 0 (min-heap).
Operations: push (heappush) inserts at end and bubbles up — O(log n). pop (heappop) removes root, places last element at root, sifts down — O(log n). heapify converts an arbitrary array to a heap in O(n) — not O(n log n) as you might expect. This is because most sift-down operations work on small subtrees near the leaves.
Python's heapq is a min-heap only. To simulate a max-heap, negate all values: push -val, pop and negate the result. For tuples, Python compares element by element — (priority, item) is the canonical form for priority queues. If items are not directly comparable (e.g., objects), use (priority, counter, item) where counter is a monotonically increasing tie-breaker.
heapify runs in O(n): this is non-obvious. If you have a list of n elements and want a heap, call heapq.heapify(lst) — O(n), not O(n log n). This matters when initializing from a known collection.
How to Identify and Solve Heap Problems
Spot the heap signal
Heap problems involve: 'K largest / K smallest', 'K most frequent', 'find median dynamically', 'merge K sorted streams', or 'schedule tasks with priorities'. If you see K and a ranking criterion, reach for a heap.
Choose min-heap vs max-heap
K largest → min-heap of size K (evict smallest, keep largest). K smallest → max-heap of size K (evict largest, keep smallest). Median stream → both. In Python, heapq is always min-heap — negate values for max-heap.
Decide heap contents
Push tuples: (priority, tie_breaker, item). Always include a monotonic counter as tie-breaker if items are not directly comparable (ListNodes, objects). This prevents TypeError when priorities are equal.
Maintain size invariant
For top-K patterns: after every push, if heap size exceeds K, pop once. This keeps the heap bounded at K elements — O(log k) per operation instead of O(log n).
Handle stale entries with lazy deletion
If you need to 'update' a heap entry, push the new entry and mark the old one stale in a dict. When popping, skip entries that are marked stale. Never modify heap entries in-place — the heap invariant breaks.
The Counter-Intuitive Rule: Min-Heap for K Largest
Use a min-heap of size K to find K largest elements — this feels backwards but is correct.
Logic: maintain a window of the K largest elements seen so far. The smallest of these K elements sits at the heap root. When you see a new element, compare it with the root (current K-th largest). If the new element is larger, pop the root and push the new element. Otherwise discard it.
Why not a max-heap? A max-heap of all n elements gives O(n) to build, then O(k log n) to extract K elements — total O(n + k log n). A min-heap of size K gives O(n log k) — strictly better when k << n.
Rule of thumb: K largest → min-heap of size K. K smallest → max-heap of size K (negate for Python). The heap always evicts the element that would NOT be in your K-element answer.
Python heapq API — Min-Heap, Max-Heap Simulation, and Top-K Pattern
import heapq
from typing import Any
# ── Basic heapq API ────────────────────────────────────────────────────
min_heap: list[int] = []
heapq.heappush(min_heap, 5)
heapq.heappush(min_heap, 1)
heapq.heappush(min_heap, 3)
smallest = heapq.heappop(min_heap) # 1 — O(log n)
peek = min_heap[0] # 3 — O(1), no pop
heapq.heapify(min_heap) # in-place heapify — O(n)
# ── Max-heap via negation ──────────────────────────────────────────────
max_heap: list[int] = []
for val in [3, 1, 4, 1, 5]:
heapq.heappush(max_heap, -val)
largest = -heapq.heappop(max_heap) # 5
# ── Max-heap with tuples (negation trick) ─────────────────────────────
# Push (-priority, tie_breaker, item) to get max-priority first
task_heap: list[tuple[int, int, str]] = []
counter = 0
for priority, task in [(3, "low"), (10, "high"), (5, "medium")]:
heapq.heappush(task_heap, (-priority, counter, task))
counter += 1
_, _, next_task = heapq.heappop(task_heap) # "high"
# ── Top K pattern: K largest elements — O(n log k) ───────────────────
def top_k_largest(nums: list[int], k: int) -> list[int]:
"""Min-heap of size k. Root = current k-th largest."""
heap: list[int] = []
for num in nums:
heapq.heappush(heap, num)
if len(heap) > k:
heapq.heappop(heap) # evict smallest — it can't be top-k
return sorted(heap, reverse=True)
# ── K most frequent elements (LC 347) ─────────────────────────────────
from collections import Counter
def top_k_frequent(nums: list[int], k: int) -> list[int]:
"""O(n log k) — count frequencies, then min-heap on frequency."""
freq = Counter(nums)
# min-heap of (frequency, element)
heap: list[tuple[int, int]] = []
for num, cnt in freq.items():
heapq.heappush(heap, (cnt, num))
if len(heap) > k:
heapq.heappop(heap)
return [num for _, num in heap]
# ── heapq.nlargest / nsmallest (simpler but same complexity) ──────────
result = heapq.nlargest(3, [1, 8, 2, 5, 3]) # [8, 5, 3]
# Internally: nlargest uses a min-heap of size k — same as above
Merge K Sorted Lists — Canonical Heap Problem
Merge K sorted linked lists (LC 23) is the canonical O(N log k) heap problem and appears in real systems: merge sorted runs from K worker nodes, merge K sorted log streams, merge K sorted database partitions.
Algorithm: push the first node from each of the K lists into a min-heap as (node.val, list_index, node). Then repeatedly: pop the minimum, append it to the result, push that node's .next (if it exists). The heap always has at most K elements — one from each active list.
Time complexity: each of the N total nodes is pushed and popped once, each push/pop costs O(log k). Total: O(N log k) where N is the total number of elements across all lists and k is the number of lists.
Why this beats alternatives: merging lists two at a time sequentially costs O(N·k). Divide-and-conquer merge (pair up lists, merge pairwise, repeat) costs O(N log k) — same as the heap approach but more complex to implement under time pressure. Use the heap approach in interviews.
Critical implementation detail: Python's heap compares tuples element-by-element. If two nodes have the same .val, comparing ListNode objects raises a TypeError. Fix: use a monotonic counter as the second tuple element: (val, counter, node). The counter breaks ties and prevents comparison of ListNode objects.
Merge K Sorted Lists (LC 23)
import heapq
from typing import Optional
class ListNode:
def __init__(self, val: int = 0, next: Optional['ListNode'] = None):
self.val = val
self.next = next
def merge_k_sorted_lists(lists: list[Optional[ListNode]]) -> Optional[ListNode]:
"""
Merge k sorted linked lists.
Time: O(N log k) — N total nodes, k lists
Space: O(k) — heap holds at most one node per list
"""
dummy = ListNode(0)
tail = dummy
heap: list[tuple[int, int, ListNode]] = []
counter = 0 # monotonic counter to break val ties — avoids ListNode comparison
# Seed heap with head of each non-empty list
for node in lists:
if node:
heapq.heappush(heap, (node.val, counter, node))
counter += 1
while heap:
val, _, node = heapq.heappop(heap)
tail.next = node
tail = tail.next
if node.next:
heapq.heappush(heap, (node.next.val, counter, node.next))
counter += 1
return dummy.next
Merge K Sorted Lists — Heap State Walkthrough
Find Median from Data Stream — The Two-Heap Technique
Find Median from Data Stream (LC 295) is the hardest common heap problem. The key insight: maintain two heaps that partition the sorted data stream into a lower half and upper half. The median is derived from the roots of these heaps.
Structure:
lo= max-heap (Python: negate values) — stores lower half of numbershi= min-heap — stores upper half of numbers- Invariant:
len(lo) == len(hi)orlen(lo) == len(hi) + 1(lower half has equal or one extra element) lo_max ≤ hi_minalways — every element in the lower half is smaller than every element in the upper half
addNum algorithm: always push to lo first (to check ordering), then rebalance. Push num to lo. Then push lo's max to hi (ensures lo_max ≤ hi_min). If len(hi) > len(lo), push hi's min back to lo.
findMedian: if heaps are equal size → (lo[0] + hi[0]) / 2. If lo is larger → lo[0] (both negated in Python: -lo_heap[0]).
The subtle bug candidates miss: you must push to lo and then balance to hi (not push directly to the 'correct' half based on value alone). Pushing directly can violate the lo_max ≤ hi_min invariant when the pushed value is between lo_max and hi_min.
Median from Data Stream (LC 295)
import heapq
class MedianFinder:
"""
Two-heap solution.
addNum: O(log n)
findMedian: O(1)
Space: O(n)
"""
def __init__(self):
self.lo: list[int] = [] # max-heap (negated) — lower half
self.hi: list[int] = [] # min-heap — upper half
def add_num(self, num: int) -> None:
# Step 1: Always push to lo first
heapq.heappush(self.lo, -num)
# Step 2: Ensure lo_max <= hi_min
# Pop max of lo, push to hi (this enforces ordering)
heapq.heappush(self.hi, -heapq.heappop(self.lo))
# Step 3: Rebalance sizes — lo should have equal or +1
if len(self.hi) > len(self.lo):
heapq.heappush(self.lo, -heapq.heappop(self.hi))
def find_median(self) -> float:
if len(self.lo) > len(self.hi):
return float(-self.lo[0]) # odd count: lo has the middle
return (-self.lo[0] + self.hi[0]) / 2.0 # even: average of two middles
# ── Example ──────────────────────────────────────────────────────────
# addNum(1): lo=[-1], hi=[] → median=1.0
# addNum(2): lo=[-1], hi=[2] → median=1.5
# addNum(3): lo=[-2,-1], hi=[3] → median=2.0
Lazy Deletion Pattern — K Closest Points, Task Scheduler
import heapq
from collections import defaultdict
# ── Why lazy deletion? ─────────────────────────────────────────────────
# Heaps don't support O(log n) arbitrary deletion (no find by value).
# When an element becomes invalid (e.g., already used), mark it as stale
# and skip it when popping. This avoids rebuilding the entire heap.
def k_closest_points(points: list[list[int]], k: int) -> list[list[int]]:
"""LC 973 — K closest points to origin. Max-heap of size k."""
heap: list[tuple[float, int, int]] = []
for x, y in points:
dist = x * x + y * y # squared distance (no sqrt needed for comparison)
heapq.heappush(heap, (-dist, x, y)) # max-heap via negation
if len(heap) > k:
heapq.heappop(heap)
return [[x, y] for _, x, y in heap]
def reorganize_string(s: str) -> str:
"""LC 767 — Interleave so no two adjacent chars are the same.
Greedy: always place the most frequent remaining character."""
from collections import Counter
freq = Counter(s)
max_heap = [(-cnt, ch) for ch, cnt in freq.items()]
heapq.heapify(max_heap)
result = []
while len(max_heap) >= 2:
cnt1, ch1 = heapq.heappop(max_heap)
cnt2, ch2 = heapq.heappop(max_heap)
result.extend([ch1, ch2])
if cnt1 + 1 < 0: # still has remaining count
heapq.heappush(max_heap, (cnt1 + 1, ch1))
if cnt2 + 1 < 0:
heapq.heappush(max_heap, (cnt2 + 1, ch2))
if max_heap:
cnt, ch = max_heap[0]
if -cnt > 1:
return "" # impossible: one char dominates
result.append(ch)
return "".join(result)
Classic Heap Problems — 10 Must-Know LeetCode Problems
| LC# | Title | Heap Type | Pattern | Time Complexity | Key Insight |
|---|---|---|---|---|---|
| 215 | Kth Largest Element | Min-heap size k | Top-K largest | O(n log k) | Min-heap evicts elements smaller than k-th largest |
| 347 | Top K Frequent | Min-heap size k | Top-K by frequency | O(n log k) | Heap on (freq, element) pairs after counting |
| 23 | Merge K Sorted Lists | Min-heap size k | K-way merge | O(N log k) | Counter tie-breaker prevents ListNode comparison error |
| 295 | Find Median from Stream | Two heaps | Lo/hi partition | O(log n) add, O(1) median | Always insert to lo first, then rebalance to hi |
| 973 | K Closest Points | Max-heap size k | Top-K by distance | O(n log k) | Use squared distance; max-heap evicts farthest |
| 767 | Reorganize String | Max-heap | Greedy frequency | O(n log n) | Always pick two most frequent to avoid adjacency |
| 1046 | Last Stone Weight | Max-heap | Simulation | O(n log n) | Repeat: smash two heaviest, push difference |
| 621 | Task Scheduler | Max-heap + counter | Greedy cooldown | O(n) | Fill cooldown with next-most-frequent tasks |
| 358 | Rearrange String k Distance Apart | Max-heap | Greedy windowed | O(n log n) | Variant of 767 with window size k instead of 2 |
| 632 | Smallest Range Covering K Lists | Min-heap | Sliding window + heap | O(N log k) | Track current max; advance min-list pointer each step |
Common Heap Mistakes in Interviews
1. Using max-heap for K largest (backwards). The counter-intuitive truth: use a min-heap of size K to find K largest. The min-heap root is the K-th largest — anything smaller gets evicted.
2. Forgetting to negate for max-heap in Python. Push -val and negate on pop. For tuples: push (-priority, counter, item). Forgetting negation in a complex solution under time pressure is a common source of wrong answers.
3. Comparing non-comparable objects in tuples. (val, ListNode) will crash when two vals are equal. Always include a monotonic counter: (val, counter, node).
4. Modifying heap entries. You cannot update a value already in the heap in O(log n). If an entry becomes stale (e.g., in Dijkstra's algorithm), use lazy deletion: push the updated entry, and skip stale entries when popping (check if the popped distance > known shortest distance).
5. O(n log n) when O(n log k) is expected. Sorting all n elements and taking K is O(n log n). Maintaining a heap of size K is O(n log k). Interviewers expect the heap solution when k << n.
Interview Questions
Click to reveal answersSign in to take the Quiz
This topic has 20 quiz questions with instant feedback and detailed explanations. Sign in to unlock quizzes.
Sign in to take quiz →