Arrays and Strings: Core Interview Patterns and Execution Strategy
Arrays and strings are the highest-frequency entry point for DSA interviews at Google, Meta, Amazon, and Microsoft. Learn how to classify problems quickly, apply BOTIC from brute force to optimized solutions, and avoid the edge-case failures that cause otherwise-correct candidates to get rejected.
Why Arrays and Strings Matter Most
Arrays and strings are not "easy warm-up topics." They are where interviewers evaluate your core execution signals: pattern recognition speed, communication clarity, edge-case discipline, and complexity derivation under time pressure.
Most first-round coding screens start here because arrays/strings expose whether your process is systematic or ad-hoc. A candidate who cannot clearly reason about index boundaries, duplicates, and invariants in this space is unlikely to handle advanced graph or DP rounds.
High-frequency pattern families in this module:
- Hashing and frequency counting (anagrams, first unique, grouping)
- Two pointers (sorted arrays, partitioning, palindrome checks)
- Sliding window (substring/subarray with constraints)
- Prefix sums and difference logic (range and subarray sums)
The strongest candidates do not memorize 100 solutions. They map each prompt to one of these families quickly, explain why that family fits, and drive BOTIC (Brute force -> Optimize -> Template -> Implement -> Check) with explicit time management.
What Interviewers Evaluate in Array/String Rounds
6/10 (borderline pass): Gets to a working solution but only after trial-and-error coding, weak edge-case testing, and hand-wavy complexity statements.
8/10 (strong pass): Names the right pattern in under 5 minutes, gives a clear brute-force baseline, then derives the optimized invariant before coding.
10/10 (hire-leaning): Runs BOTIC with explicit checkpoints, proactively validates boundary conditions (n=0, duplicates, all-same/all-unique), and explains why complexity is O(n) when there is an inner while loop (amortized pointer movement).
BOTIC for Arrays and Strings
B — Brute Force (3-5 min)
State the straightforward solution first (often nested loops or restart-per-index scan). Derive its complexity explicitly and identify repeated work.
O — Optimize (5-8 min)
Choose the right pattern family: hash map, two pointers, sliding window, or prefix sums. State the invariant before writing code.
T — Template (2-3 min)
Write variable scaffolding first (left/right, counts, result, prefix map). This prevents mid-implementation drift.
I — Implement (10-15 min)
Code incrementally while narrating pointer/state updates. Keep each line tied to the invariant.
C — Check (3-5 min)
Manually trace happy path plus edge cases: empty input, single element, duplicate-heavy input, and tricky boundary cases.
Array/String Pattern Selection Flow
Core Templates You Should Verbalize
Hashing template (frequency / membership):
```python
def has_duplicate(nums: list[int]) -> bool:
    seen = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False
```
Two-pointer template (sorted pair search):
```python
def two_sum_sorted(nums: list[int], target: int) -> list[int]:
    l, r = 0, len(nums) - 1
    while l < r:
        s = nums[l] + nums[r]
        if s == target:
            return [l, r]
        if s < target:
            l += 1
        else:
            r -= 1
    return []
```
Sliding window template (variable size):
```python
def longest_unique_substring(s: str) -> int:
    count = {}
    left = 0
    best = 0
    for right, ch in enumerate(s):
        count[ch] = count.get(ch, 0) + 1
        while count[ch] > 1:
            count[s[left]] -= 1
            left += 1
        best = max(best, right - left + 1)
    return best
```
Prefix-sum + map template (subarray sum equals k):
```python
def subarray_sum(nums: list[int], k: int) -> int:
    seen = {0: 1}
    prefix = 0
    ans = 0
    for x in nums:
        prefix += x
        ans += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return ans
```
Common Array/String Prompts -> Pattern Mapping
| Prompt signal | Pattern | Invariant | Typical complexity |
|---|---|---|---|
| longest/shortest substring with constraint | Sliding Window | Window is valid after contraction | O(n) |
| pair sum in sorted input | Two Pointers | Eliminate one side each move | O(n) |
| count/group by characters or values | Hash Map | Map tracks exact frequency state | O(n) |
| subarray sum equals k | Prefix Sum + Map | Store prior prefix frequencies | O(n) |
| in-place partition or deduplicate sorted array | Two Pointers | Left side is processed/valid region | O(n) |
| anagram/permutation checks | Frequency Count | All counts net to zero | O(n) |
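The anagram row above (all counts net to zero) can be sketched with a single frequency counter; the function name here is illustrative:

```python
from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    """Frequency-count check: t is an anagram of s.
    Invariant: every count contributed by s is consumed exactly once by t.
    Complexity: O(n) time, O(k) space for k unique characters.
    """
    if len(s) != len(t):
        return False
    counts = Counter(s)
    for ch in t:
        counts[ch] -= 1
        if counts[ch] < 0:  # t uses a character more times than s has it
            return False
    return True
```

The length check plus the negative-count early exit together guarantee all counts net to zero without a second verification pass.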
Two Pointers — Deep Dive and Variants
Two pointers exploit directional structure in sorted or sortable inputs to eliminate search space deterministically. The key insight: sorted order lets you rule out every remaining pair involving one endpoint with each pointer move.
Three primary variants:
1. Converging pointers (opposite ends): Start left at 0, right at n-1, move based on comparison to target. Classic use: pair sum in sorted array, container with most water, trapping rain water. Invariant: all pairs outside [left, right] have been eliminated.
2. Fast-slow pointers (same direction): Both start at 0, fast advances every iteration, slow advances conditionally. Classic use: remove duplicates in-place, partition arrays (Dutch National Flag), Floyd's cycle detection. Invariant: [0...slow) is the valid processed region.
3. Sliding-width pointers: Both move right, distance between them varies based on validity check. Overlaps with sliding window for contiguous ranges. Classic use: substring problems where window expands/contracts.
When to use two pointers vs alternatives:
- Use over nested loops when sorted order provides directional elimination (reduces O(n²) → O(n))
- Use over hash map when O(1) space is required or when sorted structure is already given
- Avoid when order is semantically meaningless (e.g., grouping anagrams where letter order doesn't map to value order)
Common implementation mistakes:
- Moving both pointers simultaneously when only one should move based on comparison
- Not handling equal-value runs correctly in duplicate-heavy inputs
- Off-by-one errors between cases where pointers may cross and cases where they must stop at left < right
- Forgetting to advance the slow pointer in fast-slow variants, causing infinite loops
Two Pointers — Converging Variant for Pair Sum
```python
def two_sum_sorted(nums: list[int], target: int) -> list[int]:
    """Find indices of two numbers that sum to target in sorted array.
    Invariant: all pairs outside [left, right] have been eliminated.
    Complexity: O(n) time, O(1) space.
    """
    left, right = 0, len(nums) - 1
    while left < right:
        current_sum = nums[left] + nums[right]
        if current_sum == target:
            return [left, right]
        elif current_sum < target:
            # All pairs with nums[left] are too small, eliminate left
            left += 1
        else:
            # All pairs with nums[right] are too large, eliminate right
            right -= 1
    return []  # No solution found

# Edge cases: [1, 2], target=4 -> []
# Duplicates: [1, 1, 1, 5], target=6 -> [0, 3]
```
Two Pointers — Fast-Slow for In-Place Deduplication
```python
def remove_duplicates(nums: list[int]) -> int:
    """Remove duplicates in-place from sorted array, return new length.
    Invariant: nums[0:slow] contains unique elements seen so far.
    Complexity: O(n) time, O(1) space.
    """
    if not nums:
        return 0
    slow = 1  # Position for next unique element
    for fast in range(1, len(nums)):
        if nums[fast] != nums[fast - 1]:
            nums[slow] = nums[fast]
            slow += 1
    return slow

# Edge cases: [] -> 0, [1] -> 1, [1,1,1] -> 1
# After: first 'slow' elements are the unique values
```
Sliding Window — Fixed vs Variable Size
Sliding window optimizes problems involving contiguous subarrays or substrings with constraints. The pattern avoids redundant recomputation by maintaining state as the window slides.
Two core variants:
1. Fixed-size window: Window width k is given. Slide right by 1, add new element, remove leftmost element. Classic use: maximum sum of k consecutive elements, average of subarrays of size k. Complexity: O(n) with O(k) space for state.
2. Variable-size window: Window expands until constraint is violated, then contracts from left until valid again. Classic use: longest substring without repeating characters, minimum window substring, subarrays with sum ≤ k. Complexity: O(n) amortized because each element is added once and removed at most once.
Key requirements for sliding window validity:
- Monotonic contribution: adding elements to the right must change the metric predictably (sum increases, unique count changes, etc.)
- Contractible restoration: removing from left must restore validity when violated
- Contiguity requirement: problem explicitly asks for substring/subarray, not subsequence
When sliding window FAILS:
- Negative numbers in sum-based constraints break monotonicity → use prefix sums instead
- Non-contiguous selections allowed → use DP or greedy instead
- Window validity depends on global state outside window → need different structure
State management patterns:
- Hash map for frequency counts (char counts, element counts)
- Running sum for aggregate metrics
- Max/min tracking with deque for range queries within window
- Boolean flags for constraint satisfaction
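The deque-based max tracking mentioned above can be sketched as a monotonic deque of indices, the standard structure for sliding window maximum (names are illustrative):

```python
from collections import deque

def sliding_window_max(nums: list[int], k: int) -> list[int]:
    """Max of each window of size k using a monotonic deque of indices.
    Invariant: the deque holds indices of a decreasing run of values,
    so the front is always the current window's maximum.
    Complexity: O(n) time (each index pushed and popped at most once), O(k) space.
    """
    dq: deque[int] = deque()
    result = []
    for i, x in enumerate(nums):
        if dq and dq[0] <= i - k:  # drop the index that slid out of the window
            dq.popleft()
        while dq and nums[dq[-1]] <= x:  # smaller values can never become the max
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result
```

Each element enters and leaves the deque at most once, which is the same amortized argument used for variable-size windows.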
Sliding Window — Variable Size for Longest Unique Substring
```python
def length_of_longest_substring(s: str) -> int:
    """Find length of longest substring without repeating characters.
    Invariant: current window [left, right] has all unique characters.
    Expansion: right moves every iteration.
    Contraction: left moves when duplicate detected, until unique restored.
    Complexity: O(n) time because total pointer moves ≤ 2n.
    """
    char_count = {}
    left = 0
    max_length = 0
    for right, ch in enumerate(s):
        char_count[ch] = char_count.get(ch, 0) + 1
        # Contract until window is valid (no duplicates)
        while char_count[ch] > 1:
            char_count[s[left]] -= 1
            left += 1
        max_length = max(max_length, right - left + 1)
    return max_length

# Edge cases: "" -> 0, "aaa" -> 1, "abba" -> 2
# Complexity proof: right visits each index once (n moves),
# left only moves forward and visits each at most once (≤ n moves),
# so total work is O(n), not O(n²)
```
Sliding Window — Fixed Size for Maximum Sum
```python
def max_sum_subarray(nums: list[int], k: int) -> int:
    """Find maximum sum of any contiguous subarray of size k.
    Pattern: compute first window sum, then slide by adding new element
    and removing leftmost element in O(1) per step.
    Complexity: O(n) time, O(1) space.
    """
    if len(nums) < k:
        return 0
    # Compute first window
    window_sum = sum(nums[:k])
    max_sum = window_sum
    # Slide window: add right, remove left
    for i in range(k, len(nums)):
        window_sum = window_sum + nums[i] - nums[i - k]
        max_sum = max(max_sum, window_sum)
    return max_sum

# Edge cases: k > len(nums) -> 0, k=1 -> max(nums), all negative -> least negative sum
```
Prefix Sums and Hashing — Range Query Optimization
Prefix sums transform range-sum queries from O(n) per query to O(1) after O(n) preprocessing. The core identity: sum(i, j) = prefix[j] - prefix[i-1].
When to use prefix sums:
- Multiple range-sum queries on a static array
- Subarray-sum problems where algebraic transformation is possible
- 2D matrix range queries (2D prefix sums)
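A minimal sketch of the range-query use case, using the common convention of prepending a 0 so that the identity becomes `sum(i, j) = prefix[j + 1] - prefix[i]` with no i-1 special case (function names are illustrative):

```python
def build_prefix(nums: list[int]) -> list[int]:
    """prefix[i] = sum of nums[0:i]; prefix[0] = 0. O(n) preprocessing."""
    prefix = [0]
    for x in nums:
        prefix.append(prefix[-1] + x)
    return prefix

def range_sum(prefix: list[int], i: int, j: int) -> int:
    """Sum of nums[i..j] inclusive, answered in O(1) per query."""
    return prefix[j + 1] - prefix[i]
```

The 0-prepended convention trades one extra array slot for the removal of every boundary branch, which is why it is the form most candidates should default to.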
Prefix sum + hash map pattern (subarray sum equals k):
Key insight: if prefix[j] - prefix[i] = k, then subarray [i+1...j] has sum k. Rearranging: prefix[j] - k = prefix[i]. So for each position j, check if prefix[j] - k has been seen before.
Why this pattern is powerful:
- Works with negative numbers (sliding window fails here)
- Counts all valid subarrays, not just one
- Handles duplicate prefix sums correctly via frequency map
Implementation subtleties:
- Initialize the map with `{0: 1}` to handle subarrays starting at index 0
- Update the answer before incrementing the current prefix count (avoids counting zero-length subarrays)
- Map stores prefix → frequency, not prefix → index (for counting, not finding positions)
Failure modes to avoid:
- Reversing update order causes double-counting when k=0
- Forgetting {0: 1} initialization misses prefix[j]=k cases
- Confusing this with sliding window (prefix+map works with negatives, sliding window doesn't)
Extensions:
- Subarray sum divisible by k: use `prefix % k` as the map key
- Count subarrays with sum in range [L, R]: compute count(sum ≤ R) - count(sum ≤ L-1)
- 2D matrix subarray sum: combine 1D prefix sums with column compression
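The first extension above can be sketched by keying the map on the prefix remainder instead of the raw prefix (function name is illustrative):

```python
def subarrays_div_by_k(nums: list[int], k: int) -> int:
    """Count subarrays whose sum is divisible by k.
    Key idea: sum(i+1..j) % k == 0 iff prefix[j] % k == prefix[i] % k,
    so group prior prefixes by remainder. Python's % returns a
    non-negative remainder for positive k, so negative prefix sums
    need no extra normalization here.
    Complexity: O(n) time, O(k) space.
    """
    remainder_count = {0: 1}  # empty prefix has remainder 0
    prefix = 0
    ans = 0
    for x in nums:
        prefix += x
        r = prefix % k
        ans += remainder_count.get(r, 0)
        remainder_count[r] = remainder_count.get(r, 0) + 1
    return ans
```

In languages where `%` can return negative values (C++, Java), normalize with `((prefix % k) + k) % k` before the lookup.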
Prefix Sum + Hash Map for Subarray Sum Equals K
```python
def subarray_sum(nums: list[int], k: int) -> int:
    """Count subarrays with sum equal to k (works with negatives).
    Identity: prefix[j] - prefix[i] = k means subarray [i+1...j] sums to k.
    Rearranged: need to find prior prefix[i] = prefix[j] - k.
    Map tracks: prefix_value -> count of how many times we've seen it.
    Complexity: O(n) time, O(n) space worst case.
    """
    prefix_count = {0: 1}  # Base case: prefix 0 before any elements
    prefix_sum = 0
    total_count = 0
    for num in nums:
        prefix_sum += num
        # Check if (prefix_sum - k) exists as a prior prefix
        total_count += prefix_count.get(prefix_sum - k, 0)
        # Record current prefix AFTER checking (avoids zero-length)
        prefix_count[prefix_sum] = prefix_count.get(prefix_sum, 0) + 1
    return total_count

# Example: nums=[1,-1,0], k=0
# prefix: 0 -> 1 -> 0 -> 0
# Count: prefix=1 (need 1, none yet), prefix=0 (need 0, count 1),
# prefix=0 again (need 0, now count 2 because prefix 0 seen twice)
# Valid subarrays: [1,-1], [0], [1,-1,0] -> 3 total
```
Common Interview Problems — Worked Examples
The following problems represent the highest-frequency array/string patterns at FAANG. For each, we show: pattern classification, invariant, implementation approach, and edge-case discipline.
Problem selection rationale: these cover all four core patterns (two pointers, sliding window, hashing, prefix sums) and expose the most common failure modes (off-by-one, incorrect complexity reasoning, missing edge cases).
How to use these in practice: before each interview, pick 2-3 from different pattern families and verbalize the full BOTIC walkthrough including complexity proof and edge-case testing. This builds the muscle memory for interview conditions.
Problem 1: Container With Most Water
Prompt: Given array of heights, find two lines that together with x-axis form a container with maximum water.
Pattern: Two pointers (converging from ends).
Brute force: Try all pairs O(n²).
Optimization insight: Area is limited by shorter line. If we move the pointer at the shorter line inward, we might find a taller line that increases area. Moving the taller line pointer can only decrease area (width decreases, height can't improve).
Invariant: All pairs outside [left, right] have been eliminated as suboptimal.
Implementation:
```python
def max_area(height: list[int]) -> int:
    left, right = 0, len(height) - 1
    max_water = 0
    while left < right:
        width = right - left
        current_area = width * min(height[left], height[right])
        max_water = max(max_water, current_area)
        # Move pointer at shorter line
        if height[left] < height[right]:
            left += 1
        else:
            right -= 1
    return max_water
```
Complexity: O(n) time, O(1) space. Each pointer moves at most n times total.
Edge cases: two bars only → compute directly, all same height → any pair valid, extreme heights [1, 100, 1] → pointer logic handles correctly.
Problem 2: Minimum Window Substring
Prompt: Given strings s and t, find minimum window in s containing all characters of t.
Pattern: Sliding window (variable size) + frequency map.
Brute force: Check every substring O(n³) or O(n²) with frequency comparison.
Optimization insight: Expand window right to include characters, contract from left when valid to minimize length.
Invariant: Track counts of required characters; window is valid when all required counts are met.
Implementation sketch:
```python
def min_window(s: str, t: str) -> str:
    if not t or not s:
        return ""
    need = {}
    for ch in t:
        need[ch] = need.get(ch, 0) + 1
    have = {}
    required = len(need)
    formed = 0
    left = 0
    min_len = float('inf')
    min_left = 0
    for right, ch in enumerate(s):
        have[ch] = have.get(ch, 0) + 1
        if ch in need and have[ch] == need[ch]:
            formed += 1
        # Contract while valid
        while formed == required:
            # Update result
            if right - left + 1 < min_len:
                min_len = right - left + 1
                min_left = left
            # Remove left character
            have[s[left]] -= 1
            if s[left] in need and have[s[left]] < need[s[left]]:
                formed -= 1
            left += 1
    return "" if min_len == float('inf') else s[min_left:min_left + min_len]
```
Complexity: O(|s| + |t|) time, O(|s| + |t|) space for maps.
Edge cases: t longer than s → "", no valid window → "", multiple valid windows → return shortest, s=t → return s.
Problem 3: Product of Array Except Self
Prompt: Return array where output[i] equals product of all elements except nums[i], without division, in O(n).
Pattern: Prefix products from left and right.
Brute force: For each index, multiply all others O(n²).
Optimization insight: Product except self = (product of all elements to the left) × (product of all elements to the right). Compute left products in one pass, right products in second pass.
Implementation:
```python
def product_except_self(nums: list[int]) -> list[int]:
    n = len(nums)
    result = [1] * n
    # Left pass: result[i] = product of all elements to the left
    left_product = 1
    for i in range(n):
        result[i] = left_product
        left_product *= nums[i]
    # Right pass: multiply by product of all elements to the right
    right_product = 1
    for i in range(n - 1, -1, -1):
        result[i] *= right_product
        right_product *= nums[i]
    return result
```
Complexity: O(n) time, O(1) extra space (output array doesn't count).
Edge cases: single element → [1], contains zero → correct because zero is in prefix/suffix, all ones → all ones, negative numbers → handled correctly by multiplication.
Complexity Analysis — Amortized and Worst-Case Reasoning
Why complexity reasoning matters in interviews:
Interviewers distinguish between candidates who state "O(n) because single loop" vs candidates who derive complexity rigorously. The latter demonstrates analytical depth and transferable problem-solving skills.
Amortized analysis for sliding window:
Common confusion: variable-size sliding window has a for loop with a nested while loop. Looks like O(n²), but is actually O(n). Why?
Accounting argument: Right pointer advances exactly n times (outer loop). Left pointer only moves forward, never resets. Each element is added to window once and removed at most once. Total pointer movements: ≤ 2n. Each movement does O(1) work (hash map update). Therefore total O(n).
What to say in interviews: "Right pointer visits each index once for n total moves. Left pointer also moves only forward and visits each index at most once, so at most n moves total. Combined 2n pointer movements with O(1) work per movement gives O(n) overall."
Prefix sum + map complexity:
Single pass builds prefix sums and populates map. Each insertion and lookup is O(1) average. Total O(n) time. Space is O(n) worst case if all prefix sums are unique.
Hashing complexity subtleties:
Hash map operations are O(1) average case, but O(n) worst case with poor hash function or adversarial input. In interview settings, assume average case unless interviewer asks about worst case. Then mention: "In practice O(1) average with good hash functions; worst-case O(n) with hash collisions, but Python's dict uses robust hashing."
Two-pointer complexity:
Most two-pointer solutions are O(n) because each pointer moves at most n times and doesn't reset. Exception: nested two-pointer variants (e.g., 3Sum with outer loop and inner two-pointer) are O(n²) because of the outer loop multiplier.
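A sketch of that nested variant: 3Sum fixes one element in an outer loop and runs converging pointers inside, so the O(n) pointer scan is multiplied by the O(n) outer loop (names are illustrative):

```python
def three_sum(nums: list[int]) -> list[list[int]]:
    """All unique triplets summing to zero.
    Complexity: O(n log n) sort + O(n^2) main loop = O(n^2) time.
    """
    nums = sorted(nums)
    result = []
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:  # skip duplicate anchor values
            continue
        left, right = i + 1, len(nums) - 1
        while left < right:  # inner converging two-pointer scan: O(n)
            s = nums[i] + nums[left] + nums[right]
            if s < 0:
                left += 1
            elif s > 0:
                right -= 1
            else:
                result.append([nums[i], nums[left], nums[right]])
                left += 1
                while left < right and nums[left] == nums[left - 1]:
                    left += 1  # skip duplicate second elements
                right -= 1
    return result
```

The inner scan is still linear per anchor; it is the outer loop, not the pointers, that supplies the extra factor of n.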
Space complexity accounting:
- Input-only structures don't count toward extra space
- Output array often doesn't count (problem specifies)
- Hash maps: count number of unique keys stored (worst case O(n), often O(k) for k unique elements)
- Prefix arrays: O(n) if stored explicitly, O(1) if computed on-the-fly
Pattern Complexity Comparison
| Pattern | Time | Space | When O(1) space achievable | Amortized vs worst-case |
|---|---|---|---|---|
| Two Pointers | O(n) | O(1) | Always (just pointer variables) | Same (pointers move linearly) |
| Fixed Sliding Window | O(n) | O(k) | If k=O(1) constraint tracking | Same (each element processed once) |
| Variable Sliding Window | O(n) | O(k) | If k=O(1) unique elements | Amortized O(n), worst single iteration O(n) |
| Prefix Sum + Map | O(n) | O(n) | Not possible (map stores prefixes) | Same (each operation O(1) average) |
| Hashing (frequency) | O(n) | O(k) | If k=O(1) unique values | O(1) per operation average, O(n) worst |
Failure Modes That Cause Rejection
1. Confusing substring with subsequence. Substring is contiguous; subsequence is not. This changes the pattern family completely.
2. Off-by-one window length. For inclusive [left, right], size is right - left + 1, not right - left.
3. Incorrect duplicate boundary handling. In index-jump sliding window variants, move left only when last occurrence is inside current window.
4. Weak complexity reasoning. Saying "single pass so O(n)" without amortized explanation is not enough when an inner while loop exists.
5. No edge-case protocol. Candidates who never test empty, single-element, and all-duplicate inputs are treated as unreliable under production pressure.
Interview Script for This Module
Use this structure out loud: "I will start with a brute-force baseline, identify repeated work, then switch to the optimal pattern with an invariant. After coding, I'll trace one example and then run edge cases."
For complexity, say: "Right pointer moves n times, left pointer moves at most n times total, so total pointer moves are <= 2n, which is O(n)." That single sentence often differentiates senior-caliber communication.