Arrays and Strings: Core Interview Patterns and Execution Strategy
Arrays and strings are the highest-frequency entry point for DSA interviews at Google, Meta, Amazon, and Microsoft. Learn how to classify problems quickly, apply BOTIC from brute force to optimized solutions, and avoid the edge-case failures that cause otherwise-correct candidates to get rejected.
Why Arrays and Strings Matter Most
Arrays and strings are not "easy warm-up topics." They are where interviewers evaluate your core execution signals: pattern recognition speed, communication clarity, edge-case discipline, and complexity derivation under time pressure.
Most first-round coding screens start here because arrays/strings expose whether your process is systematic or ad-hoc. A candidate who cannot clearly reason about index boundaries, duplicates, and invariants in this space is unlikely to handle advanced graph or DP rounds.
High-frequency pattern families in this module:
- Hashing and frequency counting (anagrams, first unique, grouping)
- Two pointers (sorted arrays, partitioning, palindrome checks)
- Sliding window (substring/subarray with constraints)
- Prefix sums and difference logic (range and subarray sums)
The strongest candidates do not memorize 100 solutions. They map each prompt to one of these families quickly, explain why that family fits, and drive BOTIC (Brute force -> Optimize -> Template -> Implement -> Check) with explicit time management.
What Interviewers Evaluate in Array/String Rounds
6/10 (borderline pass): Gets to a working solution but only after trial-and-error coding, weak edge-case testing, and hand-wavy complexity statements.
8/10 (strong pass): Names the right pattern in under 5 minutes, gives a clear brute-force baseline, then derives the optimized invariant before coding.
10/10 (hire-leaning): Runs BOTIC with explicit checkpoints, proactively validates boundary conditions (n=0, duplicates, all-same/all-unique), and explains why complexity is O(n) when there is an inner while loop (amortized pointer movement).
BOTIC for Arrays and Strings
B — Brute Force (3-5 min)
State the straightforward solution first (often nested loops or restart-per-index scan). Derive its complexity explicitly and identify repeated work.
O — Optimize (5-8 min)
Choose the right pattern family: hash map, two pointers, sliding window, or prefix sums. State the invariant before writing code.
T — Template (2-3 min)
Write variable scaffolding first (left/right, counts, result, prefix map). This prevents mid-implementation drift.
I — Implement (10-15 min)
Code incrementally while narrating pointer/state updates. Keep each line tied to the invariant.
C — Check (3-5 min)
Manually trace happy path plus edge cases: empty input, single element, duplicate-heavy input, and tricky boundary cases.
Array/String Pattern Selection Flow
Core Templates You Should Verbalize
Hashing template (frequency / membership):
```python
def has_duplicate(nums: list[int]) -> bool:
    seen = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False
```
Two-pointer template (sorted pair search):
```python
def two_sum_sorted(nums: list[int], target: int) -> list[int]:
    l, r = 0, len(nums) - 1
    while l < r:
        s = nums[l] + nums[r]
        if s == target:
            return [l, r]
        if s < target:
            l += 1
        else:
            r -= 1
    return []
```
Sliding window template (variable size):
```python
def longest_unique_substring(s: str) -> int:
    count = {}
    left = 0
    best = 0
    for right, ch in enumerate(s):
        count[ch] = count.get(ch, 0) + 1
        while count[ch] > 1:
            count[s[left]] -= 1
            left += 1
        best = max(best, right - left + 1)
    return best
```
Prefix-sum + map template (subarray sum equals k):
```python
def subarray_sum(nums: list[int], k: int) -> int:
    seen = {0: 1}
    prefix = 0
    ans = 0
    for x in nums:
        prefix += x
        ans += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return ans
```
Common Array/String Prompts -> Pattern Mapping
| Prompt signal | Pattern | Invariant | Typical complexity |
|---|---|---|---|
| longest/shortest substring with constraint | Sliding Window | Window is valid after contraction | O(n) |
| pair sum in sorted input | Two Pointers | Eliminate one side each move | O(n) |
| count/group by characters or values | Hash Map | Map tracks exact frequency state | O(n) |
| subarray sum equals k | Prefix Sum + Map | Store prior prefix frequencies | O(n) |
| in-place partition or deduplicate sorted array | Two Pointers | Left side is processed/valid region | O(n) |
| anagram/permutation checks | Frequency Count | All counts net to zero | O(n) |
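The anagram row above (all counts net to zero) can be sketched with a single frequency counter; the function name here is illustrative:

```python
from collections import Counter

def is_anagram(s: str, t: str) -> bool:
    """Frequency-count check: t is an anagram of s.
    Invariant: every count contributed by s is consumed exactly once by t.
    Complexity: O(n) time, O(k) space for k unique characters.
    """
    if len(s) != len(t):
        return False
    counts = Counter(s)
    for ch in t:
        counts[ch] -= 1
        if counts[ch] < 0:  # t uses a character more times than s has it
            return False
    return True
```

The length check plus the negative-count early exit together guarantee all counts net to zero without a second verification pass.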
Two Pointers — Deep Dive and Variants
Two pointers exploit directional structure in sorted or sortable inputs to eliminate search space deterministically. The key insight: sorted order lets you rule out every remaining pair involving one endpoint with each pointer move.
Three primary variants:
1. Converging pointers (opposite ends): Start left at 0, right at n-1, move based on comparison to target. Classic use: pair sum in sorted array, container with most water, trapping rain water. Invariant: all pairs outside [left, right] have been eliminated.
2. Fast-slow pointers (same direction): Both start at 0, fast advances every iteration, slow advances conditionally. Classic use: remove duplicates in-place, partition arrays (Dutch National Flag), Floyd's cycle detection. Invariant: [0...slow) is the valid processed region.
3. Sliding-width pointers: Both move right, distance between them varies based on validity check. Overlaps with sliding window for contiguous ranges. Classic use: substring problems where window expands/contracts.
When to use two pointers vs alternatives:
- Use over nested loops when sorted order provides directional elimination (reduces O(n²) → O(n))
- Use over hash map when O(1) space is required or when sorted structure is already given
- Avoid when order is semantically meaningless (e.g., grouping anagrams where letter order doesn't map to value order)
Common implementation mistakes:
- Moving both pointers simultaneously when only one should move based on comparison
- Not handling equal-value runs correctly in duplicate-heavy inputs
- Off-by-one errors between cases where pointers may cross and cases where they must stop at left < right
- Forgetting to advance the slow pointer in fast-slow variants, causing infinite loops
Two Pointers — Converging Variant for Pair Sum
```python
def two_sum_sorted(nums: list[int], target: int) -> list[int]:
    """Find indices of two numbers that sum to target in sorted array.
    Invariant: all pairs outside [left, right] have been eliminated.
    Complexity: O(n) time, O(1) space.
    """
    left, right = 0, len(nums) - 1
    while left < right:
        current_sum = nums[left] + nums[right]
        if current_sum == target:
            return [left, right]
        elif current_sum < target:
            # All pairs with nums[left] are too small, eliminate left
            left += 1
        else:
            # All pairs with nums[right] are too large, eliminate right
            right -= 1
    return []  # No solution found

# Edge cases: [1, 2], target=4 -> []
# Duplicates: [1, 1, 1, 5], target=6 -> [0, 3]
```
Two Pointers — Fast-Slow for In-Place Deduplication
```python
def remove_duplicates(nums: list[int]) -> int:
    """Remove duplicates in-place from sorted array, return new length.
    Invariant: nums[0:slow] contains unique elements seen so far.
    Complexity: O(n) time, O(1) space.
    """
    if not nums:
        return 0
    slow = 1  # Position for next unique element
    for fast in range(1, len(nums)):
        if nums[fast] != nums[fast - 1]:
            nums[slow] = nums[fast]
            slow += 1
    return slow

# Edge cases: [] -> 0, [1] -> 1, [1,1,1] -> 1
# After: first 'slow' elements are the unique values
```
Sliding Window — Fixed vs Variable Size
Sliding window optimizes problems involving contiguous subarrays or substrings with constraints. The pattern avoids redundant recomputation by maintaining state as the window slides.
Two core variants:
1. Fixed-size window: Window width k is given. Slide right by 1, add new element, remove leftmost element. Classic use: maximum sum of k consecutive elements, average of subarrays of size k. Complexity: O(n) with O(k) space for state.
2. Variable-size window: Window expands until constraint is violated, then contracts from left until valid again. Classic use: longest substring without repeating characters, minimum window substring, subarrays with sum ≤ k. Complexity: O(n) amortized because each element is added once and removed at most once.
Key requirements for sliding window validity:
- Monotonic contribution: adding elements to the right must change the metric predictably (sum increases, unique count changes, etc.)
- Contractible restoration: removing from left must restore validity when violated
- Contiguity requirement: problem explicitly asks for substring/subarray, not subsequence
When sliding window FAILS:
- Negative numbers in sum-based constraints break monotonicity → use prefix sums instead
- Non-contiguous selections allowed → use DP or greedy instead
- Window validity depends on global state outside window → need different structure
State management patterns:
- Hash map for frequency counts (char counts, element counts)
- Running sum for aggregate metrics
- Max/min tracking with deque for range queries within window
- Boolean flags for constraint satisfaction
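The deque-based max tracking mentioned above can be sketched as a monotonic deque of indices, the standard structure for sliding window maximum (names are illustrative):

```python
from collections import deque

def sliding_window_max(nums: list[int], k: int) -> list[int]:
    """Max of each window of size k using a monotonic deque of indices.
    Invariant: the deque holds indices of a decreasing run of values,
    so the front is always the current window's maximum.
    Complexity: O(n) time (each index pushed and popped at most once), O(k) space.
    """
    dq: deque[int] = deque()
    result = []
    for i, x in enumerate(nums):
        if dq and dq[0] <= i - k:  # drop the index that slid out of the window
            dq.popleft()
        while dq and nums[dq[-1]] <= x:  # smaller values can never become the max
            dq.pop()
        dq.append(i)
        if i >= k - 1:
            result.append(nums[dq[0]])
    return result
```

Each element enters and leaves the deque at most once, which is the same amortized argument used for variable-size windows.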
Sliding Window — Variable Size for Longest Unique Substring
```python
def length_of_longest_substring(s: str) -> int:
    """Find length of longest substring without repeating characters.
    Invariant: current window [left, right] has all unique characters.
    Expansion: right moves every iteration.
    Contraction: left moves when duplicate detected, until unique restored.
    Complexity: O(n) time because total pointer moves ≤ 2n.
    """
    char_count = {}
    left = 0
    max_length = 0
    for right, ch in enumerate(s):
        char_count[ch] = char_count.get(ch, 0) + 1
        # Contract until window is valid (no duplicates)
        while char_count[ch] > 1:
            char_count[s[left]] -= 1
            left += 1
        max_length = max(max_length, right - left + 1)
    return max_length

# Edge cases: "" -> 0, "aaa" -> 1, "abba" -> 2
# Complexity proof: right visits each index once (n moves),
# left only moves forward and visits each at most once (≤ n moves),
# so total work is O(n), not O(n²)
```
Sliding Window — Fixed Size for Maximum Sum
```python
def max_sum_subarray(nums: list[int], k: int) -> int:
    """Find maximum sum of any contiguous subarray of size k.
    Pattern: compute first window sum, then slide by adding new element
    and removing leftmost element in O(1) per step.
    Complexity: O(n) time, O(1) space.
    """
    if len(nums) < k:
        return 0
    # Compute first window
    window_sum = sum(nums[:k])
    max_sum = window_sum
    # Slide window: add right, remove left
    for i in range(k, len(nums)):
        window_sum = window_sum + nums[i] - nums[i - k]
        max_sum = max(max_sum, window_sum)
    return max_sum

# Edge cases: k > len(nums) -> 0, k=1 -> max(nums), all negative -> least negative sum
```
Prefix Sums and Hashing — Range Query Optimization
Prefix sums transform range-sum queries from O(n) per query to O(1) after O(n) preprocessing. The core identity: sum(i, j) = prefix[j] - prefix[i-1].
When to use prefix sums:
- Multiple range-sum queries on a static array
- Subarray-sum problems where algebraic transformation is possible
- 2D matrix range queries (2D prefix sums)
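A minimal sketch of the range-query use case, using the common convention of prepending a 0 so that the identity becomes `sum(i, j) = prefix[j + 1] - prefix[i]` with no i-1 special case (function names are illustrative):

```python
def build_prefix(nums: list[int]) -> list[int]:
    """prefix[i] = sum of nums[0:i]; prefix[0] = 0. O(n) preprocessing."""
    prefix = [0]
    for x in nums:
        prefix.append(prefix[-1] + x)
    return prefix

def range_sum(prefix: list[int], i: int, j: int) -> int:
    """Sum of nums[i..j] inclusive, answered in O(1) per query."""
    return prefix[j + 1] - prefix[i]
```

The 0-prepended convention trades one extra array slot for the removal of every boundary branch, which is why it is the form most candidates should default to.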
Prefix sum + hash map pattern (subarray sum equals k):
Key insight: if prefix[j] - prefix[i] = k, then subarray [i+1...j] has sum k. Rearranging: prefix[j] - k = prefix[i]. So for each position j, check if prefix[j] - k has been seen before.
Why this pattern is powerful:
- Works with negative numbers (sliding window fails here)
- Counts all valid subarrays, not just one
- Handles duplicate prefix sums correctly via frequency map
Implementation subtleties:
- Initialize the map with `{0: 1}` to handle subarrays starting at index 0
- Update the answer before incrementing the current prefix count (avoids counting zero-length subarrays)
- Map stores prefix → frequency, not prefix → index (for counting, not finding positions)
Failure modes to avoid:
- Reversing update order causes double-counting when k=0
- Forgetting {0: 1} initialization misses prefix[j]=k cases
- Confusing this with sliding window (prefix+map works with negatives, sliding window doesn't)
Extensions:
- Subarray sum divisible by k: use `prefix % k` as the map key
- Count subarrays with sum in range [L, R]: compute count(sum ≤ R) - count(sum ≤ L-1)
- 2D matrix subarray sum: combine 1D prefix sums with column compression
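The first extension above can be sketched by keying the map on the prefix remainder instead of the raw prefix (function name is illustrative):

```python
def subarrays_div_by_k(nums: list[int], k: int) -> int:
    """Count subarrays whose sum is divisible by k.
    Key idea: sum(i+1..j) % k == 0 iff prefix[j] % k == prefix[i] % k,
    so group prior prefixes by remainder. Python's % returns a
    non-negative remainder for positive k, so negative prefix sums
    need no extra normalization here.
    Complexity: O(n) time, O(k) space.
    """
    remainder_count = {0: 1}  # empty prefix has remainder 0
    prefix = 0
    ans = 0
    for x in nums:
        prefix += x
        r = prefix % k
        ans += remainder_count.get(r, 0)
        remainder_count[r] = remainder_count.get(r, 0) + 1
    return ans
```

In languages where `%` can return negative values (C++, Java), normalize with `((prefix % k) + k) % k` before the lookup.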
Prefix Sum + Hash Map for Subarray Sum Equals K
```python
def subarray_sum(nums: list[int], k: int) -> int:
    """Count subarrays with sum equal to k (works with negatives).
    Identity: prefix[j] - prefix[i] = k means subarray [i+1...j] sums to k.
    Rearranged: need to find prior prefix[i] = prefix[j] - k.
    Map tracks: prefix_value -> count of how many times we've seen it.
    Complexity: O(n) time, O(n) space worst case.
    """
    prefix_count = {0: 1}  # Base case: prefix 0 before any elements
    prefix_sum = 0
    total_count = 0
    for num in nums:
        prefix_sum += num
        # Check if (prefix_sum - k) exists as a prior prefix
        total_count += prefix_count.get(prefix_sum - k, 0)
        # Record current prefix AFTER checking (avoids zero-length)
        prefix_count[prefix_sum] = prefix_count.get(prefix_sum, 0) + 1
    return total_count

# Example: nums=[1,-1,0], k=0
# prefix: 0 -> 1 -> 0 -> 0
# Count: prefix=1 (need 1, none yet), prefix=0 (need 0, count 1),
# prefix=0 again (need 0, now count 2 because prefix 0 seen twice)
# Valid subarrays: [1,-1], [0], [1,-1,0] -> 3 total
```
Common Interview Problems — Worked Examples
The following problems represent the highest-frequency array/string patterns at FAANG. For each, we show: pattern classification, invariant, implementation approach, and edge-case discipline.
Problem selection rationale: these cover all four core patterns (two pointers, sliding window, hashing, prefix sums) and expose the most common failure modes (off-by-one, incorrect complexity reasoning, missing edge cases).
How to use these in practice: before each interview, pick 2-3 from different pattern families and verbalize the full BOTIC walkthrough including complexity proof and edge-case testing. This builds the muscle memory for interview conditions.
Problem 1: Container With Most Water
Prompt: Given array of heights, find two lines that together with x-axis form a container with maximum water.
Pattern: Two pointers (converging from ends).
Brute force: Try all pairs O(n²).
Optimization insight: Area is limited by shorter line. If we move the pointer at the shorter line inward, we might find a taller line that increases area. Moving the taller line pointer can only decrease area (width decreases, height can't improve).
Invariant: All pairs outside [left, right] have been eliminated as suboptimal.
Implementation:
```python
def max_area(height: list[int]) -> int:
    left, right = 0, len(height) - 1
    max_water = 0
    while left < right:
        width = right - left
        current_area = width * min(height[left], height[right])
        max_water = max(max_water, current_area)
        # Move pointer at shorter line
        if height[left] < height[right]:
            left += 1
        else:
            right -= 1
    return max_water
```
Complexity: O(n) time, O(1) space. Each pointer moves at most n times total.
Edge cases: two bars only → compute directly, all same height → any pair valid, extreme heights [1, 100, 1] → pointer logic handles correctly.
Problem 2: Minimum Window Substring
Prompt: Given strings s and t, find minimum window in s containing all characters of t.
Pattern: Sliding window (variable size) + frequency map.
Brute force: Check every substring O(n³) or O(n²) with frequency comparison.
Optimization insight: Expand window right to include characters, contract from left when valid to minimize length.
Invariant: Track counts of required characters; window is valid when all required counts are met.
Implementation sketch:
```python
def min_window(s: str, t: str) -> str:
    if not t or not s:
        return ""
    need = {}
    for ch in t:
        need[ch] = need.get(ch, 0) + 1
    have = {}
    required = len(need)
    formed = 0
    left = 0
    min_len = float('inf')
    min_left = 0
    for right, ch in enumerate(s):
        have[ch] = have.get(ch, 0) + 1
        if ch in need and have[ch] == need[ch]:
            formed += 1
        # Contract while valid
        while formed == required:
            # Update result
            if right - left + 1 < min_len:
                min_len = right - left + 1
                min_left = left
            # Remove left character
            have[s[left]] -= 1
            if s[left] in need and have[s[left]] < need[s[left]]:
                formed -= 1
            left += 1
    return "" if min_len == float('inf') else s[min_left:min_left + min_len]
```
Complexity: O(|s| + |t|) time, O(|s| + |t|) space for maps.
Edge cases: t longer than s → "", no valid window → "", multiple valid windows → return shortest, s=t → return s.
Problem 3: Product of Array Except Self
Prompt: Return array where output[i] equals product of all elements except nums[i], without division, in O(n).
Pattern: Prefix products from left and right.
Brute force: For each index, multiply all others O(n²).
Optimization insight: Product except self = (product of all elements to the left) × (product of all elements to the right). Compute left products in one pass, right products in second pass.
Implementation:
```python
def product_except_self(nums: list[int]) -> list[int]:
    n = len(nums)
    result = [1] * n
    # Left pass: result[i] = product of all elements to the left
    left_product = 1
    for i in range(n):
        result[i] = left_product
        left_product *= nums[i]
    # Right pass: multiply by product of all elements to the right
    right_product = 1
    for i in range(n - 1, -1, -1):
        result[i] *= right_product
        right_product *= nums[i]
    return result
```
Complexity: O(n) time, O(1) extra space (output array doesn't count).
Edge cases: single element → [1], contains zero → correct because zero is in prefix/suffix, all ones → all ones, negative numbers → handled correctly by multiplication.
Complexity Analysis — Amortized and Worst-Case Reasoning
Why complexity reasoning matters in interviews:
Interviewers distinguish between candidates who state "O(n) because single loop" vs candidates who derive complexity rigorously. The latter demonstrates analytical depth and transferable problem-solving skills.
Amortized analysis for sliding window:
Common confusion: variable-size sliding window has a for loop with a nested while loop. Looks like O(n²), but is actually O(n). Why?
Accounting argument: Right pointer advances exactly n times (outer loop). Left pointer only moves forward, never resets. Each element is added to window once and removed at most once. Total pointer movements: ≤ 2n. Each movement does O(1) work (hash map update). Therefore total O(n).
What to say in interviews: "Right pointer visits each index once for n total moves. Left pointer also moves only forward and visits each index at most once, so at most n moves total. Combined 2n pointer movements with O(1) work per movement gives O(n) overall."
Prefix sum + map complexity:
Single pass builds prefix sums and populates map. Each insertion and lookup is O(1) average. Total O(n) time. Space is O(n) worst case if all prefix sums are unique.
Hashing complexity subtleties:
Hash map operations are O(1) average case, but O(n) worst case with poor hash function or adversarial input. In interview settings, assume average case unless interviewer asks about worst case. Then mention: "In practice O(1) average with good hash functions; worst-case O(n) with hash collisions, but Python's dict uses robust hashing."
Two-pointer complexity:
Most two-pointer solutions are O(n) because each pointer moves at most n times and doesn't reset. Exception: nested two-pointer variants (e.g., 3Sum with outer loop and inner two-pointer) are O(n²) because of the outer loop multiplier.
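A sketch of that nested variant: 3Sum fixes one element in an outer loop and runs converging pointers inside, so the O(n) pointer scan is multiplied by the O(n) outer loop (names are illustrative):

```python
def three_sum(nums: list[int]) -> list[list[int]]:
    """All unique triplets summing to zero.
    Complexity: O(n log n) sort + O(n^2) main loop = O(n^2) time.
    """
    nums = sorted(nums)
    result = []
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:  # skip duplicate anchor values
            continue
        left, right = i + 1, len(nums) - 1
        while left < right:  # inner converging two-pointer scan: O(n)
            s = nums[i] + nums[left] + nums[right]
            if s < 0:
                left += 1
            elif s > 0:
                right -= 1
            else:
                result.append([nums[i], nums[left], nums[right]])
                left += 1
                while left < right and nums[left] == nums[left - 1]:
                    left += 1  # skip duplicate second elements
                right -= 1
    return result
```

The inner scan is still linear per anchor; it is the outer loop, not the pointers, that supplies the extra factor of n.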
Space complexity accounting:
- Input-only structures don't count toward extra space
- Output array often doesn't count (problem specifies)
- Hash maps: count number of unique keys stored (worst case O(n), often O(k) for k unique elements)
- Prefix arrays: O(n) if stored explicitly, O(1) if computed on-the-fly
Pattern Complexity Comparison
| Pattern | Time | Space | When O(1) space achievable | Amortized vs worst-case |
|---|---|---|---|---|
| Two Pointers | O(n) | O(1) | Always (just pointer variables) | Same (pointers move linearly) |
| Fixed Sliding Window | O(n) | O(k) | If k=O(1) constraint tracking | Same (each element processed once) |
| Variable Sliding Window | O(n) | O(k) | If k=O(1) unique elements | Amortized O(n), worst single iteration O(n) |
| Prefix Sum + Map | O(n) | O(n) | Not possible (map stores prefixes) | Same (each operation O(1) average) |
| Hashing (frequency) | O(n) | O(k) | If k=O(1) unique values | O(1) per operation average, O(n) worst |
Failure Modes That Cause Rejection
1. Confusing substring with subsequence. Substring is contiguous; subsequence is not. This changes the pattern family completely.
2. Off-by-one window length. For inclusive [left, right], size is right - left + 1, not right - left.
3. Incorrect duplicate boundary handling. In index-jump sliding window variants, move left only when last occurrence is inside current window.
4. Weak complexity reasoning. Saying "single pass so O(n)" without amortized explanation is not enough when an inner while loop exists.
5. No edge-case protocol. Candidates who never test empty, single-element, and all-duplicate inputs are treated as unreliable under production pressure.
Interview Script for This Module
Use this structure out loud: "I will start with a brute-force baseline, identify repeated work, then switch to the optimal pattern with an invariant. After coding, I'll trace one example and then run edge cases."
For complexity, say: "Right pointer moves n times, left pointer moves at most n times total, so total pointer moves are <= 2n, which is O(n)." That single sentence often differentiates senior-caliber communication.