CS Fundamentals
A complete set of senior-level computer science fundamentals questions covering complexity analysis, arrays and strings, linked lists, trees, heaps, graphs, sorting, dynamic programming, greedy algorithms, and advanced data structures used in system design.
Complexity Analysis
10 questionsAsymptotic notation describes how an algorithm's resource usage (time or space) scales as the input size grows, ignoring constant factors and lower-order terms.
O (Big-O) โ upper bound (worst case): f(n) = O(g(n)) means f grows no faster than g. The ceiling. "My algorithm takes at most this long."
ฮฉ (Big-Omega) โ lower bound (best case): f(n) = ฮฉ(g(n)) means f grows at least as fast as g. The floor. "My algorithm takes at least this long."
ฮ (Big-Theta) โ tight bound: f(n) = ฮ(g(n)) means f grows at exactly the same rate as g โ both O and ฮฉ. "My algorithm takes exactly this long, up to constants."
# Common complexities ranked (fastest to slowest):
# O(1) โ constant: hash table lookup
# O(log n) โ logarithmic: binary search
# O(n) โ linear: linear scan
# O(n log n) โ linearithmic: merge sort, heap sort
# O(nยฒ) โ quadratic: nested loops, bubble sort
# O(nยณ) โ cubic: naive matrix multiply
# O(2โฟ) โ exponential: all subsets
# O(n!) โ factorial: all permutations
# Simplification rules:
# Drop constants: O(3n) โ O(n)
# Drop lower terms: O(nยฒ + n) โ O(nยฒ)
# Different inputs: O(a + b) stays O(a + b) โ don't collapse different variables
In practice: Interviewers say "Big-O" but usually mean worst-case (which is O, not ฮ). When asked for complexity, state both time and space, and clarify whether it's worst/average/best case if it matters (e.g., quicksort is O(nยฒ) worst but O(n log n) average).
Amortized analysis measures the average cost per operation over a sequence of operations, even when individual operations can be expensive. It's useful when occasional expensive operations are offset by many cheap ones.
Dynamic array (Python list / Java ArrayList) append: Most appends are O(1). When the array is full, it doubles in size โ an O(n) copy. But because doubling happens rarely, the amortized cost per append is still O(1).
# How amortized O(1) works for dynamic array:
# Start capacity = 1. After n appends, total copies made:
# 1 + 2 + 4 + 8 + ... + n/2 = n - 1 copies
# Total work across n appends: n (appends) + n-1 (copies) โ 2n
# Amortized cost per append: 2n / n = O(1)
# Intuition: each element is "charged" O(1) for itself
# and pre-pays O(1) to cover the cost of one future copy.
# When doubling happens, each element pays for its own copy.
# Key insight: doubling (not adding fixed amount) is crucial.
# If we grew by +1 each time: 1+2+3+...+n = O(nยฒ) total โ O(n) per append
# Doubling: always O(1) amortized
Other amortized O(1) examples: stack push/pop, hash table insert (with rehashing), deque push/pop. O(log n) amortized: splay tree operations, Fibonacci heap decrease-key. Amortized analysis matters for interview discussions of data structure design โ saying "worst case O(n) but amortized O(1)" shows depth.
Space complexity measures the amount of memory an algorithm uses relative to input size. There are two ways to measure it:
Total space: includes the input itself plus any auxiliary allocations.
Auxiliary space: only the extra memory beyond the input. Usually the more useful metric โ it captures the overhead the algorithm adds.
# Examples:
# Merge sort:
# Auxiliary space: O(n) โ needs a temporary array for merging
# Total space: O(n) โ input + temp array
# Quicksort (in-place):
# Auxiliary space: O(log n) โ recursion stack depth (average)
# Total space: O(n) โ input dominates
# Recursive Fibonacci:
# Auxiliary space: O(n) โ call stack n levels deep
# (even though no explicit data structure is allocated)
# Hash map for frequency counting:
def count_chars(s): # O(n) time
counts = {} # O(k) auxiliary space, k = unique chars
for c in s: # For bounded alphabet (26 letters): O(1) space!
counts[c] = counts.get(c, 0) + 1
return counts
# Recursive vs iterative space:
def fib_recursive(n): # O(n) space โ call stack
if n <= 1: return n
return fib_recursive(n-1) + fib_recursive(n-2)
def fib_iterative(n): # O(1) auxiliary space
a, b = 0, 1
for _ in range(n): a, b = b, a + b
return a
Call stack counts: Don't forget recursive calls consume stack space. A function that recurses n levels deep uses O(n) auxiliary space even if no other data structures are used. Tail-call optimization (TCO) can reduce this to O(1) but most languages/runtimes don't guarantee it.
For divide-and-conquer algorithms with recurrence of the form T(n) = aยทT(n/b) + f(n) โ split into a subproblems of size n/b with f(n) work at each level:
Master Theorem (three cases):
# T(n) = aยทT(n/b) + f(n) where a >= 1, b > 1
# Compare f(n) to n^(log_b(a)) โ the "watershed" function
# Case 1: f(n) = O(n^(log_b(a) - ฮต)) [work dominated by subproblems]
# โ T(n) = ฮ(n^log_b(a))
# Case 2: f(n) = ฮ(n^log_b(a)) [work balanced at every level]
# โ T(n) = ฮ(n^log_b(a) ยท log n)
# Case 3: f(n) = ฮฉ(n^(log_b(a) + ฮต)) [work dominated by root]
# โ T(n) = ฮ(f(n))
# Classic examples:
# Merge sort: T(n) = 2T(n/2) + O(n)
# a=2, b=2, log_2(2)=1, f(n)=n = ฮ(n^1) โ Case 2 โ O(n log n)
# Binary search: T(n) = T(n/2) + O(1)
# a=1, b=2, log_2(1)=0, f(n)=1 = ฮ(n^0) โ Case 2 โ O(log n)
# Karatsuba: T(n) = 3T(n/2) + O(n)
# a=3, b=2, log_2(3)โ1.58, f(n)=n < n^1.58 โ Case 1 โ O(n^1.58)
# Naive matrix multiply: T(n) = 8T(n/2) + O(nยฒ)
# a=8, b=2, log_2(8)=3, f(n)=nยฒ < n^3 โ Case 1 โ O(nยณ)
# Strassen: T(n) = 7T(n/2) + O(nยฒ)
# log_2(7)โ2.81, โ Case 1 โ O(n^2.81) โ faster than naive!
For recurrences that don't fit the master theorem (e.g., T(n) = T(n-1) + O(1)), use substitution or draw the recursion tree and sum the work at each level.
Many candidates miss hidden costs in operations that look constant.
# TRAP 1: String concatenation in a loop
result = ""
for word in words: # looks O(n), actually O(nยฒ)
result += word # each += creates a NEW string, copies all chars
# Fix: "".join(words) โ O(n)
# TRAP 2: Slicing creates copies
arr = [1, 2, 3, 4, 5]
sub = arr[1:] # O(n) โ new list, not O(1)!
# In Python, numpy slices ARE O(1) (views, not copies)
# TRAP 3: "in" operator on lists vs sets
x in my_list # O(n) linear scan
x in my_set # O(1) hash lookup
# A loop checking x in my_list is O(nยฒ) not O(n)!
# TRAP 4: sort inside a loop
for item in data:
result.append(item)
result.sort() # sort ONCE after loop: O(n log n) total โ fine!
for item in data:
result.sort() # sorting inside the loop: O(nยฒ log n)!
# TRAP 5: Recursion depth
def process(node):
if node is None: return
process(node.left) # for a balanced tree: O(log n) depth
process(node.right) # for a skewed tree: O(n) depth โ stack overflow risk
# TRAP 6: Hash collisions
# Average: O(1) for hash table operations
# Worst case: O(n) if all keys hash to same bucket
# Python dicts use open addressing โ resistant but not immune
The time-space tradeoff: you can often use more memory to reduce time, or accept slower runtime to use less memory. Most optimizations trade space for time.
# Classic time-space tradeoffs:
# 1. Memoization (DP) โ O(n) space to avoid recomputation
# Without: O(2โฟ) time, O(n) space (call stack)
def fib(n):
if n <= 1: return n
return fib(n-1) + fib(n-2) # exponential!
# With memo: O(n) time, O(n) space
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
if n <= 1: return n
return fib(n-1) + fib(n-2) # O(n) time!
# 2. Hash set for O(1) lookup (vs O(n) list scan)
# Two sum: O(nยฒ) with nested loops, O(n) with hash map
def two_sum(nums, target):
seen = {} # O(n) extra space
for i, n in enumerate(nums):
complement = target - n
if complement in seen: # O(1) vs O(n) linear scan
return [seen[complement], i]
seen[n] = i
# 3. Precomputation โ compute prefix sums once, answer range queries in O(1)
def build_prefix(arr):
prefix = [0] * (len(arr) + 1)
for i, x in enumerate(arr): prefix[i+1] = prefix[i] + x
return prefix
def range_sum(prefix, l, r):
return prefix[r+1] - prefix[l] # O(1) query after O(n) build
# General optimization approach:
# 1. State the brute force first โ establish baseline
# 2. Identify the bottleneck (nested loops? repeated work? linear scans?)
# 3. What information is being recomputed? โ memoize it
# 4. What lookups are O(n)? โ replace with hash map
# 5. What sorting is done repeatedly? โ sort once and binary search
The two-pointer technique uses two indices moving through a data structure, often reducing an O(nยฒ) nested-loop solution to O(n). There are two main variants.
Opposite ends โ converging pointers (requires sorted array):
# Two Sum (sorted array): O(n) time, O(1) space
def two_sum_sorted(nums, target):
left, right = 0, len(nums) - 1
while left < right:
s = nums[left] + nums[right]
if s == target: return [left, right]
elif s < target: left += 1
else: right -= 1
return []
# Container With Most Water: O(n)
def max_area(heights):
left, right = 0, len(heights) - 1
best = 0
while left < right:
best = max(best, min(heights[left], heights[right]) * (right - left))
if heights[left] < heights[right]: left += 1
else: right -= 1
return best
Same direction โ fast/slow pointers (Floyd's cycle detection):
# Detect cycle in linked list: O(n) time, O(1) space
def has_cycle(head):
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast: return True
return False
# Find middle of linked list
def find_middle(head):
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
return slow # slow is at the middle
# Remove duplicates from sorted array in-place: O(n), O(1) space
def remove_duplicates(nums):
if not nums: return 0
write = 1
for read in range(1, len(nums)):
if nums[read] != nums[read-1]:
nums[write] = nums[read]
write += 1
return write
Sliding window maintains a contiguous subset (window) of a sequence and efficiently tracks some aggregate (sum, max, distinct count) as the window slides. Reduces O(nยทk) to O(n) for many subarray/substring problems.
Fixed-size window โ max sum subarray of size k:
# Brute force: O(nยทk); sliding window: O(n)
def max_sum_subarray(nums, k):
window_sum = sum(nums[:k]) # compute first window
best = window_sum
for i in range(k, len(nums)):
window_sum += nums[i] - nums[i-k] # slide: add right, remove left
best = max(best, window_sum)
return best
Variable-size window โ longest substring without repeating characters:
# Expand right, shrink left when invalid: O(n)
def length_of_longest_substring(s):
char_index = {}
left = best = 0
for right, char in enumerate(s):
if char in char_index and char_index[char] >= left:
left = char_index[char] + 1 # shrink window
char_index[char] = right
best = max(best, right - left + 1)
return best
# Minimum window substring (contains all chars of pattern): O(n+m)
def min_window(s, t):
from collections import Counter
need = Counter(t)
missing = len(t)
left = start = 0; best = float('inf'); best_left = 0
for right, char in enumerate(s):
if need[char] > 0: missing -= 1
need[char] -= 1
if missing == 0: # valid window found
while need[s[left]] < 0: # shrink from left
need[s[left]] += 1; left += 1
if right - left + 1 < best:
best = right - left + 1; best_left = left
need[s[left]] += 1; missing += 1; left += 1
return s[best_left:best_left+best] if best < float('inf') else ""
Binary search is O(log n) search on any monotonic space โ not just sorted arrays. The key insight: if you can define a predicate that transitions from False to True (or True to False) across the search space, binary search finds the transition point.
# Classic binary search โ find target in sorted array
def binary_search(nums, target):
left, right = 0, len(nums) - 1
while left <= right:
mid = left + (right - left) // 2 # avoid overflow (vs (l+r)//2)
if nums[mid] == target: return mid
elif nums[mid] < target: left = mid + 1
else: right = mid - 1
return -1
# Find leftmost position (lower bound):
def lower_bound(nums, target):
left, right = 0, len(nums)
while left < right:
mid = (left + right) // 2
if nums[mid] < target: left = mid + 1
else: right = mid # keep right side
return left # first index where nums[i] >= target
# Binary search on the ANSWER โ not on an array!
# "What is the minimum speed such that you can eat all bananas in h hours?"
def min_eating_speed(piles, h):
def can_finish(speed):
return sum((p + speed - 1) // speed for p in piles) <= h
left, right = 1, max(piles)
while left < right:
mid = (left + right) // 2
if can_finish(mid): right = mid # could be the answer, try lower
else: left = mid + 1
return left
# Pattern: Binary search on the answer space when:
# - Answer is in a bounded range [lo, hi]
# - "Is answer <= x?" is monotonic (once true, stays true)
# Examples: kth smallest in matrix, capacity to ship packages,
# minimum days to make m bouquets, split array largest sum
Senior engineers are expected to demonstrate structured problem-solving, not just arrive at a solution. A reliable framework:
- Clarify (2-3 min): Ask about input constraints (size, sign, null?), output format, edge cases. "Can there be duplicates?" "What if the array is empty?" Don't assume.
- Examples: Trace through a small concrete example by hand. Verify your understanding matches the expected output.
- Brute force first: State the naive O(nยฒ) or O(2โฟ) solution. Confirm it's correct. Gives you a baseline and shows you understand the problem.
- Optimize: Identify the bottleneck. "The nested loop is doing repeated work โ I can use a hash map to reduce inner loop to O(1)." Think aloud.
- Complexity analysis: Before coding, state the expected time and space. If wrong approach, better to know now.
- Code: Write clean, readable code. Use descriptive variable names. Handle edge cases.
- Test: Walk through your code with the example. Then test edge cases: empty input, single element, all same values, maximum constraints.
# Complexity mental model โ build intuition:
# Sorted array + search โ binary search โ O(log n)
# All pairs โ O(nยฒ) โ two pointers or hash map โ O(n)
# All subsets โ O(2โฟ) โ DP with states โ O(nยฒ) or O(nยทk)
# Tree problem โ DFS/BFS โ O(n)
# Shortest path โ BFS (unweighted) or Dijkstra โ O(V+E) or O(E log V)
# "Optimal substructure" + "overlapping subproblems" โ DP
Arrays & Strings
8 questionsA prefix sum array stores cumulative sums such that prefix[i] = arr[0] + arr[1] + ... + arr[i-1]. After O(n) preprocessing, any subarray sum query is answered in O(1).
# Build prefix sum: O(n) time, O(n) space
def build_prefix(arr):
n = len(arr)
prefix = [0] * (n + 1)
for i in range(n):
prefix[i+1] = prefix[i] + arr[i]
return prefix
# Range sum query: O(1)
def range_sum(prefix, l, r): # sum of arr[l..r] inclusive
return prefix[r+1] - prefix[l]
# Subarray sum equals k โ count subarrays: O(n)
def subarray_sum(nums, k):
count = 0
prefix_sum = 0
seen = {0: 1} # prefix_sum โ count of occurrences
for n in nums:
prefix_sum += n
count += seen.get(prefix_sum - k, 0) # how many times we've seen (current - k)
seen[prefix_sum] = seen.get(prefix_sum, 0) + 1
return count
# 2D prefix sums โ sum of rectangle in O(1):
def build_2d_prefix(matrix):
m, n = len(matrix), len(matrix[0])
P = [[0]*(n+1) for _ in range(m+1)]
for r in range(m):
for c in range(n):
P[r+1][c+1] = matrix[r][c] + P[r][c+1] + P[r+1][c] - P[r][c]
return P
def rect_sum(P, r1, c1, r2, c2): # inclusive rectangle
return P[r2+1][c2+1] - P[r1][c2+1] - P[r2+1][c1] + P[r1][c1]
Kadane's algorithm finds the maximum sum contiguous subarray in O(n) time and O(1) space. It's a classic example of dynamic programming with a rolling state.
# Maximum subarray sum: O(n), O(1)
def max_subarray(nums):
max_sum = cur_sum = nums[0]
for n in nums[1:]:
cur_sum = max(n, cur_sum + n) # extend current or start fresh
max_sum = max(max_sum, cur_sum)
return max_sum
# Key insight: at each position, decide whether to extend the current subarray
# or start a new one. If cur_sum < 0, it hurts us โ start fresh.
# With indices (to return the actual subarray):
def max_subarray_with_indices(nums):
best = cur = nums[0]
start = end = 0; cur_start = 0
for i in range(1, len(nums)):
if cur + nums[i] < nums[i]:
cur = nums[i]; cur_start = i
else:
cur += nums[i]
if cur > best:
best = cur; start = cur_start; end = i
return best, start, end
# Circular subarray maximum โ max of: normal Kadane OR total_sum - min_subarray
def max_subarray_circular(nums):
total = sum(nums)
max_normal = max_subarray(nums)
min_sum = -max_subarray([-x for x in nums]) # negate to find min
max_circular = total - min_sum
# Edge case: all negative โ max_circular = 0 (empty wrap) which is wrong
return max(max_normal, max_circular) if max_normal > 0 else max_normal
# Next permutation algorithm: O(n), O(1) space
def next_permutation(nums):
n = len(nums)
# Step 1: find rightmost ascending pair (i, i+1)
i = n - 2
while i >= 0 and nums[i] >= nums[i+1]: i -= 1
if i >= 0: # not the last permutation
# Step 2: find rightmost element larger than nums[i]
j = n - 1
while nums[j] <= nums[i]: j -= 1
nums[i], nums[j] = nums[j], nums[i] # Step 3: swap
# Step 4: reverse from i+1 to end (make it smallest possible suffix)
left, right = i + 1, n - 1
while left < right:
nums[left], nums[right] = nums[right], nums[left]
left += 1; right -= 1
# Generate all permutations: O(n! ยท n) time, O(n) auxiliary space
def permutations(nums):
result = []
def backtrack(start):
if start == len(nums): result.append(nums[:])
for i in range(start, len(nums)):
nums[start], nums[i] = nums[i], nums[start]
backtrack(start + 1)
nums[start], nums[i] = nums[i], nums[start] # undo swap
backtrack(0)
return result
# Permutations with duplicates โ skip duplicates at same level:
def permutations_unique(nums):
nums.sort()
result = []
def backtrack(path, used):
if len(path) == len(nums): result.append(path[:])
for i in range(len(nums)):
if used[i]: continue
if i > 0 and nums[i] == nums[i-1] and not used[i-1]: continue # skip dup
used[i] = True; path.append(nums[i])
backtrack(path, used)
used[i] = False; path.pop()
backtrack([], [False]*len(nums))
return result
# Missing number in [0..n]: XOR trick โ O(n), O(1)
def missing_number(nums):
xor = len(nums)
for i, n in enumerate(nums): xor ^= i ^ n
return xor
# All indices and values cancel except the missing one
# Find duplicate in array of 1..n (no extra space): Floyd's cycle detection
def find_duplicate(nums):
# Treat array as linked list: nums[i] โ nums[nums[i]]
slow = fast = nums[0]
while True: # Phase 1: find intersection
slow = nums[slow]
fast = nums[nums[fast]]
if slow == fast: break
slow = nums[0] # Phase 2: find cycle entrance
while slow != fast:
slow = nums[slow]; fast = nums[fast]
return slow
# All duplicates in [1..n] array: negative marking โ O(n), O(1) extra
def find_all_duplicates(nums):
result = []
for n in nums:
idx = abs(n) - 1
if nums[idx] < 0: result.append(abs(n)) # seen before
else: nums[idx] = -nums[idx] # mark as seen
return result
# Cyclic sort โ place each number at its correct index: O(n), O(1)
def cyclic_sort(nums):
i = 0
while i < len(nums):
correct = nums[i] - 1 # where nums[i] should go
if nums[i] != nums[correct]:
nums[i], nums[correct] = nums[correct], nums[i]
else: i += 1
return nums
# Trapping Rain Water: O(n), O(1)
# Water at position i = min(max_left[i], max_right[i]) - height[i]
def trap(height):
left, right = 0, len(height) - 1
left_max = right_max = 0
water = 0
while left < right:
if height[left] < height[right]:
if height[left] >= left_max: left_max = height[left]
else: water += left_max - height[left]
left += 1
else:
if height[right] >= right_max: right_max = height[right]
else: water += right_max - height[right]
right -= 1
return water
# Best Time to Buy and Sell Stock I (one transaction): O(n), O(1)
def max_profit(prices):
min_price = float('inf')
max_profit = 0
for p in prices:
min_price = min(min_price, p)
max_profit = max(max_profit, p - min_price)
return max_profit
# Multiple transactions (unlimited): O(n)
def max_profit_unlimited(prices):
return sum(max(0, prices[i]-prices[i-1]) for i in range(1, len(prices)))
# At most k transactions: O(nยทk) DP
def max_profit_k(k, prices):
n = len(prices)
if k >= n // 2: return sum(max(0, prices[i]-prices[i-1]) for i in range(1, n))
dp = [[0]*n for _ in range(k+1)]
for t in range(1, k+1):
max_so_far = -prices[0]
for d in range(1, n):
dp[t][d] = max(dp[t][d-1], prices[d] + max_so_far)
max_so_far = max(max_so_far, dp[t-1][d] - prices[d])
return dp[k][n-1]
A monotonic stack maintains elements in monotonically increasing or decreasing order. It solves "next greater/smaller element" type problems in O(n) โ each element is pushed and popped at most once.
# Next Greater Element for each position: O(n)
def next_greater(nums):
n = len(nums)
result = [-1] * n
stack = [] # indices, monotonically decreasing values
for i, num in enumerate(nums):
while stack and nums[stack[-1]] < num:
idx = stack.pop()
result[idx] = num # current num is the next greater for idx
stack.append(i)
return result
# Largest Rectangle in Histogram: O(n)
def largest_rectangle(heights):
stack = [] # monotonically increasing
max_area = 0
heights = heights + [0] # sentinel to flush all at end
for i, h in enumerate(heights):
start = i
while stack and stack[-1][1] > h:
idx, height = stack.pop()
max_area = max(max_area, height * (i - idx))
start = idx
stack.append((start, h))
return max_area
# Daily Temperatures (days until warmer): O(n)
def daily_temperatures(temps):
result = [0] * len(temps)
stack = [] # indices
for i, t in enumerate(temps):
while stack and temps[stack[-1]] < t:
idx = stack.pop()
result[idx] = i - idx
stack.append(i)
return result
# Sum of subarray minimums: O(n) with monotonic stack
# For each element, find how many subarrays it is the minimum of
# Rotate matrix 90ยฐ clockwise in-place: O(nยฒ), O(1)
def rotate(matrix):
n = len(matrix)
# Step 1: transpose (flip along main diagonal)
for i in range(n):
for j in range(i+1, n):
matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
# Step 2: reverse each row
for row in matrix: row.reverse()
# Spiral order traversal: O(mยทn)
def spiral_order(matrix):
result = []
top, bottom, left, right = 0, len(matrix)-1, 0, len(matrix[0])-1
while top <= bottom and left <= right:
for c in range(left, right+1): result.append(matrix[top][c])
top += 1
for r in range(top, bottom+1): result.append(matrix[r][right])
right -= 1
if top <= bottom:
for c in range(right, left-1, -1): result.append(matrix[bottom][c])
bottom -= 1
if left <= right:
for r in range(bottom, top-1, -1): result.append(matrix[r][left])
left += 1
return result
# Flood fill / BFS: O(mยทn)
def flood_fill(image, sr, sc, color):
old = image[sr][sc]
if old == color: return image
from collections import deque
q = deque([(sr, sc)])
while q:
r, c = q.popleft()
if not (0 <= r < len(image) and 0 <= c < len(image[0])): continue
if image[r][c] != old: continue
image[r][c] = color
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
q.append((r+dr, c+dc))
return image
# Group anagrams: O(nยทk log k) where k = max word length
from collections import defaultdict
def group_anagrams(strs):
groups = defaultdict(list)
for s in strs:
key = tuple(sorted(s)) # sorted chars as key
groups[key].append(s)
return list(groups.values())
# Or use character count tuple as key: O(nยทk) without sort
# LRU Cache โ O(1) get and put using dict + doubly linked list
class LRUCache:
class Node:
def __init__(self, key=0, val=0):
self.key = key; self.val = val
self.prev = self.next = None
def __init__(self, capacity):
self.capacity = capacity
self.cache = {}
self.head = self.Node() # dummy head (most recent)
self.tail = self.Node() # dummy tail (least recent)
self.head.next = self.tail
self.tail.prev = self.head
def _remove(self, node):
node.prev.next = node.next
node.next.prev = node.prev
def _insert_front(self, node):
node.next = self.head.next
node.prev = self.head
self.head.next.prev = node
self.head.next = node
def get(self, key):
if key not in self.cache: return -1
node = self.cache[key]
self._remove(node); self._insert_front(node) # move to front
return node.val
def put(self, key, value):
if key in self.cache: self._remove(self.cache[key])
node = self.Node(key, value)
self.cache[key] = node
self._insert_front(node)
if len(self.cache) > self.capacity:
lru = self.tail.prev
self._remove(lru)
del self.cache[lru.key]
Linked Lists, Stacks & Queues
6 questions# Complexities:
# Access by index: O(n) โ must traverse from head
# Search: O(n)
# Insert at head: O(1)
# Insert at tail: O(n) if no tail pointer, O(1) with tail pointer
# Insert at arbitrary position (given node): O(1)
# Delete at head: O(1)
# Delete arbitrary (given node): O(1) โ just relink; O(n) to find by value
class ListNode:
def __init__(self, val=0, next=None):
self.val = val; self.next = next
# Reverse iteratively: O(n), O(1)
def reverse_list(head):
prev, curr = None, head
while curr:
nxt = curr.next
curr.next = prev
prev = curr
curr = nxt
return prev
# Reverse recursively: O(n) time, O(n) space (call stack)
def reverse_recursive(head):
if not head or not head.next: return head
new_head = reverse_recursive(head.next)
head.next.next = head
head.next = None
return new_head
# Reverse a sublist from position left to right: O(n)
def reverse_between(head, left, right):
dummy = ListNode(0, head)
prev = dummy
for _ in range(left - 1): prev = prev.next
curr = prev.next
for _ in range(right - left):
nxt = curr.next
curr.next = nxt.next
nxt.next = prev.next
prev.next = nxt
return dummy.next
# Merge two sorted lists: O(n+m), O(1)
def merge_two(l1, l2):
dummy = ListNode()
curr = dummy
while l1 and l2:
if l1.val <= l2.val: curr.next = l1; l1 = l1.next
else: curr.next = l2; l2 = l2.next
curr = curr.next
curr.next = l1 or l2
return dummy.next
# Merge k sorted lists: O(n log k) with min-heap
import heapq
def merge_k_lists(lists):
dummy = curr = ListNode()
heap = []
for i, node in enumerate(lists):
if node: heapq.heappush(heap, (node.val, i, node))
while heap:
val, i, node = heapq.heappop(heap)
curr.next = node; curr = curr.next
if node.next: heapq.heappush(heap, (node.next.val, i, node.next))
return dummy.next
# Sort linked list: O(n log n), O(log n) space โ merge sort
def sort_list(head):
if not head or not head.next: return head
slow, fast = head, head.next
while fast and fast.next: slow = slow.next; fast = fast.next.next
mid = slow.next; slow.next = None # split at middle
left = sort_list(head); right = sort_list(mid)
return merge_two(left, right)
# Find cycle start: O(n), O(1) โ Floyd's algorithm phase 2
def detect_cycle(head):
slow = fast = head
while fast and fast.next:
slow = slow.next; fast = fast.next.next
if slow == fast: # cycle exists
slow = head
while slow != fast: # find entry point
slow = slow.next; fast = fast.next
return slow
return None
# Stack (LIFO) with two queues โ O(1) push, O(n) pop
from collections import deque
class StackWithQueues:
def __init__(self): self.q = deque()
def push(self, x): self.q.append(x)
def pop(self):
for _ in range(len(self.q) - 1): self.q.append(self.q.popleft())
return self.q.popleft()
def top(self):
val = self.pop(); self.push(val); return val
# Queue (FIFO) with two stacks โ amortized O(1) push and pop
class QueueWithStacks:
def __init__(self): self.inbox = []; self.outbox = []
def push(self, x): self.inbox.append(x)
def pop(self):
if not self.outbox: # transfer inbox โ outbox (reverses order)
while self.inbox: self.outbox.append(self.inbox.pop())
return self.outbox.pop()
def peek(self):
if not self.outbox:
while self.inbox: self.outbox.append(self.inbox.pop())
return self.outbox[-1]
# Min stack โ O(1) push, pop, min
class MinStack:
def __init__(self): self.stack = []; self.min_stack = []
def push(self, val):
self.stack.append(val)
m = min(val, self.min_stack[-1]) if self.min_stack else val
self.min_stack.append(m)
def pop(self): self.stack.pop(); self.min_stack.pop()
def top(self): return self.stack[-1]
def get_min(self): return self.min_stack[-1]
# Sliding window maximum in O(n) using monotonic deque
# Naive: O(nยทk) โ for each window, find max
# Deque: O(n) โ deque stores indices in decreasing order of value
from collections import deque
def sliding_window_max(nums, k):
dq = deque() # stores indices; front = largest value index
result = []
for i, n in enumerate(nums):
# Remove elements outside the window
while dq and dq[0] <= i - k: dq.popleft()
# Maintain decreasing order: remove smaller elements from back
while dq and nums[dq[-1]] < n: dq.pop()
dq.append(i)
if i >= k - 1: result.append(nums[dq[0]]) # front is max
return result
# Key insight:
# - Front of deque = index of max element in current window
# - We discard indices that left the window (front)
# - We discard indices with values smaller than current (back)
# - Why? A smaller element earlier can never be the max for any future window
# where the current element is also in the window
# Same pattern for:
# - Sliding window minimum (reverse comparison)
# - Shortest subarray with sum >= k (use prefix sums + deque)
# - Jump game VI (max score in window of size k)
# Valid parentheses / bracket matching: O(n)
def is_valid(s):
stack = []
pairs = {')': '(', '}': '{', ']': '['}
for c in s:
if c in '([{': stack.append(c)
elif not stack or stack[-1] != pairs[c]: return False
else: stack.pop()
return len(stack) == 0
# Evaluate Reverse Polish Notation (postfix): O(n)
def eval_rpn(tokens):
stack = []
ops = {
'+': lambda a, b: a + b,
'-': lambda a, b: a - b,
'*': lambda a, b: a * b,
'/': lambda a, b: int(a / b), # truncate toward zero
}
for t in tokens:
if t in ops:
b, a = stack.pop(), stack.pop()
stack.append(ops[t](a, b))
else:
stack.append(int(t))
return stack[0]
# Basic calculator (infix with +, -, parentheses): O(n)
def calculate(s):
stack = []
result = num = 0
sign = 1
for c in s:
if c.isdigit(): num = num * 10 + int(c)
elif c in '+-':
result += sign * num; num = 0
sign = 1 if c == '+' else -1
elif c == '(':
stack.append(result); stack.append(sign)
result = 0; sign = 1
elif c == ')':
result += sign * num; num = 0
result *= stack.pop() # sign before '('
result += stack.pop() # result before '('
return result + sign * num
A deque (double-ended queue) supports O(1) push and pop from both ends. It combines stack and queue capabilities.
# Python's collections.deque โ implemented as a doubly linked list of blocks
from collections import deque
dq = deque()
dq.append(1) # push right: O(1)
dq.appendleft(0) # push left: O(1)
dq.pop() # pop right: O(1)
dq.popleft() # pop left: O(1)
dq[0]; dq[-1] # peek front/back: O(1)
# Random access dq[i]: O(n) โ not like an array
dq = deque(maxlen=5) # circular buffer! Oldest element auto-dropped on overflow
# vs list for queue operations:
# list.pop(0) โ O(n) (shifts all elements) BAD for queue
# deque.popleft() โ O(1) GOOD for queue
# Use deque when:
# - BFS level-order traversal (queue)
# - Sliding window with additions/removals from both ends
# - Implementing undo/redo (bounded history with maxlen)
# - Any algorithm needing O(1) push/pop at both ends
# BFS template with deque:
def bfs(root):
if not root: return []
q = deque([root]); result = []
while q:
level_size = len(q)
level = []
for _ in range(level_size):
node = q.popleft()
level.append(node.val)
if node.left: q.append(node.left)
if node.right: q.append(node.right)
result.append(level)
return result
Trees & BSTs
10 questionsclass TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val; self.left = left; self.right = right
# RECURSIVE โ all O(n) time, O(h) space where h = tree height
def inorder(root): # left โ root โ right (BST: gives sorted order)
return inorder(root.left) + [root.val] + inorder(root.right) if root else []
def preorder(root): # root โ left โ right (used to clone/serialize trees)
return [root.val] + preorder(root.left) + preorder(root.right) if root else []
def postorder(root): # left โ right โ root (used to delete trees, evaluate expressions)
return postorder(root.left) + postorder(root.right) + [root.val] if root else []
# ITERATIVE โ avoids system call stack limit
def inorder_iter(root):
result, stack = [], []
curr = root
while curr or stack:
while curr: stack.append(curr); curr = curr.left # go left
curr = stack.pop()
result.append(curr.val)
curr = curr.right # go right
return result
def preorder_iter(root):
if not root: return []
result, stack = [], [root]
while stack:
node = stack.pop()
result.append(node.val)
if node.right: stack.append(node.right) # right first (LIFO โ left processed first)
if node.left: stack.append(node.left)
return result
def postorder_iter(root):
if not root: return []
result, stack = [], [root]
while stack:
node = stack.pop()
result.append(node.val)
if node.left: stack.append(node.left)
if node.right: stack.append(node.right)
return result[::-1] # reverse of modified preorder (rootโrightโleft)
# Height (max depth): O(n)
def height(root):
if not root: return 0
return 1 + max(height(root.left), height(root.right))
# Check balanced: O(n) โ do height and balance check in one pass
def is_balanced(root):
def check(node):
if not node: return 0
left = check(node.left)
if left == -1: return -1 # already unbalanced
right = check(node.right)
if right == -1: return -1
if abs(left - right) > 1: return -1 # this node unbalanced
return 1 + max(left, right)
return check(root) != -1
# Diameter (longest path between any two nodes): O(n)
def diameter(root):
best = [0]
def dfs(node):
if not node: return 0
left = dfs(node.left)
right = dfs(node.right)
best[0] = max(best[0], left + right) # path through this node
return 1 + max(left, right) # height of this subtree
dfs(root)
return best[0]
# Maximum path sum (nodes can have negative values): O(n)
def max_path_sum(root):
best = [float('-inf')]
def dfs(node):
if not node: return 0
left = max(0, dfs(node.left)) # ignore if negative
right = max(0, dfs(node.right))
best[0] = max(best[0], node.val + left + right)
return node.val + max(left, right) # can only extend one way up
dfs(root)
return best[0]
# LCA in a BST: O(h) time, O(1) space
# Key insight: if both nodes are smaller, go left; if both larger, go right; else current is LCA
def lca_bst(root, p, q):
while root:
if p.val < root.val and q.val < root.val: root = root.left
elif p.val > root.val and q.val > root.val: root = root.right
else: return root # one on each side (or one equals root)
# LCA in a general binary tree: O(n)
def lca(root, p, q):
if not root or root == p or root == q: return root
left = lca(root.left, p, q)
right = lca(root.right, p, q)
if left and right: return root # p and q found on different sides
return left or right # both in same subtree
# LCA with parent pointers: O(h) using set of ancestors
def lca_with_parents(p, q):
ancestors = set()
while p:
ancestors.add(p); p = p.parent
while q:
if q in ancestors: return q
q = q.parent
# Binary lifting (for many LCA queries on a tree): O(n log n) preprocess, O(log n) query
# Build parent[node][k] = 2^k-th ancestor of node
# To find LCA(u,v): equalize depths using binary lifting, then lift together
# Serialize / Deserialize using preorder DFS: O(n)
class Codec:
def serialize(self, root):
vals = []
def dfs(node):
if not node: vals.append('N'); return
vals.append(str(node.val))
dfs(node.left); dfs(node.right)
dfs(root)
return ','.join(vals)
def deserialize(self, data):
vals = iter(data.split(','))
def dfs():
v = next(vals)
if v == 'N': return None
node = TreeNode(int(v))
node.left = dfs()
node.right = dfs()
return node
return dfs()
# BFS level-order serialization (like LeetCode's format):
def serialize_bfs(root):
if not root: return "[]"
from collections import deque
q, result = deque([root]), []
while q:
node = q.popleft()
if node:
result.append(str(node.val))
q.append(node.left); q.append(node.right)
else:
result.append('null')
return '[' + ','.join(result) + ']'
# Key insight for BST serialization:
# A BST can be uniquely reconstructed from preorder alone (no nulls needed)
# because BST property gives us the split point at each level
# All operations: O(h) โ O(log n) balanced, O(n) degenerate
def search_bst(root, val):
if not root or root.val == val: return root
return search_bst(root.left, val) if val < root.val else search_bst(root.right, val)
def insert_bst(root, val):
if not root: return TreeNode(val)
if val < root.val: root.left = insert_bst(root.left, val)
elif val > root.val: root.right = insert_bst(root.right, val)
return root
def delete_bst(root, key):
if not root: return None
if key < root.val: root.left = delete_bst(root.left, key)
elif key > root.val: root.right = delete_bst(root.right, key)
else:
if not root.left: return root.right # one child or leaf
if not root.right: return root.left
# Two children: replace with inorder successor (min of right subtree)
successor = root.right
while successor.left: successor = successor.left
root.val = successor.val
root.right = delete_bst(root.right, successor.val)
return root
# Validate BST โ track valid range for each node: O(n)
def is_valid_bst(root, lo=float('-inf'), hi=float('inf')):
if not root: return True
if not (lo < root.val < hi): return False
return (is_valid_bst(root.left, lo, root.val) and
is_valid_bst(root.right, root.val, hi))
# Kth smallest in BST โ inorder gives sorted order: O(h + k)
def kth_smallest(root, k):
stack = []
while True:
while root: stack.append(root); root = root.left
root = stack.pop()
k -= 1
if k == 0: return root.val
root = root.right
Unbalanced BSTs degenerate to O(n) operations on sorted input. Self-balancing BSTs guarantee O(log n) operations by rebalancing after insertions and deletions.
AVL Trees: Maintain the invariant that for every node, the height difference between left and right subtrees is at most 1 (balance factor โ {-1, 0, 1}). Rebalance using rotations (4 cases: LL, RR, LR, RL) after any insert/delete. More strictly balanced than Red-Black โ faster reads. Heavier rebalancing โ slower writes.
Red-Black Trees: Maintain 5 properties: every node is red or black; root is black; leaves (NIL) are black; red nodes have black children; every path from a node to its NIL descendants has the same number of black nodes. Less strictly balanced than AVL (height โค 2ยทlog n) but fewer rotations on write. Used in: Java TreeMap, C++ std::map, Linux kernel scheduler.
Rotations (both use them):
# Right rotation (fixes left-heavy imbalance):
# y x
# / \ / \
# x C โ A y
# / \ / \
# A B B C
def right_rotate(y):
x = y.left
B = x.right
x.right = y
y.left = B
# Update heights (AVL) or colors (RB)
return x # x is new root
Practical note: You'll rarely implement these in interviews โ know the concepts, invariants, and trade-offs. In production, use library implementations (Python's sortedcontainers.SortedList, Java's TreeMap).
B-Trees: Generalization for disk storage โ nodes can have many keys/children, reducing tree height. Used in databases and file systems (every disk I/O reads a full node which holds many keys).
# Trie (prefix tree): O(m) insert/search where m = word length
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
class Trie:
def __init__(self): self.root = TrieNode()
def insert(self, word):
node = self.root
for c in word:
node = node.children.setdefault(c, TrieNode())
node.is_end = True
def search(self, word):
node = self.root
for c in word:
if c not in node.children: return False
node = node.children[c]
return node.is_end
def starts_with(self, prefix):
node = self.root
for c in prefix:
if c not in node.children: return False
node = node.children[c]
return True
def words_with_prefix(self, prefix):
node = self.root
for c in prefix:
if c not in node.children: return []
node = node.children[c]
result = []
def dfs(node, path):
if node.is_end: result.append(prefix + path)
for c, child in node.children.items(): dfs(child, path + c)
dfs(node, "")
return result
Trie vs hash map:
- Prefix operations: Trie natively supports "all words with prefix X" in O(m + output). Hash map requires scanning all keys โ O(nยทm).
- Autocomplete / spell check: Trie is the natural structure.
- Lexicographic order: Trie traversal gives sorted order; hash maps don't.
- Space: Trie can use more memory (pointer overhead per node); a hash map with full strings can be cheaper for sparse word sets.
# Build from Preorder + Inorder: O(n) with hash map
def build_tree(preorder, inorder):
idx_map = {val: i for i, val in enumerate(inorder)}
pre_iter = iter(preorder)
def helper(left, right):
if left > right: return None
val = next(pre_iter)
node = TreeNode(val)
mid = idx_map[val] # root's position in inorder
node.left = helper(left, mid - 1)
node.right = helper(mid + 1, right)
return node
return helper(0, len(inorder) - 1)
# Build from Postorder + Inorder: O(n)
def build_from_postorder(inorder, postorder):
idx_map = {val: i for i, val in enumerate(inorder)}
post_iter = iter(reversed(postorder)) # process from the end
def helper(left, right):
if left > right: return None
val = next(post_iter)
node = TreeNode(val)
mid = idx_map[val]
# Build RIGHT first (postorder ends with root, right before left)
node.right = helper(mid + 1, right)
node.left = helper(left, mid - 1)
return node
return helper(0, len(inorder) - 1)
# Key insight: preorder/postorder gives you the ROOT; inorder tells you
# which elements are LEFT vs RIGHT of the root.
# You need at least one of preorder/postorder + inorder to reconstruct uniquely.
# Preorder + postorder alone doesn't uniquely reconstruct (ambiguity with single children).
# Flatten binary tree to linked list (preorder): O(n), O(1) space (Morris-style)
def flatten(root):
curr = root
while curr:
if curr.left:
# Find rightmost node of left subtree
prev = curr.left
while prev.right: prev = prev.right
# Connect: left subtree's rightmost โ curr's right subtree
prev.right = curr.right
curr.right = curr.left
curr.left = None
curr = curr.right
# Convert sorted array to height-balanced BST: O(n)
def sorted_array_to_bst(nums):
if not nums: return None
mid = len(nums) // 2
node = TreeNode(nums[mid])
node.left = sorted_array_to_bst(nums[:mid])
node.right = sorted_array_to_bst(nums[mid+1:])
return node
# Convert BST to sorted doubly linked list (in-place): O(n)
def bst_to_dll(root):
prev = head = [None, None] # [prev, head]
def inorder(node):
if not node: return
inorder(node.left)
if prev[0]:
prev[0].right = node
node.left = prev[0]
else:
prev[1] = node # head
prev[0] = node
inorder(node.right)
inorder(root)
return prev[1] # head of DLL
Morris traversal uses the tree's existing null pointers to create temporary "threads" back to the inorder successor, eliminating the need for a stack or recursion. O(n) time, O(1) space.
# Morris Inorder Traversal: O(n), O(1) space
def morris_inorder(root):
result = []
curr = root
while curr:
if not curr.left:
result.append(curr.val) # no left subtree โ visit and go right
curr = curr.right
else:
# Find inorder predecessor (rightmost in left subtree)
prev = curr.left
while prev.right and prev.right != curr:
prev = prev.right
if not prev.right:
# Thread: connect predecessor's right to current
prev.right = curr
curr = curr.left
else:
# Thread already exists โ we've come back around
prev.right = None # remove thread
result.append(curr.val) # visit current
curr = curr.right
return result
# How it works:
# 1. If no left child: visit, go right
# 2. If left child: find predecessor (rightmost of left subtree)
# a. If predecessor.right is null: thread it to curr, go left
# b. If predecessor.right == curr: thread already made, remove it, visit curr, go right
# Each node is visited exactly twice: once when threading, once when following thread
# โ O(n) total, no extra space
Heaps & Priority Queues
5 questionsA binary heap is a complete binary tree stored as an array. In a min-heap, every parent is โค its children (root = minimum). In a max-heap, every parent is โฅ its children.
# Array representation: for node at index i
# parent: (i-1) // 2
# left child: 2*i + 1
# right child: 2*i + 2
# Complexities:
# peek (min/max): O(1)
# insert: O(log n) โ add at end, bubble up
# extract min/max: O(log n) โ remove root, replace with last, bubble down
# build from array: O(n) โ heapify from bottom up (counterintuitive!)
# delete arbitrary: O(log n) โ with a position map
# Build heap in O(n) โ why not O(n log n)?
# Nodes at height h: ~n/2^(h+1)
# Work at height h: O(h)
# Total: sum over h from 0 to log n of n/2^(h+1) * h = O(n)
import heapq
# Python heapq is a min-heap
nums = [3, 1, 4, 1, 5, 9, 2, 6]
heapq.heapify(nums) # O(n) in-place
heapq.heappush(nums, 7) # O(log n)
min_val = heapq.heappop(nums) # O(log n)
min_val = nums[0] # O(1) peek
# Max-heap in Python: negate values
max_heap = [-x for x in nums]
heapq.heapify(max_heap)
max_val = -heapq.heappop(max_heap)
# heappushpop: push then pop, more efficient than separate calls
# heapreplace: pop then push (even more efficient โ doesn't check)
# Approach 1: Sort โ O(n log n), O(1) extra space
def kth_largest_sort(nums, k):
nums.sort(reverse=True)
return nums[k-1]
# Approach 2: Min-heap of size k โ O(n log k), O(k) space
# Best when k << n or when data streams in
import heapq
def kth_largest_heap(nums, k):
heap = []
for n in nums:
heapq.heappush(heap, n)
if len(heap) > k: heapq.heappop(heap) # keep only k largest
return heap[0] # root = smallest of the k largest = kth largest
# Approach 3: Quickselect โ O(n) average, O(nยฒ) worst, O(1) extra space
def kth_largest_quickselect(nums, k):
k = len(nums) - k # convert to k-th smallest index
def quickselect(lo, hi):
pivot = nums[hi]
p = lo
for i in range(lo, hi):
if nums[i] <= pivot:
nums[i], nums[p] = nums[p], nums[i]
p += 1
nums[p], nums[hi] = nums[hi], nums[p]
if p == k: return nums[p]
elif p < k: return quickselect(p+1, hi)
else: return quickselect(lo, p-1)
return quickselect(0, len(nums)-1)
# Random pivot selection makes worst case unlikely in practice
# K most frequent elements: O(n log k)
from collections import Counter
def top_k_frequent(nums, k):
count = Counter(nums)
return heapq.nlargest(k, count.keys(), key=count.get)
# Two heaps: max-heap for lower half, min-heap for upper half
# Invariant: |lower| == |upper| or |lower| == |upper| + 1
# Median = lower top (odd total) or avg of both tops (even total)
import heapq
class MedianFinder:
def __init__(self):
self.lower = [] # max-heap (negated values)
self.upper = [] # min-heap
def add_num(self, num):
# Always add to lower first
heapq.heappush(self.lower, -num)
# Balance: lower top must be <= upper top
if self.upper and (-self.lower[0]) > self.upper[0]:
heapq.heappush(self.upper, -heapq.heappop(self.lower))
# Size balance: lower can have at most 1 more than upper
if len(self.lower) > len(self.upper) + 1:
heapq.heappush(self.upper, -heapq.heappop(self.lower))
elif len(self.upper) > len(self.lower):
heapq.heappush(self.lower, -heapq.heappop(self.upper))
def find_median(self):
if len(self.lower) > len(self.upper):
return float(-self.lower[0])
return (-self.lower[0] + self.upper[0]) / 2.0
# Complexity: add_num O(log n), find_median O(1)
# Sliding window median (fixed window k): use two heaps + lazy deletion
# When element leaves window, mark it as deleted; skip on next pop
# Heap Sort: O(n log n) time, O(1) space โ in-place!
def heap_sort(arr):
n = len(arr)
# Phase 1: Build max-heap from bottom up: O(n)
for i in range(n // 2 - 1, -1, -1):
sift_down(arr, n, i)
# Phase 2: Extract max n times: O(n log n)
for end in range(n-1, 0, -1):
arr[0], arr[end] = arr[end], arr[0] # move max to end
sift_down(arr, end, 0) # restore heap property
def sift_down(arr, n, i):
largest = i
left, right = 2*i+1, 2*i+2
if left < n and arr[left] > arr[largest]: largest = left
if right < n and arr[right] > arr[largest]: largest = right
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
sift_down(arr, n, largest)
Heap sort vs quicksort vs merge sort:
- Heap sort: O(n log n) guaranteed, O(1) space. Disadvantage: poor cache performance (non-sequential memory access on heap array), not stable, slower in practice than quicksort.
- Quicksort: O(n log n) average, O(nยฒ) worst case, O(log n) space. Best cache performance (sequential access). Fastest in practice. Not stable.
- Merge sort: O(n log n) guaranteed, O(n) space. Stable. Best for linked lists (no random access needed). Preferred when stability matters.
When heap sort wins: Real-time systems where O(nยฒ) worst case is unacceptable and O(n) extra memory is also unacceptable. Embedded systems with tight memory constraints.
# Meeting Rooms II โ minimum rooms needed: O(n log n)
import heapq
def min_meeting_rooms(intervals):
if not intervals: return 0
intervals.sort(key=lambda x: x[0]) # sort by start time
heap = [] # min-heap of end times (earliest ending meeting)
for start, end in intervals:
if heap and heap[0] <= start:
heapq.heapreplace(heap, end) # reuse this room
else:
heapq.heappush(heap, end) # need new room
return len(heap)
# Task Scheduler โ minimize idle time: O(n log n)
from collections import Counter
def least_interval(tasks, n):
freq = sorted(Counter(tasks).values(), reverse=True)
max_freq = freq[0]
idle = (max_freq - 1) * n
for f in freq[1:]:
idle -= min(f, max_freq - 1)
return max(0, idle) + len(tasks)
# Dijkstra's shortest path uses a min-heap:
def dijkstra(graph, src): # graph = {node: [(weight, neighbor), ...]}
dist = {src: 0}
heap = [(0, src)]
while heap:
d, u = heapq.heappop(heap)
if d > dist.get(u, float('inf')): continue # stale entry
for w, v in graph[u]:
nd = d + w
if nd < dist.get(v, float('inf')):
dist[v] = nd
heapq.heappush(heap, (nd, v))
return dist
Graphs
10 questions# Adjacency List: O(V+E) space โ best for sparse graphs (most real graphs)
# neighbors[u] = list of (neighbor, weight) pairs
from collections import defaultdict
graph = defaultdict(list)
graph[0].append((1, 5)) # edge 0โ1 with weight 5
graph[0].append((2, 3))
# Check edge uโv: O(degree(u))
# Iterate neighbors: O(degree(u))
# Add edge: O(1)
# Space: O(V + E)
# Adjacency Matrix: O(Vยฒ) space โ best for dense graphs or frequent edge checks
n = 5
adj = [[0]*n for _ in range(n)]
adj[0][1] = 5 # edge 0โ1 weight 5
adj[0][2] = 3
# Check edge uโv: O(1) โ adj[u][v]
# Iterate neighbors: O(V) โ must scan entire row
# Space: O(Vยฒ) โ prohibitive for large sparse graphs
# Edge List: O(E) space โ simple, used in Kruskal's algorithm
edges = [(0, 1, 5), (0, 2, 3), (1, 3, 2)] # (u, v, weight)
# When to use each:
# Adjacency list: most graph algorithms (BFS, DFS, Dijkstra)
# Adjacency matrix: dense graphs, Floyd-Warshall (all-pairs shortest path)
# Edge list: Kruskal's MST (sort edges by weight)
# Implicit graph โ no explicit structure, just a function:
# Grid problems: each cell is a node, neighbors are adjacent cells
# State space search: each state is a node, transitions are edges
# BFS: O(V+E) โ level-by-level exploration using a queue
from collections import deque
def bfs(graph, start):
visited = {start}
q = deque([start])
while q:
node = q.popleft()
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
q.append(neighbor)
# DFS: O(V+E) โ deep exploration using stack (or recursion)
def dfs_recursive(graph, node, visited=None):
if visited is None: visited = set()
visited.add(node)
for neighbor in graph[node]:
if neighbor not in visited:
dfs_recursive(graph, neighbor, visited)
def dfs_iterative(graph, start):
visited = {start}
stack = [start]
while stack:
node = stack.pop()
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
stack.append(neighbor)
Use BFS when:
- Finding the shortest path in an unweighted graph
- Finding the minimum number of steps/moves
- Level-order processing (tree levels, word ladder)
- Checking bipartiteness (2-coloring)
Use DFS when:
- Detecting cycles
- Topological sort
- Finding connected components, SCCs
- Exploring all paths (backtracking)
- Generating combinations/permutations
# Dijkstra: Single-source shortest path with non-negative weights
# O(E log V) with binary heap; O(E + V log V) with Fibonacci heap
import heapq
def dijkstra(graph, src, n):
dist = [float('inf')] * n
dist[src] = 0
heap = [(0, src)] # (distance, node)
while heap:
d, u = heapq.heappop(heap)
if d > dist[u]: continue # stale entry โ skip
for v, w in graph[u]:
if dist[u] + w < dist[v]:
dist[v] = dist[u] + w
heapq.heappush(heap, (dist[v], v))
return dist
# Why Dijkstra fails with negative edges:
# Greedy assumption: "once a node is settled (popped from heap), its distance is final"
# With negative edges, a later path could improve an already-settled node โ incorrect
# Fix 1: Bellman-Ford โ handles negative edges, O(VE)
def bellman_ford(edges, n, src): # edges = [(u, v, w), ...]
dist = [float('inf')] * n
dist[src] = 0
for _ in range(n - 1): # relax all edges V-1 times
for u, v, w in edges:
if dist[u] + w < dist[v]: dist[v] = dist[u] + w
# Check for negative cycles:
for u, v, w in edges:
if dist[u] + w < dist[v]: return None # negative cycle exists
return dist
# Fix 2: SPFA (Bellman-Ford with queue optimization) โ faster in practice
# Fix 3: Add constant to make all edges non-negative (Johnson's algorithm for all-pairs)
# Topological sort: linear ordering of vertices such that for every directed edge
# uโv, u appears before v. Only exists for DAGs (directed acyclic graphs).
# Kahn's algorithm (BFS-based): O(V+E)
from collections import deque, defaultdict
def topo_sort_kahn(n, prerequisites):
graph = defaultdict(list); in_degree = [0] * n
for dest, src in prerequisites:
graph[src].append(dest)
in_degree[dest] += 1
q = deque(i for i in range(n) if in_degree[i] == 0)
order = []
while q:
node = q.popleft()
order.append(node)
for neighbor in graph[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0: q.append(neighbor)
return order if len(order) == n else [] # empty = cycle detected!
# DFS-based: post-order DFS โ node added AFTER all its dependencies
def topo_sort_dfs(n, graph):
WHITE, GRAY, BLACK = 0, 1, 2 # unvisited, in-progress, done
state = [WHITE] * n
order = []
def dfs(u):
if state[u] == GRAY: return False # back edge = cycle!
if state[u] == BLACK: return True
state[u] = GRAY
for v in graph[u]:
if not dfs(v): return False
state[u] = BLACK
order.append(u)
return True
for i in range(n):
if state[i] == WHITE and not dfs(i): return []
return order[::-1] # reverse post-order = topological order
# Cycle detection in undirected graph: DFS tracking parent
def has_cycle_undirected(graph, n):
visited = [False] * n
def dfs(u, parent):
visited[u] = True
for v in graph[u]:
if not visited[v]:
if dfs(v, u): return True
elif v != parent: return True # back edge
return False
return any(dfs(i, -1) for i in range(n) if not visited[i])
# Union-Find: O(ฮฑ(n)) per operation โ essentially O(1) amortized (ฮฑ = inverse Ackermann)
# Used for: connected components, cycle detection, Kruskal's MST, network connectivity
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.count = n # number of components
def find(self, x):
# Path compression: make all nodes point directly to root
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # recursive compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry: return False # already connected โ union would create cycle
# Union by rank: attach smaller tree under larger tree
if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx
self.parent[ry] = rx
if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1
self.count -= 1
return True
def connected(self, x, y): return self.find(x) == self.find(y)
# Example: Number of islands using Union-Find
def num_islands_uf(grid):
if not grid: return 0
m, n = len(grid), len(grid[0])
uf = UnionFind(m * n)
water = sum(grid[r][c] == '0' for r in range(m) for c in range(n))
for r in range(m):
for c in range(n):
if grid[r][c] == '1':
for dr, dc in [(0,1),(1,0)]:
nr, nc = r+dr, c+dc
if 0 <= nr < m and 0 <= nc < n and grid[nr][nc] == '1':
uf.union(r*n+c, nr*n+nc)
return uf.count - water
# Kruskal's MST: O(E log E) โ sort edges, greedily add if no cycle
def kruskal(n, edges): # edges = [(weight, u, v)]
edges.sort()
uf = UnionFind(n)
mst_cost = 0; mst_edges = []
for w, u, v in edges:
if uf.union(u, v): # no cycle โ add to MST
mst_cost += w
mst_edges.append((u, v, w))
if len(mst_edges) == n - 1: break # MST complete
return mst_cost, mst_edges
# Prim's MST: O(E log V) with heap โ grow MST from a starting vertex
import heapq
def prim(graph, n): # graph = {u: [(w, v), ...]}
visited = set()
heap = [(0, 0, -1)] # (weight, node, parent)
mst_cost = 0; mst_edges = []
while heap and len(visited) < n:
w, u, parent = heapq.heappop(heap)
if u in visited: continue
visited.add(u)
mst_cost += w
if parent != -1: mst_edges.append((parent, u, w))
for edge_w, v in graph[u]:
if v not in visited:
heapq.heappush(heap, (edge_w, v, u))
return mst_cost, mst_edges
# Kruskal's vs Prim's:
# Kruskal: better for sparse graphs, global edge sorting
# Prim: better for dense graphs, grows locally from one vertex
# Both produce MSTs of the same total weight (not necessarily same edges)
# Floyd-Warshall: All-pairs shortest paths in O(Vยณ) time, O(Vยฒ) space
# Handles negative edges (not negative cycles)
def floyd_warshall(n, edges):
INF = float('inf')
dist = [[INF]*n for _ in range(n)]
for i in range(n): dist[i][i] = 0
for u, v, w in edges: dist[u][v] = w # directed; add both for undirected
for k in range(n): # try routing through intermediate vertex k
for i in range(n):
for j in range(n):
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
# Detect negative cycles: if dist[i][i] < 0 for any i, there's a negative cycle
return dist
# Floyd-Warshall vs V * Dijkstra:
# Floyd-Warshall: O(Vยณ) โ always, simple code, handles negatives (no neg cycles)
# V * Dijkstra: O(VยทE log V) โ better for sparse graphs (E << Vยฒ)
# requires non-negative edges
# Rule of thumb:
# Dense graph (E โ Vยฒ): Floyd-Warshall simpler and similar speed
# Sparse graph (E โ V): V * Dijkstra much faster
# Negative edges present: Floyd-Warshall (or V * Bellman-Ford: O(VยฒยทE))
# Transitive closure โ is there ANY path from i to j?
def transitive_closure(n, edges):
reach = [[False]*n for _ in range(n)]
for i in range(n): reach[i][i] = True
for u, v, _ in edges: reach[u][v] = True
for k in range(n):
for i in range(n):
for j in range(n):
reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
A Strongly Connected Component (SCC) is a maximal set of vertices such that there's a path from each vertex to every other. In an SCC, everyone can reach everyone.
# Kosaraju's algorithm: O(V+E), two DFS passes
def kosaraju(n, graph, rev_graph):
visited = [False] * n
finish_order = []
def dfs1(u): # DFS on original graph
visited[u] = True
for v in graph[u]:
if not visited[v]: dfs1(v)
finish_order.append(u) # add to stack in finish order
for i in range(n):
if not visited[i]: dfs1(i)
components = []
visited = [False] * n
def dfs2(u, comp): # DFS on reversed graph
visited[u] = True; comp.append(u)
for v in rev_graph[u]:
if not visited[v]: dfs2(v, comp)
while finish_order:
u = finish_order.pop() # process in reverse finish order
if not visited[u]:
comp = []; dfs2(u, comp)
components.append(comp)
return components
# Tarjan's SCC: O(V+E), single DFS โ uses discovery time and low-link values
# More complex but elegant; finds SCCs in reverse topological order
Applications of SCCs: Condensation graph (DAG of SCCs enables DP); 2-SAT problem solving; finding strongly connected web page clusters; detecting mutually dependent modules; game theory (which positions can reach which).
# Multi-source BFS: start BFS from ALL sources simultaneously
# Gives minimum distance from any source โ O(V+E)
# 01 Matrix: distance of each cell from nearest 0
from collections import deque
def update_matrix(mat):
m, n = len(mat), len(mat[0])
dist = [[float('inf')]*n for _ in range(m)]
q = deque()
for r in range(m):
for c in range(n):
if mat[r][c] == 0:
dist[r][c] = 0
q.append((r, c)) # all zeros are sources
while q:
r, c = q.popleft()
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
nr, nc = r+dr, c+dc
if 0 <= nr < m and 0 <= nc < n and dist[nr][nc] > dist[r][c] + 1:
dist[nr][nc] = dist[r][c] + 1
q.append((nr, nc))
return dist
# Walls and Gates: fill each empty room with distance to nearest gate
def walls_and_gates(rooms):
if not rooms: return
m, n = len(rooms), len(rooms[0])
GATE, EMPTY, WALL = 0, 2**31-1, -1
q = deque((r,c) for r in range(m) for c in range(n) if rooms[r][c] == GATE)
while q:
r, c = q.popleft()
for dr, dc in [(0,1),(0,-1),(1,0),(-1,0)]:
nr, nc = r+dr, c+dc
if 0 <= nr < m and 0 <= nc < n and rooms[nr][nc] == EMPTY:
rooms[nr][nc] = rooms[r][c] + 1
q.append((nr, nc))
# Grid as implicit graph: cell (r,c) is a node, edges to 4 neighbors
DIRS = [(0,1),(0,-1),(1,0),(-1,0)] # 4-directional
DIRS8 = [(0,1),(0,-1),(1,0),(-1,0),(1,1),(1,-1),(-1,1),(-1,-1)] # 8-directional
def in_bounds(r, c, m, n): return 0 <= r < m and 0 <= c < n
# Number of Islands โ DFS/BFS flood fill: O(m*n)
def num_islands(grid):
m, n = len(grid), len(grid[0])
count = 0
def dfs(r, c):
if not in_bounds(r,c,m,n) or grid[r][c] != '1': return
grid[r][c] = '0' # mark visited (modify in-place)
for dr, dc in DIRS: dfs(r+dr, c+dc)
for r in range(m):
for c in range(n):
if grid[r][c] == '1': dfs(r, c); count += 1
return count
# Shortest path in a grid with obstacles (0=open, 1=blocked): BFS
def shortest_path(grid):
n = len(grid)
if grid[0][0] == 1 or grid[n-1][n-1] == 1: return -1
q = deque([(0, 0, 1)]) # (row, col, distance)
visited = {(0, 0)}
while q:
r, c, dist = q.popleft()
if r == n-1 and c == n-1: return dist
for dr, dc in DIRS8:
nr, nc = r+dr, c+dc
if in_bounds(nr,nc,n,n) and grid[nr][nc] == 0 and (nr,nc) not in visited:
visited.add((nr, nc))
q.append((nr, nc, dist+1))
return -1
# 0-1 BFS: edges have weight 0 or 1 โ use deque, O(V+E)
# When moving costs 0: appendleft (stays at same distance)
# When moving costs 1: append (advances distance)
# Faster than Dijkstra for 0-1 weighted grids
Sorting & Searching
6 questions# Algorithm | Time (avg) | Time (worst) | Space | Stable | Notes
# โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
# Bubble sort | O(nยฒ) | O(nยฒ) | O(1) | Yes | Adaptive: O(n) if nearly sorted
# Selection sort | O(nยฒ) | O(nยฒ) | O(1) | No | Min swaps: O(n)
# Insertion sort | O(nยฒ) | O(nยฒ) | O(1) | Yes | Best for small/nearly sorted
# Merge sort | O(n log n) | O(n log n) | O(n) | Yes | Linked lists, external sort
# Quicksort | O(n log n) | O(nยฒ) | O(log n)| No | Fastest in practice (cache-friendly)
# Heap sort | O(n log n) | O(n log n) | O(1) | No | Guaranteed O(n log n), poor cache
# Counting sort | O(n+k) | O(n+k) | O(k) | Yes | Only for integers in [0,k]
# Radix sort | O(dยท(n+k)) | O(dยท(n+k)) | O(n+k) | Yes | Fixed-length integers/strings
# Timsort | O(n log n) | O(n log n) | O(n) | Yes | Python's sort โ hybrid merge+insertion
# Stability matters when: sorting objects with multiple keys
# Sort by (last_name, first_name): sort by first_name first (stable),
# then by last_name โ stable sort preserves first_name order within equal last names
# Python's sort: Timsort โ stable, O(n log n), adaptive (O(n) on nearly sorted)
data.sort(key=lambda x: (x.priority, x.name), reverse=True)
# O(n) lower bound proof for comparison-based sorting:
# Any comparison-based sort needs ฮฉ(n log n) comparisons in worst case
# (decision tree argument: n! possible orderings โ log(n!) = ฮ(n log n) depth)
# Merge Sort: O(n log n) โ stable, good for linked lists
def merge_sort(arr):
if len(arr) <= 1: return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]: result.append(left[i]); i += 1
else: result.append(right[j]); j += 1
result.extend(left[i:]); result.extend(right[j:])
return result
# Quicksort โ Lomuto partition scheme: O(n log n) avg, O(nยฒ) worst
import random
def quicksort(arr, lo=0, hi=None):
if hi is None: hi = len(arr) - 1
if lo >= hi: return
pivot_idx = partition(arr, lo, hi)
quicksort(arr, lo, pivot_idx - 1)
quicksort(arr, pivot_idx + 1, hi)
def partition(arr, lo, hi):
# Random pivot to avoid O(nยฒ) on sorted input
rand_idx = random.randint(lo, hi)
arr[rand_idx], arr[hi] = arr[hi], arr[rand_idx]
pivot = arr[hi]
i = lo - 1
for j in range(lo, hi):
if arr[j] <= pivot:
i += 1
arr[i], arr[j] = arr[j], arr[i]
arr[i+1], arr[hi] = arr[hi], arr[i+1]
return i + 1
# Hoare partition (original, fewer swaps):
def hoare_partition(arr, lo, hi):
pivot = arr[(lo+hi)//2]
i, j = lo - 1, hi + 1
while True:
i += 1
while arr[i] < pivot: i += 1
j -= 1
while arr[j] > pivot: j -= 1
if i >= j: return j
arr[i], arr[j] = arr[j], arr[i]
# 3-way partition (Dutch National Flag) โ handles duplicates efficiently:
# Partition into: < pivot | == pivot | > pivot
# Counting sort: O(n+k) time and space where k = range of values
# Restriction: integer keys in bounded range [0, k]
def counting_sort(arr, k):
count = [0] * (k + 1)
for x in arr: count[x] += 1
result = []
for val, freq in enumerate(count):
result.extend([val] * freq)
return result
# Stable counting sort (for use in radix sort):
def counting_sort_stable(arr, exp): # sort by digit at position exp
n = len(arr); output = [0]*n; count = [0]*10
for x in arr: count[(x // exp) % 10] += 1
for i in range(1, 10): count[i] += count[i-1] # prefix sums
for i in range(n-1, -1, -1): # right to left for stability
d = (arr[i] // exp) % 10
count[d] -= 1
output[count[d]] = arr[i]
return output
# Radix sort: O(dยท(n+k)) where d = number of digits, k = base (10)
def radix_sort(arr):
max_val = max(arr)
exp = 1
while max_val // exp > 0:
arr = counting_sort_stable(arr, exp)
exp *= 10
return arr
# Bucket sort: O(n) average for uniformly distributed floats in [0,1)
def bucket_sort(arr):
n = len(arr)
buckets = [[] for _ in range(n)]
for x in arr: buckets[int(x * n)].append(x)
for bucket in buckets: bucket.sort()
return [x for bucket in buckets for x in bucket]
# Rotated sorted array (no duplicates): O(log n)
def search_rotated(nums, target):
lo, hi = 0, len(nums) - 1
while lo <= hi:
mid = (lo + hi) // 2
if nums[mid] == target: return mid
# Determine which half is sorted:
if nums[lo] <= nums[mid]: # left half is sorted
if nums[lo] <= target < nums[mid]: hi = mid - 1
else: lo = mid + 1
else: # right half is sorted
if nums[mid] < target <= nums[hi]: lo = mid + 1
else: hi = mid - 1
return -1
# Find minimum in rotated sorted array: O(log n)
def find_min_rotated(nums):
lo, hi = 0, len(nums) - 1
while lo < hi:
mid = (lo + hi) // 2
if nums[mid] > nums[hi]: lo = mid + 1 # min in right half
else: hi = mid # mid could be min
return nums[lo]
# With duplicates โ worst case O(n):
def search_rotated_duplicates(nums, target):
lo, hi = 0, len(nums) - 1
while lo <= hi:
mid = (lo + hi) // 2
if nums[mid] == target: return True
# Can't determine sorted half when nums[lo] == nums[mid] == nums[hi]
if nums[lo] == nums[mid] == nums[hi]:
lo += 1; hi -= 1 # shrink both sides by 1 โ O(n) worst case
elif nums[lo] <= nums[mid]:
if nums[lo] <= target < nums[mid]: hi = mid - 1
else: lo = mid + 1
else:
if nums[mid] < target <= nums[hi]: lo = mid + 1
else: hi = mid - 1
return False
# K-sorted array: each element is at most k positions from its sorted position
# Use min-heap of size k+1: O(n log k)
import heapq
def sort_k_sorted(arr, k):
heap = arr[:k+1]
heapq.heapify(heap)
result = []
for i in range(k+1, len(arr)):
result.append(heapq.heappushpop(heap, arr[i]))
while heap: result.append(heapq.heappop(heap))
return result
# Custom comparator โ sort intervals by start, break ties by end descending
intervals = [[1,4],[2,3],[3,5],[2,7]]
intervals.sort(key=lambda x: (x[0], -x[1]))
# Comparator for non-transitive orderings: use functools.cmp_to_key
from functools import cmp_to_key
# Largest number from array: "9" + "90" > "90" + "9" โ 990 > 909
def largest_number(nums):
strs = list(map(str, nums))
def compare(a, b):
if a + b > b + a: return -1 # a should come first
elif a + b < b + a: return 1
return 0
strs.sort(key=cmp_to_key(compare))
result = ''.join(strs)
return '0' if result[0] == '0' else result
# Nearly sorted: Python's Timsort is O(n) for already-sorted and O(n log n) otherwise
# Insertion sort also O(n) for nearly sorted but O(nยฒ) worst case
# Shell sort: O(n logยฒn) average โ good for nearly sorted with variable gap sequences
When data exceeds RAM (e.g., sorting 1TB log files), external sorting reads/writes data in chunks from disk, minimizing expensive I/O operations.
External merge sort algorithm:
- Create sorted runs: Read as much data as fits in RAM, sort it in memory, write to disk as a "run". Repeat until all data is in sorted runs.
- Merge runs: Use a k-way merge (min-heap) to merge all sorted run files simultaneously, reading one block from each at a time. Write merged output to disk.
# K-way merge (in-memory simulation of external merge):
def kway_merge(sorted_arrays):
import heapq
heap = []
iterators = [iter(arr) for arr in sorted_arrays]
for i, it in enumerate(iterators):
val = next(it, None)
if val is not None: heapq.heappush(heap, (val, i))
result = []
while heap:
val, i = heapq.heappop(heap)
result.append(val)
next_val = next(iterators[i], None)
if next_val is not None: heapq.heappush(heap, (next_val, i))
return result
# Optimization โ replacement selection:
# Instead of sorting fixed chunks, use a heap to create longer initial runs
# Average run length = 2 * RAM size โ fewer merge passes needed
# I/O complexity (measures in "disk I/Os", not comparisons):
# External merge sort: O((N/B) log_{M/B}(N/B)) I/Os
# where N = data size, B = block size, M = memory size
# Parallel external sort: split data across machines (MapReduce)
# Map: each machine sorts its partition
# Reduce: merge-sort the sorted partitions
Dynamic Programming
12 questionsDynamic programming solves complex problems by breaking them into overlapping subproblems, solving each once, and storing results to avoid redundant computation.
Two required conditions:
- Optimal substructure: The optimal solution to the problem can be constructed from optimal solutions to its subproblems.
- Overlapping subproblems: The same subproblems are solved multiple times in a naive recursive approach. DP stores results (memoization/tabulation) to avoid recomputation.
Two implementation approaches:
# Top-down (memoization): recursion + cache
# Natural to write, only computes needed subproblems
from functools import lru_cache
@lru_cache(maxsize=None)
def fib_memo(n):
if n <= 1: return n
return fib_memo(n-1) + fib_memo(n-2)
# O(n) time, O(n) space (cache + call stack)
# Bottom-up (tabulation): iterative, fill table from base cases up
def fib_tab(n):
if n <= 1: return n
dp = [0] * (n + 1)
dp[1] = 1
for i in range(2, n + 1):
dp[i] = dp[i-1] + dp[i-2]
return dp[n]
# O(n) time, O(n) space
# Space-optimized: rolling variables
def fib_opt(n):
a, b = 0, 1
for _ in range(n): a, b = b, a + b
return a
# O(n) time, O(1) space
# DP problem identification signals:
# - "count ways to...", "minimum/maximum number of..."
# - Asks about sequences, subsets, or partitions
# - Choices at each step affect future options
# - Greedy doesn't work (local best โ global best)
# 0/1 Knapsack: items with weight[i] and value[i], capacity W
# Each item used 0 or 1 times. Maximize total value โค W
# dp[i][w] = max value using first i items with capacity w
def knapsack_2d(weights, values, W):
n = len(weights)
dp = [[0] * (W + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for w in range(W + 1):
# Choice: skip item i, or take item i (if it fits)
dp[i][w] = dp[i-1][w]
if weights[i-1] <= w:
dp[i][w] = max(dp[i][w], dp[i-1][w - weights[i-1]] + values[i-1])
return dp[n][W]
# O(nยทW) time, O(nยทW) space
# Space optimization: only need previous row โ use 1D array
def knapsack_1d(weights, values, W):
dp = [0] * (W + 1)
for i in range(len(weights)):
# Iterate BACKWARDS to avoid using item i twice
for w in range(W, weights[i] - 1, -1):
dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
return dp[W]
# O(nยทW) time, O(W) space
# Unbounded knapsack (can use each item unlimited times):
def unbounded_knapsack(weights, values, W):
dp = [0] * (W + 1)
for w in range(1, W + 1):
for i in range(len(weights)): # iterate FORWARD โ allows reuse
if weights[i] <= w:
dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
return dp[W]
# Partition Equal Subset Sum โ is 0/1 knapsack with target = sum/2:
def can_partition(nums):
total = sum(nums)
if total % 2 != 0: return False
target = total // 2
dp = {0}
for n in nums:
dp |= {x + n for x in dp}
return target in dp
# LCS: longest subsequence common to both strings (need not be contiguous)
# dp[i][j] = LCS length of text1[:i] and text2[:j]
def lcs(text1, text2):
m, n = len(text1), len(text2)
dp = [[0]*(n+1) for _ in range(m+1)]
for i in range(1, m+1):
for j in range(1, n+1):
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1 # characters match: extend
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1]) # skip one
return dp[m][n]
# Space optimized LCS: O(min(m,n))
def lcs_opt(text1, text2):
if len(text1) < len(text2): text1, text2 = text2, text1
prev = [0] * (len(text2) + 1)
for c1 in text1:
curr = [0] * (len(text2) + 1)
for j, c2 in enumerate(text2, 1):
curr[j] = prev[j-1] + 1 if c1 == c2 else max(prev[j], curr[j-1])
prev = curr
return prev[-1]
# Longest Common Substring (must be CONTIGUOUS):
def longest_common_substr(text1, text2):
m, n = len(text1), len(text2)
dp = [[0]*(n+1) for _ in range(m+1)]
best = 0
for i in range(1, m+1):
for j in range(1, n+1):
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1 # ONLY extend โ no max with skip
best = max(best, dp[i][j])
# else dp[i][j] = 0 (default) โ contiguous constraint resets to 0
return best
# Applications of LCS:
# Edit distance (Levenshtein), diff tools, DNA sequence alignment
# LCS = n + m - 2*edit_distance (for insert/delete only operations)
# O(nยฒ) DP: dp[i] = LIS ending at index i
def lis_n2(nums):
n = len(nums)
dp = [1] * n
for i in range(1, n):
for j in range(i):
if nums[j] < nums[i]:
dp[i] = max(dp[i], dp[j] + 1)
return max(dp)
# O(n log n): patience sorting / binary search on tails array
# tails[i] = smallest tail element for increasing subsequence of length i+1
import bisect
def lis_nlogn(nums):
tails = []
for n in nums:
pos = bisect.bisect_left(tails, n)
if pos == len(tails): tails.append(n) # extend LIS
else: tails[pos] = n # replace to keep tails small
return len(tails)
# tails is always sorted! We can binary search it.
# NOTE: tails is NOT the actual LIS โ it's a helper structure
# Reconstruct the actual LIS:
def lis_reconstruct(nums):
tails = []; dp = [0] * len(nums); parent = [-1] * len(nums)
for i, n in enumerate(nums):
pos = bisect.bisect_left(tails, n)
dp[i] = pos
if pos == len(tails): tails.append(n)
else: tails[pos] = n
if pos > 0:
# Find the element that ends the LIS of length pos
for j in range(i-1, -1, -1):
if dp[j] == pos - 1 and nums[j] < n:
parent[i] = j; break
# Trace back from max dp position
end = dp.index(max(dp))
lis = []
while end != -1: lis.append(nums[end]); end = parent[end]
return lis[::-1]
# Edit distance: minimum insertions, deletions, substitutions to convert word1 โ word2
# dp[i][j] = min edits to convert word1[:i] to word2[:j]
def edit_distance(word1, word2):
m, n = len(word1), len(word2)
dp = [[0]*(n+1) for _ in range(m+1)]
# Base cases: convert to/from empty string
for i in range(m+1): dp[i][0] = i # delete i characters
for j in range(n+1): dp[0][j] = j # insert j characters
for i in range(1, m+1):
for j in range(1, n+1):
if word1[i-1] == word2[j-1]:
dp[i][j] = dp[i-1][j-1] # no edit needed
else:
dp[i][j] = 1 + min(
dp[i-1][j], # delete from word1
dp[i][j-1], # insert into word1
dp[i-1][j-1] # substitute
)
return dp[m][n]
# Space optimized: O(n) โ only need previous row
def edit_distance_opt(word1, word2):
m, n = len(word1), len(word2)
dp = list(range(n + 1))
for i in range(1, m + 1):
prev = dp[0]
dp[0] = i
for j in range(1, n + 1):
temp = dp[j]
if word1[i-1] == word2[j-1]:
dp[j] = prev
else:
dp[j] = 1 + min(prev, dp[j], dp[j-1])
prev = temp
return dp[n]
# Variations:
# One Edit Distance: check if exactly 1 edit apart โ O(n) without DP
# Wildcard matching ('?' and '*'): DP with special case for '*'
# Regex matching ('.' and '*'): DP with look-ahead for 'x*' patterns
Interval DP solves problems on contiguous subarrays or subsequences. dp[i][j] represents the answer for the subproblem on interval [i, j]. Fill in increasing interval lengths.
# Burst Balloons: max coins from bursting all balloons
# Key insight: think about the LAST balloon burst in [l, r]
def max_coins(nums):
nums = [1] + nums + [1] # add boundary balloons
n = len(nums)
dp = [[0]*n for _ in range(n)]
for length in range(2, n): # interval length
for left in range(0, n - length):
right = left + length
for k in range(left+1, right): # k = last balloon burst
dp[left][right] = max(dp[left][right],
nums[left]*nums[k]*nums[right] + dp[left][k] + dp[k][right])
return dp[0][n-1]
# Matrix Chain Multiplication: minimum multiplications to compute A1*A2*...*An
def matrix_chain(dims): # dims[i-1] x dims[i] = shape of matrix i
n = len(dims) - 1
dp = [[0]*n for _ in range(n)]
for length in range(2, n+1): # chain length
for i in range(n - length + 1):
j = i + length - 1
dp[i][j] = float('inf')
for k in range(i, j): # split point
cost = dp[i][k] + dp[k+1][j] + dims[i]*dims[k+1]*dims[j+1]
dp[i][j] = min(dp[i][j], cost)
return dp[0][n-1]
# Pattern: fill diagonally (increasing lengths)
# dp[i][j] depends on dp[i][k] and dp[k+1][j] for i <= k < j
# Minimum coins to make amount: O(n * amount)
def coin_change_min(coins, amount):
dp = [float('inf')] * (amount + 1)
dp[0] = 0
for a in range(1, amount + 1):
for c in coins:
if c <= a: dp[a] = min(dp[a], dp[a - c] + 1)
return dp[amount] if dp[amount] != float('inf') else -1
# Count ways to make amount (combination โ order doesn't matter):
def coin_change_ways(coins, amount):
dp = [0] * (amount + 1)
dp[0] = 1
for coin in coins: # outer loop on coins: each coin used any times
for a in range(coin, amount + 1):
dp[a] += dp[a - coin]
return dp[amount]
# Count ways (permutation โ order matters, e.g., step climbing):
def coin_change_perms(coins, amount):
dp = [0] * (amount + 1)
dp[0] = 1
for a in range(1, amount + 1): # outer loop on amount
for coin in coins:
if coin <= a: dp[a] += dp[a - coin]
return dp[amount]
# Why the loop order matters:
# Coins outer โ amount inner: each combination counted once (unordered)
# Amount outer โ coins inner: each permutation counted (ordered)
# Example: amount=4, coins=[1,2]
# Combinations: (2+2), (2+1+1), (1+1+1+1) = 3 ways
# Permutations: (1+1+2), (1+2+1), (2+1+1), (2+2), (1+1+1+1) = 5 ways
# DP on trees: compute optimal value for each subtree bottom-up via DFS
# State: for each node, compute result for "include node" and "exclude node"
# House Robber III: rob max without robbing parent+child simultaneously
def rob(root):
def dfs(node):
if not node: return (0, 0) # (rob_this_node, skip_this_node)
left_rob, left_skip = dfs(node.left)
right_rob, right_skip = dfs(node.right)
# Rob this node: can't rob children
rob_curr = node.val + left_skip + right_skip
# Skip this node: children can be robbed or skipped (take max)
skip_curr = max(left_rob, left_skip) + max(right_rob, right_skip)
return (rob_curr, skip_curr)
rob_root, skip_root = dfs(root)
return max(rob_root, skip_root)
# Max independent set on general tree:
# same pattern โ (include, exclude) pair propagated bottom-up
# Binary tree cameras โ minimum cameras to cover all nodes:
def min_camera_cover(root):
COVERED, CAMERA, UNCOVERED = 0, 1, 2
cameras = [0]
def dfs(node):
if not node: return COVERED
left, right = dfs(node.left), dfs(node.right)
if left == UNCOVERED or right == UNCOVERED:
cameras[0] += 1; return CAMERA
if left == CAMERA or right == CAMERA: return COVERED
return UNCOVERED # let parent handle
result = dfs(root)
return cameras[0] + (1 if result == UNCOVERED else 0)
# Tree DP pattern:
# 1. Define state for each node (often a tuple of values for include/exclude)
# 2. DFS returns state for each subtree
# 3. Combine children states at each node
# 4. Answer is the optimal state at root
# Digit DP: count numbers in [1, N] satisfying some digit property
# State: (position, tight_constraint, other_state...)
# tight = True: still bounded by N's digits; tight = False: free to use 0-9
# Count numbers in [1, n] with no two consecutive same digits
def count_no_consecutive_repeat(n):
digits = list(map(int, str(n)))
from functools import lru_cache
@lru_cache(maxsize=None)
def dp(pos, prev_digit, tight, started):
if pos == len(digits): return 1 if started else 0
limit = digits[pos] if tight else 9
result = 0
for d in range(0, limit + 1):
if d == 0 and not started: # leading zero
result += dp(pos+1, -1, tight and d==limit, False)
elif d != prev_digit: # no consecutive same
result += dp(pos+1, d, tight and d==limit, True)
return result
return dp(0, -1, True, False)
# Count numbers with digit sum = S in [lo, hi]:
# f(hi) - f(lo-1) using digit DP on upper bound
# General digit DP template:
# - State: (position, tight, extra_state)
# - tight=True: current prefix equals N's prefix โ limit = N[pos]
# - tight=False: previous digit was strictly less โ limit = 9
# - Base case: pos == len(digits) โ return 1 if valid, 0 if not
# - Transition: try each digit d from 0 to limit, recurse with updated state
# Bitmask DP: use bits to represent sets, enabling exponential state spaces
# State: (current_node, set_of_visited_nodes)
# dp[mask][i] = min cost to visit all nodes in mask, ending at node i
def tsp(cost): # cost[i][j] = edge weight, n <= 20
n = len(cost)
INF = float('inf')
# dp[mask][v] = min cost to visit nodes in mask ending at v
dp = [[INF]*n for _ in range(1 << n)]
dp[1][0] = 0 # start at node 0, mask = 0001 (only node 0 visited)
for mask in range(1 << n):
for u in range(n):
if dp[mask][u] == INF: continue
if not (mask >> u & 1): continue # u must be in mask
for v in range(n):
if mask >> v & 1: continue # v not yet visited
new_mask = mask | (1 << v)
dp[new_mask][v] = min(dp[new_mask][v], dp[mask][u] + cost[u][v])
full_mask = (1 << n) - 1
return min(dp[full_mask][v] + cost[v][0] for v in range(1, n))
# O(2^n * nยฒ) time, O(2^n * n) space โ feasible for n โค 20
# Minimum cost to assign n tasks to n workers: same structure
def min_cost_assignment(cost):
n = len(cost)
dp = [float('inf')] * (1 << n)
dp[0] = 0
for mask in range(1 << n):
person = bin(mask).count('1') # which person to assign next
if person >= n: continue
for task in range(n):
if not (mask >> task & 1):
dp[mask | (1 << task)] = min(dp[mask | (1 << task)],
dp[mask] + cost[person][task])
return dp[(1 << n) - 1]
# Palindrome Partitioning II: min cuts to partition s into palindromes
def min_cut(s):
n = len(s)
# First: precompute is_palindrome[i][j]: O(nยฒ)
pal = [[False]*n for _ in range(n)]
for i in range(n-1, -1, -1):
for j in range(i, n):
pal[i][j] = (s[i] == s[j]) and (j-i <= 2 or pal[i+1][j-1])
# dp[i] = min cuts for s[:i+1]
dp = list(range(-1, n)) # dp[0]=-1, dp[1..n] initialized to i-1 (all cuts)
for i in range(n):
for j in range(i+1):
if pal[j][i]:
dp[i+1] = min(dp[i+1], dp[j] + 1)
return dp[n]
# Wildcard Matching: '?' matches any char, '*' matches any sequence
def is_match_wildcard(s, p):
m, n = len(s), len(p)
dp = [[False]*(n+1) for _ in range(m+1)]
dp[0][0] = True
for j in range(1, n+1):
if p[j-1] == '*': dp[0][j] = dp[0][j-1] # * matches empty
for i in range(1, m+1):
for j in range(1, n+1):
if p[j-1] == '*':
dp[i][j] = dp[i-1][j] or dp[i][j-1] # * matches one more, or empty
elif p[j-1] == '?' or s[i-1] == p[j-1]:
dp[i][j] = dp[i-1][j-1]
return dp[m][n]
# Regular Expression Matching: '.' matches any char, '*' matches zero or more of preceding
def is_match_regex(s, p):
m, n = len(s), len(p)
dp = [[False]*(n+1) for _ in range(m+1)]
dp[0][0] = True
for j in range(2, n+1):
if p[j-1] == '*': dp[0][j] = dp[0][j-2]
for i in range(1, m+1):
for j in range(1, n+1):
if p[j-1] == '*':
dp[i][j] = dp[i][j-2] # x* = zero occurrences
if p[j-2] == '.' or p[j-2] == s[i-1]:
dp[i][j] = dp[i][j] or dp[i-1][j] # extend match
elif p[j-1] == '.' or p[j-1] == s[i-1]:
dp[i][j] = dp[i-1][j-1]
return dp[m][n]
# Unique Paths: O(m*n), O(n) space
def unique_paths(m, n):
dp = [1] * n
for _ in range(1, m):
for j in range(1, n): dp[j] += dp[j-1]
return dp[-1]
# Math: C(m+n-2, m-1) โ choose m-1 down moves from m+n-2 total moves
# Minimum Path Sum (down or right only): O(m*n), O(n) space
def min_path_sum(grid):
m, n = len(grid), len(grid[0])
dp = grid[0][:]
for j in range(1, n): dp[j] += dp[j-1] # first row
for i in range(1, m):
dp[0] += grid[i][0] # first column
for j in range(1, n):
dp[j] = grid[i][j] + min(dp[j], dp[j-1])
return dp[-1]
# Dungeon Game: knight must keep health > 0 at all times
# Fill DP BACKWARDS from bottom-right (future costs affect current requirements)
def calculate_minimum_hp(dungeon):
m, n = len(dungeon), len(dungeon[0])
dp = [[float('inf')]*(n+1) for _ in range(m+1)]
dp[m][n-1] = dp[m-1][n] = 1 # sentinels at end
for i in range(m-1, -1, -1):
for j in range(n-1, -1, -1):
min_health_ahead = min(dp[i+1][j], dp[i][j+1])
dp[i][j] = max(1, min_health_ahead - dungeon[i][j])
return dp[0][0]
# Why backwards? To know the minimum health needed TO ENTER cell (i,j),
# we need to know the minimum health needed AFTER leaving (i,j).
Greedy & Divide and Conquer
8 questionsA greedy algorithm makes the locally optimal choice at each step, hoping it leads to a globally optimal solution. It works when the problem has the greedy choice property โ a locally optimal choice is always part of some globally optimal solution.
Two proof strategies:
Exchange argument (most common): Assume an optimal solution that doesn't use the greedy choice. Show you can swap the greedy choice in without making things worse. This proves greedy โฅ optimal, so greedy = optimal.
Greedy stays ahead: Show that at each step, the greedy solution is at least as good as any other solution at that point. By induction, greedy is globally optimal.
# When greedy works:
# - Interval scheduling (earliest deadline first)
# - Fractional knapsack (highest value/weight ratio first)
# - Huffman coding
# - Prim's and Kruskal's MST
# - Dijkstra's shortest path (non-negative weights)
# When greedy FAILS (use DP instead):
# 0/1 Knapsack: can't take fractions, so highest ratio may not be globally optimal
# Coin change: coins=[1,3,4], amount=6 โ greedy gives 4+1+1=3 coins, optimal is 3+3=2
# Matrix chain multiplication
# Greedy vs DP decision framework:
# Can you show exchange argument holds? โ Greedy
# Does optimal solution of smaller problem directly give optimal for larger? โ DP
# When in doubt: try greedy, then find a counterexample, then switch to DP
# Activity Selection: max non-overlapping intervals
# Greedy: sort by END time, always pick earliest ending non-overlapping
def activity_selection(intervals):
intervals.sort(key=lambda x: x[1]) # sort by end
count = 0; end = float('-inf')
for s, e in intervals:
if s >= end: # doesn't overlap with last selected
count += 1; end = e
return count
# Merge Overlapping Intervals: O(n log n)
def merge_intervals(intervals):
intervals.sort(key=lambda x: x[0])
merged = [intervals[0]]
for s, e in intervals[1:]:
if s <= merged[-1][1]: # overlaps
merged[-1][1] = max(merged[-1][1], e)
else:
merged.append([s, e])
return merged
# Insert Interval into sorted non-overlapping list: O(n)
def insert_interval(intervals, new):
result = []
i, n = 0, len(intervals)
while i < n and intervals[i][1] < new[0]: # before new
result.append(intervals[i]); i += 1
while i < n and intervals[i][0] <= new[1]: # overlapping
new[0] = min(new[0], intervals[i][0])
new[1] = max(new[1], intervals[i][1])
i += 1
result.append(new)
result.extend(intervals[i:])
return result
# Minimum number of arrows to burst all balloons (= min non-overlapping cover):
def find_min_arrows(points):
points.sort(key=lambda x: x[1])
arrows = 1; end = points[0][1]
for s, e in points[1:]:
if s > end: arrows += 1; end = e
return arrows
# Huffman coding: assign shorter codes to more frequent characters
# Optimal prefix-free code โ no code is prefix of another โ unambiguous decoding
import heapq
def build_huffman(char_freq):
heap = [[freq, [char, ""]] for char, freq in char_freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap) # two lowest frequency nodes
hi = heapq.heappop(heap)
for pair in lo[1:]: pair[1] = '0' + pair[1] # left = 0
for pair in hi[1:]: pair[1] = '1' + pair[1] # right = 1
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: len(p[1]))
# Example: {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}
# f=0, c=100, d=101, a=1100, b=1101, e=111
# Expected bits: 5*4 + 9*4 + 12*3 + 13*3 + 16*3 + 45*1 = 224
# Fixed 3-bit encoding: 45*3 = 135... wait, Huffman saves space!
# Total chars * avg_code_length vs total chars * log2(alphabet_size)
# Why greedy works: Exchange argument
# The two least frequent symbols can always be siblings in an optimal tree
# (any tree where they're not can be improved by swapping them to be siblings)
# This allows us to greedily merge the two smallest at each step
# Jump Game I: can you reach the last index?
# Greedy: track farthest reachable position
def can_jump(nums):
farthest = 0
for i, jump in enumerate(nums):
if i > farthest: return False # can't reach position i
farthest = max(farthest, i + jump)
return True
# Jump Game II: minimum jumps to reach last index
def jump(nums):
jumps = curr_end = farthest = 0
for i in range(len(nums) - 1):
farthest = max(farthest, i + nums[i])
if i == curr_end: # must jump from current window
jumps += 1
curr_end = farthest # extend to new reachable boundary
return jumps
# Jump Game III: from index i, can reach iยฑjump[i]. Can reach 0?
def can_reach(arr, start):
from collections import deque
q = deque([start]); visited = set()
while q:
i = q.popleft()
if arr[i] == 0: return True
if i in visited: continue
visited.add(i)
if i + arr[i] < len(arr): q.append(i + arr[i])
if i - arr[i] >= 0: q.append(i - arr[i])
return False
# Jump Game VI: max score path with window k (use deque DP): O(n)
def max_result(nums, k):
from collections import deque
dq = deque([0]) # indices, decreasing dp values
dp = [0] * len(nums); dp[0] = nums[0]
for i in range(1, len(nums)):
while dq and dq[0] < i - k: dq.popleft() # outside window
dp[i] = dp[dq[0]] + nums[i]
while dq and dp[dq[-1]] <= dp[i]: dq.pop()
dq.append(i)
return dp[-1]
Divide and conquer splits the problem into independent subproblems, solves them recursively, and combines results. Unlike DP, subproblems don't overlap.
# Template:
def divide_and_conquer(problem):
if base_case(problem): return base_solution
subproblems = divide(problem)
solutions = [divide_and_conquer(sp) for sp in subproblems]
return combine(solutions)
# Count inversions: pairs (i,j) where iarr[j] โ O(n log n)
def count_inversions(arr):
if len(arr) <= 1: return arr, 0
mid = len(arr) // 2
left, left_inv = count_inversions(arr[:mid])
right, right_inv = count_inversions(arr[mid:])
merged, split_inv = merge_count(left, right)
return merged, left_inv + right_inv + split_inv
def merge_count(left, right):
result = []; inversions = 0; i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]: result.append(left[i]); i += 1
else:
inversions += len(left) - i # all remaining left elements > right[j]
result.append(right[j]); j += 1
result.extend(left[i:]); result.extend(right[j:])
return result, inversions
# Closest pair of points: O(n log n) via divide and conquer
# Split by median x; recurse on each half; merge: check strip of width 2*delta
# Classic D&C problems:
# Binary search, merge sort, quicksort, fast matrix multiply (Strassen)
# FFT (polynomial multiplication), Karatsuba multiplication
# Closest pair, maximum subarray (but Kadane's is simpler)
# Gas Station: find starting position to complete the circuit
def can_complete_circuit(gas, cost):
total = curr = start = 0
for i in range(len(gas)):
diff = gas[i] - cost[i]
total += diff
curr += diff
if curr < 0: # can't start before or at i
start = i + 1
curr = 0
return start if total >= 0 else -1
# Key insight: if total gas >= total cost, a solution exists, and it's unique
# The greedy starting point is correct because any segment with negative running
# sum must be skipped โ the start must come after any such segment
# Task Scheduler: min intervals with cooldown n
from collections import Counter
def least_interval(tasks, n):
freq = sorted(Counter(tasks).values(), reverse=True)
max_freq = freq[0]
max_count = freq.count(max_freq) # how many tasks have max frequency
# Slots needed if filling optimally: (max_freq - 1) * (n+1) + max_count
return max(len(tasks), (max_freq - 1) * (n + 1) + max_count)
# Candy: each child gets โฅ1 candy; higher rating than neighbor โ more candy
def candy(ratings):
n = len(ratings)
candies = [1] * n
# Left to right: if higher than left neighbor, get one more
for i in range(1, n):
if ratings[i] > ratings[i-1]: candies[i] = candies[i-1] + 1
# Right to left: if higher than right neighbor, ensure more than right
for i in range(n-2, -1, -1):
if ratings[i] > ratings[i+1]:
candies[i] = max(candies[i], candies[i+1] + 1)
return sum(candies)
Backtracking is a systematic way to explore all possible solutions by building candidates incrementally and abandoning candidates ("backtracking") as soon as they're determined to violate constraints. It's smarter than pure brute force but still exponential in the worst case.
# General backtracking template:
def backtrack(state, choices, result):
if is_solution(state):
result.append(copy_of(state))
return
for choice in choices:
if is_valid(state, choice):
make_choice(state, choice) # choose
backtrack(state, next_choices, result) # explore
undo_choice(state, choice) # un-choose (backtrack)
# N-Queens: place N queens so none attack each other
def solve_n_queens(n):
result = []; cols = set(); pos_diag = set(); neg_diag = set()
board = [['.']*n for _ in range(n)]
def backtrack(row):
if row == n: result.append([''.join(r) for r in board]); return
for col in range(n):
if col in cols or (row+col) in pos_diag or (row-col) in neg_diag: continue
cols.add(col); pos_diag.add(row+col); neg_diag.add(row-col)
board[row][col] = 'Q'
backtrack(row + 1)
cols.discard(col); pos_diag.discard(row+col); neg_diag.discard(row-col)
board[row][col] = '.'
backtrack(0); return result
# Sudoku solver:
def solve_sudoku(board):
def is_valid(r, c, num):
box_r, box_c = 3*(r//3), 3*(c//3)
for i in range(9):
if board[r][i] == num or board[i][c] == num: return False
if board[box_r + i//3][box_c + i%3] == num: return False
return True
def solve():
for r in range(9):
for c in range(9):
if board[r][c] == '.':
for num in '123456789':
if is_valid(r, c, num):
board[r][c] = num
if solve(): return True
board[r][c] = '.'
return False
return True
solve()
# All subsets (power set): O(2โฟ) โ include or exclude each element
def subsets(nums):
result = []
def backtrack(start, path):
result.append(path[:]) # add current subset (including empty)
for i in range(start, len(nums)):
path.append(nums[i])
backtrack(i + 1, path)
path.pop()
backtrack(0, []); return result
# Subsets with duplicates โ skip duplicates at same level:
def subsets_with_dups(nums):
nums.sort(); result = []
def backtrack(start, path):
result.append(path[:])
for i in range(start, len(nums)):
if i > start and nums[i] == nums[i-1]: continue # skip dup
path.append(nums[i]); backtrack(i+1, path); path.pop()
backtrack(0, []); return result
# Combinations of size k from [1..n]:
def combine(n, k):
result = []
def backtrack(start, path):
if len(path) == k: result.append(path[:]); return
remaining = k - len(path)
for i in range(start, n - remaining + 2): # pruning
path.append(i); backtrack(i+1, path); path.pop()
backtrack(1, []); return result
# Combination Sum (elements can repeat, sum = target):
def combination_sum(candidates, target):
candidates.sort(); result = []
def backtrack(start, path, remaining):
if remaining == 0: result.append(path[:]); return
for i in range(start, len(candidates)):
if candidates[i] > remaining: break # pruning!
path.append(candidates[i])
backtrack(i, path, remaining - candidates[i]) # i not i+1: allows reuse
path.pop()
backtrack(0, [], target); return result
Advanced Data Structures
5 questionsA segment tree stores aggregate values (sum, min, max, GCD) for contiguous ranges. It supports point updates and range queries in O(log n), vs O(n) naive for range queries.
# Segment tree for range sum and point update: O(n) build, O(log n) query/update
class SegmentTree:
def __init__(self, nums):
self.n = len(nums)
self.tree = [0] * (4 * self.n)
self._build(nums, 0, 0, self.n - 1)
def _build(self, nums, node, start, end):
if start == end:
self.tree[node] = nums[start]
else:
mid = (start + end) // 2
self._build(nums, 2*node+1, start, mid)
self._build(nums, 2*node+2, mid+1, end)
self.tree[node] = self.tree[2*node+1] + self.tree[2*node+2]
def update(self, idx, val, node=0, start=0, end=None):
if end is None: end = self.n - 1
if start == end:
self.tree[node] = val; return
mid = (start + end) // 2
if idx <= mid: self.update(idx, val, 2*node+1, start, mid)
else: self.update(idx, val, 2*node+2, mid+1, end)
self.tree[node] = self.tree[2*node+1] + self.tree[2*node+2]
def query(self, l, r, node=0, start=0, end=None):
if end is None: end = self.n - 1
if r < start or end < l: return 0 # out of range
if l <= start and end <= r: return self.tree[node] # fully in range
mid = (start + end) // 2
return (self.query(l, r, 2*node+1, start, mid) +
self.query(l, r, 2*node+2, mid+1, end))
# Lazy propagation: defer range updates for O(log n) range updates (not just point)
# Fenwick Tree (BIT): O(n) build, O(log n) update and prefix query
# Simpler and smaller than segment tree; only for prefix queries (not arbitrary range)
class FenwickTree:
def __init__(self, n):
self.n = n
self.tree = [0] * (n + 1) # 1-indexed
def update(self, i, delta): # add delta to position i
while i <= self.n:
self.tree[i] += delta
i += i & (-i) # move to parent: add lowest set bit
def query(self, i): # prefix sum [1, i]
total = 0
while i > 0:
total += self.tree[i]
i -= i & (-i) # move to predecessor: remove lowest set bit
return total
def range_query(self, l, r): # sum [l, r]
return self.query(r) - self.query(l - 1)
# Build from array: O(n log n)
def build(arr):
ft = FenwickTree(len(arr))
for i, x in enumerate(arr, 1): ft.update(i, x)
return ft
# Or O(n): directly fill tree array
# Count smaller numbers after self (Fenwick + coordinate compression):
def count_smaller(nums):
sorted_unique = sorted(set(nums))
rank = {v: i+1 for i, v in enumerate(sorted_unique)}
ft = FenwickTree(len(sorted_unique))
result = []
for n in reversed(nums):
result.append(ft.query(rank[n] - 1)) # count elements < n
ft.update(rank[n], 1)
return result[::-1]
# Sparse Table: O(n log n) build, O(1) query for idempotent operations (min, max, GCD)
# Idempotent: f(f(a,b), b) = f(a,b) โ overlapping ranges give same answer
import math
class SparseTable:
def __init__(self, arr):
n = len(arr)
LOG = math.floor(math.log2(n)) + 1 if n > 0 else 1
# table[k][i] = min of arr[i..i+2^k-1]
self.table = [[float('inf')]*n for _ in range(LOG)]
self.log = [0] * (n + 1)
for i in range(2, n+1): self.log[i] = self.log[i//2] + 1
self.table[0] = arr[:]
k = 1
while (1 << k) <= n:
for i in range(n - (1 << k) + 1):
self.table[k][i] = min(self.table[k-1][i],
self.table[k-1][i + (1 << (k-1))])
k += 1
def query(self, l, r): # min of arr[l..r]: O(1)!
k = self.log[r - l + 1]
return min(self.table[k][l], self.table[k][r - (1 << k) + 1])
# Two overlapping intervals of length 2^k cover [l,r] completely
# Works for min/max (idempotent) โ NOT for sum (double-counting)
# Applications:
# Range Minimum Query (RMQ) in O(1) โ used in suffix arrays, LCA
# Answering many range queries on static arrays (no updates)
# For updates needed: use segment tree instead
Skip List: A probabilistic data structure with multiple levels of linked lists. Level 0 has all elements. Each higher level is a random subset of the level below (~50% probability). Expected O(log n) operations. Simpler to implement than balanced BSTs.
# Skip list โ conceptual structure:
# Level 2: 1 -----------> 9
# Level 1: 1 ---> 4 ----> 9
# Level 0: 1 -> 3 -> 4 -> 7 -> 9 -> 12
# Search: start at top level, move right if next โค target, else go down
# Treap (Tree + Heap): each node has a key (BST property) and a random priority
# (heap property). Random priorities give expected O(log n) height.
class TreapNode:
def __init__(self, key):
self.key = key
self.priority = random.random() # random priority
self.left = self.right = None
def treap_insert(root, key):
node = TreapNode(key)
# Split by key, merge with new node
left, right = split(root, key)
return merge(merge(left, node), right)
def split(node, key): # split into (=key)
if not node: return None, None
if node.key < key:
left, right = split(node.right, key)
node.right = left
return node, right
else:
left, right = split(node.left, key)
node.left = right
return left, node
def merge(left, right):
if not left or not right: return left or right
if left.priority > right.priority:
left.right = merge(left.right, right); return left
else:
right.left = merge(left, right.left); return right
Use cases: Skip lists are used in Redis sorted sets, LevelDB. Treaps used when you need split/merge operations on BSTs (competitive programming). In practice, use language-provided balanced BSTs (sortedcontainers.SortedList in Python, TreeMap in Java).
Consistent hashing is a technique for distributing data across a set of servers such that adding or removing a server minimizes the number of keys that need to be remapped.
# Naive hashing: key โ server_index = hash(key) % num_servers
# Problem: adding/removing a server remaps ~all keys (100% disruption)
# Consistent hashing: place both servers and keys on a hash ring [0, 2^32)
# Each key is assigned to the FIRST server clockwise on the ring
import hashlib, bisect
class ConsistentHash:
def __init__(self, virtual_nodes=150):
self.ring = {} # hash_position โ server
self.sorted_keys = [] # sorted hash positions
self.vn = virtual_nodes # multiple virtual nodes per server โ better balance
def _hash(self, key):
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_server(self, server):
for i in range(self.vn):
h = self._hash(f"{server}:{i}")
self.ring[h] = server
bisect.insort(self.sorted_keys, h)
def remove_server(self, server):
for i in range(self.vn):
h = self._hash(f"{server}:{i}")
del self.ring[h]
self.sorted_keys.remove(h)
def get_server(self, key):
if not self.ring: return None
h = self._hash(key)
idx = bisect.bisect_right(self.sorted_keys, h) % len(self.sorted_keys)
return self.ring[self.sorted_keys[idx]]
# When server added: only keys between the new server and its predecessor remap
# When server removed: only keys that mapped to it remap to its successor
# With n servers and k keys: ~k/n keys remap on each server change (vs k/2 naive)
# Used in: DynamoDB, Cassandra, Memcached, CDN edge selection, load balancers