Why this matters
Backend systems move data. Choosing the right data structure and algorithm turns slow, memory-hungry services into fast, reliable ones. Real tasks include:
- De-duplicating millions of event IDs before writing to storage (hash set).
- Finding top K endpoints by traffic for dashboards (min-heap).
- Validating request payloads or parsers (stack for brackets, tree traversals).
- Resolving dependency order for migrations or job orchestration (graph + BFS/DFS).
Who this is for
Junior to mid-level backend engineers who want practical, day-to-day mastery of foundational structures and algorithms.
Prerequisites
- Comfort with a general-purpose language (e.g., Java, Go, Python, C#, JavaScript).
- Basic understanding of functions, loops, and arrays.
- Ability to read simple pseudocode.
Concept explained simply
Data structures store data efficiently for the operations you need. Algorithms are step-by-step methods to process that data. You pick structures and algorithms based on the operations you do most often.
Mental model
- Think of a toolbelt: each tool is great for a specific job. A hammer (hash map) is amazing for quick lookups, but not for keeping things sorted.
- Ask 3 questions: What operations are frequent? (insert, search, update) How big can the data get? What are the latency/memory limits?
Core data structures at a glance
- Array: contiguous items. Great for indexing by position and iteration. Insert/delete in middle is costly.
- Linked list: nodes point to next (and maybe prev). Fast insert/delete if you have the node; slow random access.
- Hash map / set: average O(1) insert/find/delete. Unordered. Great for de-duplication and caching.
- Stack: LIFO. Push/pop O(1). Use for parsing, undo, and balanced symbols.
- Queue / Deque: FIFO or double-ended. Use for task pipelines, rate limiting windows.
- Binary search tree (BST): keeps items sorted; balanced variants give O(log n) operations. In-order traversal yields sorted order.
- Heap (priority queue): quickly get min/max. Top-K queries and scheduling.
- Graph: nodes and edges. Use for dependencies, routing, and relationships.
Core algorithms you will use
- Big-O basics: O(1), O(log n), O(n), O(n log n), O(n^2). Know the common costs for your chosen structure.
- Binary search: requires sorted data. Find item or boundary in O(log n).
- Sorting: merge/quick/heap sorts. Many built-ins are O(n log n).
- Traversal: BFS (level-by-level, shortest paths in unweighted graphs), DFS (deep exploration, cycle detection).
- Two pointers / sliding window: linear-time scans for subarray/substring problems.
- Hashing: constant-time membership, frequency counts, grouping.
- Recursion/iteration patterns: tree/graph traversals, divide-and-conquer.
Worked examples
1) Fast duplicate suppression (hash set)
Scenario: Drop repeated event IDs before writing to the database.
- Keep a set of seen IDs.
- For each incoming ID, check membership in O(1) average time.
- If unseen, write to DB and add to set; otherwise, skip.
seen = set()
for id in stream:
if id not in seen:
write(id)
seen.add(id)Trade-off: memory grows with unique IDs. Consider TTL or a counting Bloom filter when memory is tight.
2) Top-K endpoints by traffic (min-heap)
Scenario: Report top 3 URLs by request count from a large list of (url, count).
- Maintain a min-heap of size K by count.
- Push each (count, url). If size > K, pop the smallest.
- Heap contains the top K after one pass in O(n log K).
heap = [] # min-heap of (count, url)
for (url, count) in stats:
heappush(heap, (count, url))
if len(heap) > K:
heappop(heap)
# heap now holds top K3) Validate JSON-like brackets (stack)
Scenario: Quickly reject malformed payloads "{[()]}" vs "{[(])}".
- Push opening brackets to a stack.
- On closing bracket, check if it matches the top of the stack.
- Valid if stack empty at the end and all matched.
pairs = {')':'(', ']':'[', '}':'{'}
stack = []
for ch in s:
if ch in "([{":
stack.append(ch)
elif ch in ")]}":
if not stack or stack.pop() != pairs[ch]:
return False
return not stack4) Shortest dependency chain (BFS)
Scenario: Find minimal steps from Service A to Service B through dependencies.
- Model services as graph nodes; edges mean "calls".
- Run BFS from A to find shortest path to B.
- Stop when B is reached; reconstruct path using parent pointers.
queue = [A]; parent = {A: None}; visited = {A}
while queue:
x = queue.pop(0)
if x == B: break
for y in graph[x]:
if y not in visited:
visited.add(y); parent[y]=x; queue.append(y)
# Reconstruct path from B using parentExercises
Mirror these tasks in a language you know. Focus on correctness first, then complexity.
- Exercise 1: Validate brackets using a stack. Return true if the string has balanced (), [], {}. Examples: "({[]})" → true, "([)]" → false.
- Exercise 2: Top-K endpoints using a min-heap. Given pairs (url, count) and K, return the K URLs with highest counts.
Checklist before you move on
- You can explain when to use a hash map vs a heap.
- You can state the average-time complexity of hash map operations.
- You can run binary search and name its precondition.
- You can choose between BFS and DFS for shortest paths.
- You implemented stack-based bracket validation.
- You implemented a min-heap Top-K solution.
Common mistakes and self-check
- Forgetting preconditions: binary search requires data sorted. Self-check: did you sort or guarantee sorted input?
- Using arrays for frequent mid-list inserts: consider linked lists or deques.
- Using a max-heap for Top-K smallest (or vice versa): ensure the heap direction matches your goal.
- Assuming hash maps keep order: they do not guarantee sorted order. If you need order, use a tree or sort results.
- Ignoring memory growth of sets/maps in streams: add TTLs, windowing, or approximate structures when needed.
Quick self-audit
- Can you justify your chosen structure in one sentence tied to operations?
- Can you estimate time complexity for your solution’s hot path?
- Can you name a fallback if memory becomes the bottleneck?
Practical projects
- Unique event filter: read a stream of IDs and write only first-seen IDs. Add an optional TTL to evict old entries.
- Top-K trending endpoints: maintain a rolling Top-K over the last hour using a min-heap and a windowed counter.
- Config validator: check balanced brackets and simple key/value constraints for a config file loader.
- Service dependency analyzer: load a dependency list and output the shortest call chain between two services.
Learning path
- Master hash maps/sets and arrays (membership, frequency counting).
- Add stacks/queues/deques (parsing, pipelines, windows).
- Learn heaps and Top-K patterns.
- Practice binary search and boundary finding on sorted arrays.
- Introduce graphs with BFS/DFS and cycle detection.
- Revisit sorting and stability; understand when built-ins suffice.
Next steps
- Re-implement the exercises from scratch without notes.
- Benchmark your Top-K with different K and input sizes.
- Extend bracket validation to include quotes and escape rules.
Mini challenge
Log dedup within a sliding window: Given a stream of (timestamp, id) sorted by time, accept id only if it hasn’t appeared in the last 10 minutes. Describe a solution with its time and space complexity. Hint: combine a queue (to evict old items) and a set (for membership).
Ready? Take the quick test
The quick test is available to everyone. Only logged-in users will have their progress saved.