Why this matters
As a Data Analyst, you often combine results from multiple tables or time periods. Set operations let you quickly answer questions like: Which customers appeared in both campaigns? Which products were sold last year but not this year? How do I combine two lists without duplicates?
- Deduplicate combined lists from different sources
- Find overlaps between cohorts or periods
- Identify drop-off or new entries between snapshots
Who this is for
Data Analysts and learners who know basic SELECT queries and want to compare and combine result sets cleanly.
Prerequisites
- SELECT, WHERE, and ORDER BY basics
- Comfort with simple joins
- Understanding of column data types
Concept explained simply
Set operations work on the results of two or more SELECT statements. Think of each SELECT as producing a list of rows. Then:
- UNION: combine lists and remove duplicates
- UNION ALL: combine lists and keep duplicates
- INTERSECT: return only rows that appear in both lists
- EXCEPT (sometimes called MINUS): rows in the first list that do not appear in the second
Dialect notes
- EXCEPT may be called MINUS in some databases
- Some systems do not support INTERSECT/EXCEPT directly; you can emulate with joins (INNER JOIN for INTERSECT; LEFT JOIN with NULL filter for EXCEPT)
Mental model
Picture two circles of rows (lists) from two SELECTs:
- UNION: the whole area covered by both circles, deduplicated
- UNION ALL: the whole area, keeping duplicates
- INTERSECT: the overlapping area
- EXCEPT: the part of the first circle that does not overlap with the second
Syntax quick view
-- Union (distinct by default)
SELECT col1, col2 FROM table_a
UNION
SELECT col1, col2 FROM table_b
ORDER BY col1;
-- Union all (keeps duplicates)
SELECT col1, col2 FROM table_a
UNION ALL
SELECT col1, col2 FROM table_b;
-- Intersect
SELECT col1, col2 FROM table_a
INTERSECT
SELECT col1, col2 FROM table_b;
-- Except (or MINUS)
SELECT col1, col2 FROM table_a
EXCEPT
SELECT col1, col2 FROM table_b;
Core rules (worth memorizing)
- Each SELECT must return the same number of columns, in the same order
- Corresponding columns must have compatible data types
- Column names in the final result come from the first SELECT
- ORDER BY can appear only once, at the very end
- NULLs are considered equal when removing duplicates in set operations
- Use parentheses to control precedence when chaining multiple set operations
Worked examples
Example 1: Combine two months of active users
-- Tables: active_users_jan(user_id), active_users_feb(user_id)
-- Unique users across both months
SELECT user_id FROM active_users_jan
UNION
SELECT user_id FROM active_users_feb
ORDER BY user_id;
-- If you need to count total appearances (including duplicates)
SELECT user_id FROM active_users_jan
UNION ALL
SELECT user_id FROM active_users_feb;
UNION returns each user once; UNION ALL shows duplicate rows if a user appears in both months.
Example 2: Customers who purchased in both years
-- Tables: purchases_2023(customer_id), purchases_2024(customer_id)
-- Customers present in both sets
SELECT customer_id FROM purchases_2023
INTERSECT
SELECT customer_id FROM purchases_2024
ORDER BY customer_id;
-- Emulation if INTERSECT is unavailable
SELECT p23.customer_id
FROM (SELECT DISTINCT customer_id FROM purchases_2023) p23
INNER JOIN (SELECT DISTINCT customer_id FROM purchases_2024) p24
ON p23.customer_id = p24.customer_id
ORDER BY p23.customer_id;
INTERSECT (or the join) returns only customer IDs found in both years.
Example 3: Products discontinued this year
-- Tables: products_2023(product_sku), products_2024(product_sku)
-- In 2023 but not in 2024
SELECT product_sku FROM products_2023
EXCEPT
SELECT product_sku FROM products_2024
ORDER BY product_sku;
-- Emulation if EXCEPT is unavailable
SELECT p23.product_sku
FROM (SELECT DISTINCT product_sku FROM products_2023) p23
LEFT JOIN (SELECT DISTINCT product_sku FROM products_2024) p24
ON p23.product_sku = p24.product_sku
WHERE p24.product_sku IS NULL
ORDER BY p23.product_sku;
EXCEPT returns rows unique to the first result set.
Hands-on exercises
These mirror the exercises listed below and include solutions you can reveal. Try them first before opening the solutions.
Exercise 1: All events across two years
Tables:
events_2024(event_name)
Spring Expo
Data Day
Retail Summit
events_2025(event_name)
Health Forum
Data Day
Retail Summit
Task: Return a single column event_name containing every unique event across both years, sorted alphabetically.
- Use a set operation
- No duplicates
- One ORDER BY at the end
Exercise 2: Emails in both signup and purchase lists
Tables:
signups(email)
ana@ex.com
bo@ex.com
cy@ex.com
purchases(email)
bo@ex.com
di@ex.com
cy@ex.com
Task: Return the emails that are present in both tables, sorted alphabetically. If INTERSECT is unavailable, emulate it.
Self-check checklist
- Did both SELECTs return the same number of columns in the same order?
- Did you avoid placing ORDER BY inside subqueries of the set operation?
- Did you apply ORDER BY once at the very end?
- If emulating INTERSECT/EXCEPT, did you DISTINCT before joining to avoid accidental duplicates?
Common mistakes
- Different column counts or incompatible types between SELECTs: align columns and cast if needed
- Using ORDER BY in each SELECT: move ORDER BY to the final combined query
- Expecting UNION to keep duplicates: use UNION ALL if you need duplicates
- Forgetting that column names come from the first SELECT: alias columns in the first SELECT
- Unexpected duplicates with INTERSECT/EXCEPT emulations: use DISTINCT before joins
Quick self-debug steps
- Run each SELECT separately to confirm schemas and data types
- COUNT rows before and after to confirm deduplication behavior
- Test with tiny sample data to validate logic
Practical projects
- Cohort overlap dashboard: Given monthly user cohorts, use INTERSECT (or joins) to compute overlaps and trend them
- Catalog change report: Use EXCEPT to list added and dropped SKUs between snapshot tables
- Campaign reach merger: Use UNION/UNION ALL to combine multi-channel contact lists and deduplicate appropriately
Learning path
- Before: SELECT, WHERE, GROUP BY, basic joins
- This subskill: UNION, UNION ALL, INTERSECT, EXCEPT/MINUS, emulations with joins
- After: Subqueries, window functions, and CTEs for more complex analysis
Mini challenge
You have two tables, newsletter_may(email) and newsletter_june(email). Return three result sets:
- All unique emails across both months
- Emails present in both months
- Emails in May but not in June
Write three queries using set operations. Then re-write the last two using joins only.
Next steps
- Revisit your recent analyses and replace manual joins with set operations where appropriate
- Create small sample tables and verify each operation’s behavior with duplicates and NULLs
- Proceed to advanced querying with subqueries and CTEs
Quick Test
Take the quick test below to check your understanding. Everyone can take it for free; logged-in learners will have their progress saved automatically.