How to learn Merge Join Concat and Append Patterns for Python pandas in Data Analyst for free

Who this is for

Data Analysts who already use pandas basics and want confident, repeatable patterns for combining tables: merging on keys, joining on index, and stacking or aligning datasets.

Why this matters

Create a customer 360 by merging customers, orders, and support tickets.
Build weekly or monthly performance tables by concatenating CSV extracts.
Enrich features by joining lookup tables (e.g., region names, product categories).
Run anti-joins to find unmatched records for data quality checks.

Prerequisites

Python basics and pandas DataFrame operations (selecting columns, filtering, basic dtype handling).
Comfort with indices and column names.

Concept explained simply

Think of merging as matching rows based on keys (columns or index). Concat is stacking blocks either vertically (more rows) or horizontally (more columns). Join is a convenience method for index-based merges.

Mental model

Picture a Venn diagram of keys between two tables. The join type (inner/left/right/outer) decides which circles you include. Concat is like piling sheets of paper: row-wise adds more pages; column-wise adds more columns side by side.

Core patterns and syntax

Merge by columns (most common)

import pandas as pd

# Basic
df = pd.merge(left, right, on="key", how="left")  # inner | left | right | outer

# Different column names
pd.merge(left, right, left_on="lkey", right_on="rkey", how="inner")

# Disambiguate overlapping column names
pd.merge(a, b, on="id", how="left", suffixes=("_a", "_b"))

# Check relationship to avoid row explosions
pd.merge(a, b, on="id", how="left", validate="one_to_one")  # or "one_to_many", "many_to_one", "many_to_many"

# Inspect matches
pd.merge(a, b, on="id", how="left", indicator=True)

Join by index

# Set index, then join
lookup = lookup.set_index("key")
main = main.set_index("key")
main.join(lookup, how="left")  # inner | left | right | outer

# Or with merge using index
a.merge(b, left_index=True, right_index=True, how="inner")

Concat (stack or align)

# Row-wise: stack more rows of same schema
pd.concat([jan, feb, mar], axis=0, ignore_index=True)

# Column-wise: align on index to add new columns
pd.concat([metrics, segments], axis=1)

# Track origins with keys to build a MultiIndex
pd.concat([jan, feb], keys=["Jan", "Feb"], axis=0)

Note: DataFrame.append is deprecated in modern pandas. Use pd.concat([df1, df2], ignore_index=True).

Append (modern pattern)

Use concat instead of append:

# Instead of: df1.append(df2)
combined = pd.concat([df1, df2], ignore_index=True)

Worked examples

1) Customer to Orders (left merge with indicator)

import pandas as pd
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "name": ["Ava", "Ben", "Cara", "Drew"]
})
orders = pd.DataFrame({
    "customer_id": ["101", "101", "103", "105"],  # string ids
    "order_id": [1, 2, 3, 4],
    "amount": [20, 35, 15, 50]
})
# Align dtypes for safe merge
orders["customer_id"] = orders["customer_id"].astype(int)

out = pd.merge(
    customers, orders,
    on="customer_id", how="left",
    indicator=True, suffixes=("", "_order")
).sort_values(["customer_id", "order_id"], na_position="last")
print(out)

Result includes all customers; unmatched show NaN and _merge='left_only'.

2) Join features by index (cleaner than re-specifying keys)

users = pd.DataFrame({"id": [1,2,3], "name": ["Ana","Bo","Cy"]}).set_index("id")
regions = pd.DataFrame({"id": [1,2,4], "region": ["NA","EU","APAC"]}).set_index("id")

# Left join: keep all users
features = users.join(regions, how="left")
print(features)

Users without a region stay in the result; region is NaN.

3) Concatenate monthly datasets (row-wise) and tag origin

jan = pd.DataFrame({"session_id":[1,2,3], "user":["A","B","C"], "events":[5,2,7]})
feb = pd.DataFrame({"session_id":[3,4], "user":["C","D"], "events":[4,1]})

# Add a month column and stack
jan2 = jan.assign(month="Jan")
feb2 = feb.assign(month="Feb")
all_sessions = pd.concat([jan2, feb2], axis=0, ignore_index=True)
print(all_sessions)

Now you can group by user or month, and sessions persist across months.

Practice exercises

Solve these inside your notebook. The same tasks appear below with a solution toggle.

Exercise 1 — Left merge customers and orders (with dtype fix)

Create the two DataFrames below and produce the expected output. Use a left merge on customer_id, align dtypes, include indicator, sort by customer_id then order_id.

import pandas as pd
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "name": ["Ava", "Ben", "Cara", "Drew"]
})
orders = pd.DataFrame({
    "customer_id": ["101", "101", "103", "105"],
    "order_id": [1, 2, 3, 4],
    "amount": [20, 35, 15, 50]
})

Expected output:

customer_id  name  order_id  amount     _merge
101          Ava   1         20         both
101          Ava   2         35         both
102          Ben   NaN       NaN        left_only
103          Cara  3         15         both
104          Drew  NaN       NaN        left_only

See hints

Convert orders.customer_id to int before merging.
Use pd.merge with how='left', indicator=True.
Sort values by ['customer_id','order_id'].

Exercise 2 — Concat row-wise and tag month

Make these DataFrames and stack them row-wise. Add a month column to each before concatenation and ignore the index.

jan = pd.DataFrame({"session_id":[1,2,3], "user":["A","B","C"], "events":[5,2,7]})
feb = pd.DataFrame({"session_id":[3,4], "user":["C","D"], "events":[4,1]})

Expected output:

session_id  user  events  month
1           A     5       Jan
2           B     2       Jan
3           C     7       Jan
3           C     4       Feb
4           D     1       Feb

See hints

Use assign(month='Jan') and assign(month='Feb').
pd.concat([...], axis=0, ignore_index=True)

Checklist before you continue

I know when to use merge vs join vs concat.
I can safely prevent row explosions with validate=...
I can track match results with indicator=True.
I can replace append with pd.concat consistently.
I can align dtypes before merging.

Common mistakes and self-check

Using DataFrame.append: In modern pandas, use pd.concat instead.
Mismatched dtypes: Keys as int vs str cause zero matches. Self-check: customers.customer_id.dtype equals orders.customer_id.dtype.
Accidental many-to-many merges: Unintended row multiplication. Self-check: use validate to enforce expected cardinality.
Forgetting suffixes: Overlapping columns silently overwrite. Use suffixes=("_left","_right").
Column-wise concat without aligned index: Misaligned rows. Ensure indices match or reset/reindex explicitly.
Outer merges without handling NaN: Follow up with fillna or explicit checks.

Mini challenge

Sales enrichment pipeline:

Concat three months of transactions (Jan, Feb, Mar) row-wise and tag month.
Merge with a product lookup to add category and list_price.
Add a flag price_mismatch = (paid_price != list_price) and compute mismatch rate by month.
Guard with validate to avoid many-to-many merges.

Tip

Set product lookup to index by product_id and use left joins from transactions.

Practical projects

Marketing attribution table: concat daily campaign logs, then left-merge user attributes to analyze performance by segment.
Data quality dashboard: anti-join customer master to orders to list orphan orders or orphan customers.
Feature store prototype: join multiple lookups (region, plan, cohort) to a users table; ensure cardinalities with validate.

Learning path

After joins/concat: GroupBy and aggregation to summarize the combined data.
Then: Pivot tables and reshaping (melt/pivot) for reporting.
Finally: Data validation patterns (assertions, dtype checks) to productionize notebooks.

Next steps

Refactor your current project to replace append with concat and add validate to critical merges.
Create reusable helper functions for common joins (e.g., join_products(df)).
Document your merge keys, expected cardinalities, and post-merge row counts.

Quick Test

Take the quick test below. Available to everyone. Only logged-in users get saved progress.

Menu

Merge Join Concat and Append Patterns

Table of Contents

Who this is for

Why this matters

Prerequisites

Concept explained simply

Mental model

Core patterns and syntax

Worked examples

Practice exercises

Exercise 1 — Left merge customers and orders (with dtype fix)

Exercise 2 — Concat row-wise and tag month

Checklist before you continue

Common mistakes and self-check

Mini challenge

Practical projects

Learning path

Next steps

Quick Test

Practice Exercises

Left merge customers and orders (with dtype fix)

Instructions

Expected Output

Concat monthly sessions and tag month

Merge Join Concat and Append Patterns — Quick Test

Have questions about Merge Join Concat and Append Patterns?

AI Assistant