

Working with Lists and Dicts: explode and json_normalize

Learn to flatten lists and dicts with pandas explode and json_normalize, with explanations, exercises, and a quick test (for Data Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Who this is for

Data Analysts who handle API responses, product catalogs, surveys, logs, or any dataset with nested lists/dicts that must be turned into clean, tabular rows and columns.

Prerequisites

  • Basic Python (lists, dicts, loops)
  • Core pandas (DataFrame, Series, indexing, groupby)
  • Comfort reading JSON-like structures

Why this matters

  • APIs return nested JSON; you need to flatten it for analysis.
  • Columns often contain arrays (tags, items, categories) that need row-wise expansion.
  • Survey questions can be multi-select (lists) and must be exploded to count responses.
  • Event logs store attributes as dicts; normalizing reveals columns you can aggregate.

Real tasks you’ll face
  • Explode a list of purchased items per order to compute item-level revenue.
  • Normalize a nested customer profile JSON into clean columns for modeling.
  • Unpack an array of event objects to analyze funnels and retention.

Concept explained simply

Two superpowers:

  • explode: Takes a column with list-like values and makes a new row for each element.
  • json_normalize: Takes nested dicts/lists and flattens them into columns (and optionally rows) using paths you specify.

Mental model

Think of explode as a photocopier for rows that have a list: it duplicates the row once per list item. Think of json_normalize as a map that traces paths through nested dicts (like customer.name or order.items[0].price) and pulls them out as tidy columns.

Core pandas tools you’ll use

  • DataFrame.explode(column, ignore_index=True) (also available as Series.explode)
  • pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
  • DataFrame.apply(pd.Series) for quick dict-to-columns (simple cases only; it is slow on large data)
  • ast.literal_eval (applied with Series.apply) to parse lists/dicts stored as strings (only when they are truly string-encoded)
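When list-valued columns arrive from a CSV or a database, they are often string-encoded. A minimal sketch of parsing them with ast.literal_eval before exploding (the column names here are made up for illustration):

```python
import ast

import pandas as pd

# Hypothetical case: 'tags' was round-tripped through CSV,
# so the lists arrived as strings like "['blue', 'cotton']".
df = pd.DataFrame({
    'sku': ['p1', 'p2'],
    'tags': ["['blue', 'cotton']", "['wool']"],
})

# Parse the string-encoded lists back into real Python lists,
# then explode as usual. ast.literal_eval is safer than eval.
df['tags'] = df['tags'].apply(ast.literal_eval)
long = df.explode('tags', ignore_index=True)
print(long)
```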

Quick syntax reminders
df = pd.DataFrame({
    'order_id': [1, 2],
    'items': [['A', 'B'], ['C']]
})

# explode
out = df.explode('items', ignore_index=True)

# json_normalize
nested = [
    {
        'order_id': 1,
        'customer': {'id': 10, 'city': 'Berlin'},
        'items': [{'sku': 'A', 'qty': 2}, {'sku': 'B', 'qty': 1}]
    }
]
flat = pd.json_normalize(
    nested,
    record_path='items',
    meta=['order_id', ['customer', 'id'], ['customer', 'city']]
)
# Nested meta paths are joined with '.', so the columns come out as
# 'sku', 'qty', 'order_id', 'customer.id', 'customer.city'.

Worked examples

Example 1 — Explode a list column for item-level analysis

import pandas as pd

df = pd.DataFrame({
    'order_id': [101, 102, 103],
    'items': [
        ['pen', 'notebook'],
        ['pen'],
        []
    ],
    'price': [5.0, 3.0, 0.0]
})

# Expand rows so each item is one row
items_long = df.explode('items', ignore_index=True)

# Count items
item_counts = items_long['items'].value_counts(dropna=True)
print(items_long)
print(item_counts)

What to notice
  • Empty lists become NaN rows after explode. Filter them if needed: items_long.dropna(subset=['items']).
  • ignore_index=True gives a clean 0..N index.
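The same pattern covers the item-level revenue task mentioned earlier. A small sketch, assuming a hypothetical feed where each item dict carries its own price:

```python
import pandas as pd

# Hypothetical orders where each item is a dict with its own price.
orders = pd.DataFrame({
    'order_id': [101, 102],
    'items': [
        [{'name': 'pen', 'price': 2.0}, {'name': 'notebook', 'price': 3.0}],
        [{'name': 'pen', 'price': 2.0}],
    ],
})

# Step 1: one row per item.
long = orders.explode('items', ignore_index=True)

# Step 2: unpack each item dict into columns, then reattach order_id.
items = pd.json_normalize(long['items'].tolist())
result = pd.concat([long[['order_id']], items], axis=1)

# Item-level revenue: total price per item name.
revenue = result.groupby('name')['price'].sum()
print(revenue)
```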

Example 2 — Flatten nested dicts with json_normalize

data = [
    {
        'user': {'id': 1, 'name': 'Ada', 'geo': {'city': 'Paris'}},
        'stats': {'logins': 5, 'purchases': 2}
    },
    {
        'user': {'id': 2, 'name': 'Lin', 'geo': {'city': 'Tokyo'}},
        'stats': {'logins': 8, 'purchases': 1}
    }
]

flat = pd.json_normalize(data)
print(flat)
# Columns: ['user.id','user.name','user.geo.city','stats.logins','stats.purchases']

Why this helps

You instantly get usable columns for grouping, joins, and visualization.
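Two optional parameters are worth knowing here: sep controls how nested keys are joined into column names, and max_level caps how deep the flattening goes. A quick sketch:

```python
import pandas as pd

data = [{'user': {'id': 1, 'geo': {'city': 'Paris'}}}]

# sep controls the separator used in flattened column names.
flat_us = pd.json_normalize(data, sep='_')
print(list(flat_us.columns))    # ['user_id', 'user_geo_city']

# max_level stops flattening at a given depth; deeper dicts stay whole.
shallow = pd.json_normalize(data, max_level=1)
print(list(shallow.columns))    # ['user.id', 'user.geo']
```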

Example 3 — Normalize records inside a list (record_path + meta)

orders = [
  {
    'order_id': 'o1',
    'customer': {'id': 7, 'segment': 'B2C'},
    'lines': [
      {'sku': 'A', 'qty': 2, 'price': 3.5},
      {'sku': 'B', 'qty': 1, 'price': 5.0}
    ]
  },
  {
    'order_id': 'o2',
    'customer': {'id': 8, 'segment': 'B2B'},
    'lines': [{'sku': 'C', 'qty': 4, 'price': 2.0}]
  }
]

lines = pd.json_normalize(
    orders,
    record_path='lines',
    meta=['order_id', ['customer', 'id'], ['customer', 'segment']]
)
print(lines)
# Each line becomes a row, customer/order fields are repeated for easy grouping.

Tip

Nested meta paths already yield dotted column names like customer.id, which avoids most clashes. If record and meta column names still collide, use record_prefix or meta_prefix; note that meta_prefix is applied to every meta column, including order_id.
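Because order_id is carried along as meta, line-level rows aggregate straight back to order totals. A minimal sketch (data shortened from the example above):

```python
import pandas as pd

orders = [
    {'order_id': 'o1',
     'lines': [{'sku': 'A', 'qty': 2, 'price': 3.5},
               {'sku': 'B', 'qty': 1, 'price': 5.0}]},
    {'order_id': 'o2',
     'lines': [{'sku': 'C', 'qty': 4, 'price': 2.0}]},
]

# One row per order line, with order_id repeated on each row.
lines = pd.json_normalize(orders, record_path='lines', meta=['order_id'])
lines['revenue'] = lines['qty'] * lines['price']

# Aggregate line-level rows back up to one total per order.
order_totals = lines.groupby('order_id')['revenue'].sum()
print(order_totals)
```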

Example 4 — Combine explode + json_normalize for mixed data

df = pd.DataFrame({
  'post_id': [1, 2],
  'comments': [
    [
      {'user': {'id': 11}, 'text': 'Nice', 'likes': 2},
      {'user': {'id': 12}, 'text': 'Wow', 'likes': 3}
    ],
    [
      {'user': {'id': 13}, 'text': 'Great', 'likes': 1}
    ]
  ]
})

# Step 1: explode comments to one comment per row
comments_long = df.explode('comments', ignore_index=True)

# Step 2: normalize each comment dict into columns
comment_cols = pd.json_normalize(comments_long['comments'].tolist())
result = pd.concat([comments_long['post_id'], comment_cols], axis=1)
print(result)
# Columns include user.id, text, likes

When to choose which approach
  • If your column already contains a list of dicts, explode first, then json_normalize the dicts.
  • If your entire dataset is a nested structure (list of dicts), go straight to json_normalize with record_path+meta.
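Real feeds are rarely uniform: some records may lack the nested keys you list in meta. Passing errors='ignore' fills missing meta paths with NaN instead of raising a KeyError. A sketch with a hypothetical feed:

```python
import pandas as pd

# Hypothetical feed where one record lacks the 'customer' block.
orders = [
    {'order_id': 'o1', 'customer': {'id': 7},
     'lines': [{'sku': 'A', 'qty': 2}]},
    {'order_id': 'o2',  # no customer info on this record
     'lines': [{'sku': 'C', 'qty': 4}]},
]

# errors='ignore' tolerates missing meta keys, producing NaN instead.
lines = pd.json_normalize(
    orders,
    record_path='lines',
    meta=['order_id', ['customer', 'id']],
    errors='ignore',
)
print(lines)
```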

Exercises (practice in your own notebook)

These tasks mirror the Practice Exercises section below. Use the checklists to verify your work.

Exercise 1 — Explode tags and count

You have a DataFrame of products with a list of tags:

import pandas as pd

df = pd.DataFrame({
  'sku': ['p1', 'p2', 'p3'],
  'tags': [['blue', 'cotton'], ['cotton'], []]
})

# Task:
# 1) Explode tags into rows.
# 2) Remove empty/NaN tags.
# 3) Count tags frequency, sorted descending.
  • [ ] Used explode with ignore_index=True
  • [ ] Dropped NaN tags after explode
  • [ ] Produced a Series with counts (index=tag, value=count)

Hint

After explode, use dropna on the exploded column and then value_counts().

Exercise 2 — Normalize nested payload and unnest events

Given:

data = pd.DataFrame({
  'payload': [
    {
      'user': {'id': 1, 'city': 'Paris'},
      'events': [{'type': 'view', 'value': 3}, {'type': 'buy', 'value': 1}]
    },
    {
      'user': {'id': 2, 'city': 'Rome'},
      'events': [{'type': 'view', 'value': 5}]
    }
  ]
})

# Task:
# 1) Extract rows for each event with json_normalize(record_path='events', meta=...).
# 2) Keep user.id and user.city as meta (prefix them with 'user.').
# 3) Result columns: type, value, user.id, user.city
  • [ ] Used pd.json_normalize with record_path='events'
  • [ ] Passed meta for user fields
  • [ ] Got tidy result with one row per event

Hint

First convert the column to a list: payloads = data['payload'].tolist(), then use json_normalize on it.

Common mistakes and self-checks

  • Forgetting to drop NaN after explode. Self-check: Do you have rows where the exploded column is NaN? Filter them.
  • Exploding a dict directly. Self-check: explode expects list-like; dicts will yield keys, often not what you expect.
  • Not preserving context when normalizing. Self-check: Use meta fields (like order_id, customer info) so you can aggregate later.
  • Name collisions after normalize. Self-check: Use meta_prefix and review columns for duplicates.
  • Assuming numeric types after flattening. Self-check: Inspect dtypes and use nullable types (e.g., Int64) if missing values exist.
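The dict-explode pitfall above is easy to demonstrate: explode treats a dict as list-like and iterates it, and iterating a dict yields its keys, not its values:

```python
import pandas as pd

s = pd.Series([{'a': 1, 'b': 2}])

# Exploding a dict iterates over it, which yields the KEYS.
print(s.explode().tolist())    # ['a', 'b']

# For dict -> columns, json_normalize is the right tool.
print(pd.json_normalize(s.tolist()))
```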

Quick dtype fix patterns
# After normalization
cols = ['qty', 'price']
for c in cols:
    flat[c] = pd.to_numeric(flat[c], errors='coerce')
flat['qty'] = flat['qty'].astype('Int64')

Practical projects

  • E-commerce order lines: explode items from orders, compute item-level revenue, and top SKUs.
  • Marketing events: normalize nested event payloads and build a funnel conversion table.
  • Survey analysis: explode multi-select answers and chart most common choices by segment.

Learning path

  • Before this: Reading files (CSV/JSON), Selecting/Filtering, Basic GroupBy.
  • This subskill: explode, json_normalize, tidy nested structures.
  • Next: Merging/Joining datasets, Aggregations, and Time-based analytics.

Next steps

  • Refactor your data cleaning scripts to handle nested columns up-front.
  • Create a utility function that normalizes a payload and documents the meta fields used.
  • Practice on one real dataset with both explode and json_normalize.

Mini challenge

You receive a daily JSON dump of sessions where each session has user info and a list of actions with timestamps. Produce a table of one action per row including user_id, session_id, action_type, and minute-of-day. Then find top 3 actions by city.

Plan it
  1. json_normalize with record_path='actions', meta=['session_id', ['user', 'id'], ['user', 'city']].
  2. Convert timestamp to datetime and compute timestamp.dt.hour*60 + timestamp.dt.minute.
  3. Group by city and action_type; take top 3.
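A sketch of that plan on synthetic data (the session and action field names are assumptions, matched to the challenge wording):

```python
import pandas as pd

# Synthetic sessions shaped like the challenge describes.
sessions = [
    {'session_id': 's1', 'user': {'id': 1, 'city': 'Paris'},
     'actions': [{'type': 'view', 'ts': '2025-01-01T09:05:00'},
                 {'type': 'buy',  'ts': '2025-01-01T09:30:00'}]},
    {'session_id': 's2', 'user': {'id': 2, 'city': 'Paris'},
     'actions': [{'type': 'view', 'ts': '2025-01-01T10:15:00'}]},
]

# Step 1: one row per action, keeping session and user context.
actions = pd.json_normalize(
    sessions,
    record_path='actions',
    meta=['session_id', ['user', 'id'], ['user', 'city']],
)

# Step 2: timestamp -> minute-of-day.
actions['ts'] = pd.to_datetime(actions['ts'])
actions['minute_of_day'] = actions['ts'].dt.hour * 60 + actions['ts'].dt.minute

# Step 3: top 3 action types per city.
top = (actions.groupby(['user.city', 'type']).size()
       .sort_values(ascending=False)
       .groupby('user.city').head(3))
print(actions[['user.id', 'session_id', 'type', 'minute_of_day']])
print(top)
```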

Quick test

The quick test below is available to everyone.

Practice Exercises

2 exercises to complete

Instructions

Given:

import pandas as pd

df = pd.DataFrame({
  'sku': ['p1', 'p2', 'p3'],
  'tags': [['blue', 'cotton'], ['cotton'], []]
})

# 1) Explode tags into rows (ignore_index=True)
# 2) Drop NaN tags
# 3) Compute tag counts descending
  • Return a Series with index as tag and values as counts.
Expected Output
cotton    2
blue      1
dtype: int64

Working with Lists and Dicts: explode and json_normalize — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

