Why this matters
Modern analytics data often arrives as JSON: web events, product attributes, API responses, and logs. As a Data Analyst, you will need to extract fields, filter by nested values, and convert arrays into rows for reporting.
- Marketing: Parse campaign parameters from events JSON.
- Product: Unnest item arrays to analyze top SKUs.
- Support: Summarize error logs by nested error codes.
Concept explained simply
JSON is a flexible container with key–value pairs (objects) and ordered lists (arrays). SQL adds functions to read inside that container.
Mental model
Imagine a labeled shoebox (the row) with smaller labeled boxes (JSON object keys). You can open the box (extract), pick specific items (filter), and lay out all items from a bag inside (unnest arrays).
Common cross-database mapping (portable idea)
- Select a value: use a JSON path expression (e.g., $.shipping.city).
- Filter by a value: apply a path function in WHERE.
- Unnest arrays: use a table-producing function to expand elements to rows.
- Cast types: convert text/JSON to integer, numeric, or date.
Core patterns you need
Step 1 — Extract a scalar value
-- Conceptual pattern (adjust to your SQL dialect)
SELECT order_id, extract_json_text(order_json, '$.shipping.city') AS city
FROM orders;
Engine hints
- Postgres: chain the operators: order_json -> 'shipping' ->> 'city' (-> returns JSON, ->> returns text)
- MySQL: JSON_UNQUOTE(JSON_EXTRACT(order_json, '$.shipping.city'))
- BigQuery: JSON_VALUE(order_json, '$.shipping.city')
- Snowflake: order_json:shipping:city::string
- SQLite: json_extract(order_json, '$.shipping.city')
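The SQLite variant can be tried directly from Python's built-in sqlite3 module. A minimal sketch, assuming the bundled SQLite includes the JSON1 functions (the default in modern builds); the table and row are invented for illustration:

```python
import sqlite3

# Hypothetical in-memory orders table with a JSON text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_json TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (101, ?)",
    ('{"shipping": {"city": "Austin", "method": "ground"}}',),
)

# For string scalars, json_extract returns unquoted text.
rows = conn.execute(
    "SELECT order_id, json_extract(order_json, '$.shipping.city') AS city "
    "FROM orders"
).fetchall()
print(rows)  # [(101, 'Austin')]
```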
Step 2 — Filter by nested value
WHERE extract_json_text(order_json, '$.shipping.method') = 'air'
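The same extraction works inside WHERE. A minimal SQLite sketch (table and rows invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_json TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (101, '{"shipping": {"method": "ground"}}'),
        (102, '{"shipping": {"method": "air"}}'),
    ],
)

# json_extract yields plain text here, so a normal string comparison works.
rows = conn.execute(
    "SELECT order_id FROM orders "
    "WHERE json_extract(order_json, '$.shipping.method') = 'air'"
).fetchall()
print(rows)  # [(102,)]
```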
Step 3 — Unnest arrays to rows
-- Turn items[] array into one row per item
SELECT order_id, item.sku, item.qty, item.price
FROM orders, LATERAL unnest_json_array(order_json, '$.items') AS item;
Engine hints
- Postgres: CROSS JOIN LATERAL jsonb_to_recordset(...)
- MySQL: JSON_TABLE(...)
- BigQuery: CROSS JOIN UNNEST(JSON_QUERY_ARRAY(...))
- Snowflake: LATERAL FLATTEN(input => order_json:items)
- SQLite: json_each(order_json, '$.items')
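The SQLite alternative can be sketched end to end: json_each expands an array to one row per element, with each element's JSON in its value column (data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_json TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (101, ?)",
    ('{"items": [{"sku": "A1", "qty": 1}, {"sku": "B2", "qty": 2}]}',),
)

# One output row per array element; e.value holds the element's JSON.
rows = conn.execute(
    "SELECT o.order_id, "
    "       json_extract(e.value, '$.sku') AS sku, "
    "       json_extract(e.value, '$.qty') AS qty "
    "FROM orders o, json_each(o.order_json, '$.items') e"
).fetchall()
print(rows)  # [(101, 'A1', 1), (101, 'B2', 2)]
```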
Step 4 — Cast to the right type
Use CAST(json_value AS NUMERIC) in standard SQL, or the :: shorthand where supported (e.g., ::numeric in Postgres and Snowflake).
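Why casting matters: numbers that arrive as JSON strings must be converted before arithmetic, or engines may compare or concatenate them as text. A small SQLite sketch with a hypothetical payload that quotes its numbers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
doc = '{"qty": "2", "price": "30.5"}'  # hypothetical payload with quoted numbers

# Cast each extracted value to NUMERIC before multiplying.
row = conn.execute(
    "SELECT CAST(json_extract(?, '$.qty') AS NUMERIC) * "
    "       CAST(json_extract(?, '$.price') AS NUMERIC) AS line_revenue",
    (doc, doc),
).fetchone()
print(row[0])  # 61.0
```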
Worked examples
We will use a small orders table with a JSON column order_json.
Sample data (conceptual)
order_id | customer_id | order_json
-------- | ----------- | -------------------------------------
101 | 1 | {
"shipping": {"city": "Austin", "method": "ground"},
"totals": {"subtotal": 120.00, "tax": 9.60},
"coupon": null,
"items": [
{"sku":"A1","qty":1,"price":60},
{"sku":"B2","qty":2,"price":30}
]
}
102 | 2 | {
"shipping": {"city": "Boston", "method": "air"},
"totals": {"subtotal": 50.00, "tax": 4.00},
"coupon": "WELCOME10",
"items": [
{"sku":"A1","qty":1,"price":50}
]
}
103 | 1 | {
"shipping": {"city": "Austin", "method": "pickup"},
"totals": {"subtotal": 80.00, "tax": 0.00},
"items": [
{"sku":"C3","qty":4,"price":20}
]
}
Example 1: Extract shipping city as text
-- Portable idea
SELECT order_id,
/* extract city as text */
JSON_VALUE(order_json, '$.shipping.city') AS city
FROM orders;
Per engine snippets
- Postgres: SELECT order_id, order_json -> 'shipping' ->> 'city' AS city FROM orders;
- MySQL: SELECT order_id, JSON_UNQUOTE(JSON_EXTRACT(order_json, '$.shipping.city')) AS city FROM orders;
- BigQuery: SELECT order_id, JSON_VALUE(order_json, '$.shipping.city') AS city FROM orders;
- Snowflake: SELECT order_id, order_json:shipping:city::string AS city FROM orders;
- SQLite: SELECT order_id, json_extract(order_json, '$.shipping.city') AS city FROM orders;
Example 2: Filter orders shipped by air
-- Portable idea
SELECT order_id
FROM orders
WHERE JSON_VALUE(order_json, '$.shipping.method') = 'air';
Example 3: Unnest items and compute line revenue
-- Portable idea: one row per item with computed revenue
SELECT o.order_id,
i.sku,
CAST(i.qty AS NUMERIC) * CAST(i.price AS NUMERIC) AS line_revenue
FROM orders o
CROSS JOIN JSON_TABLE(o.order_json, '$.items[*]'
       COLUMNS (
         sku VARCHAR(50) PATH '$.sku',
         qty DECIMAL(10,2) PATH '$.qty',
         price DECIMAL(10,2) PATH '$.price'
       )
     ) AS i; -- For engines without JSON_TABLE, see alternatives below
Alternatives for unnesting arrays
- Postgres: CROSS JOIN LATERAL jsonb_to_recordset(order_json -> 'items') AS i(sku text, qty int, price numeric) -- requires a jsonb column (cast with ::jsonb if needed)
- MySQL: JSON_TABLE(order_json, '$.items[*]' COLUMNS(...)) AS i
- BigQuery: SELECT order_id, JSON_VALUE(item, '$.sku') AS sku, CAST(JSON_VALUE(item, '$.qty') AS NUMERIC) AS qty, CAST(JSON_VALUE(item, '$.price') AS NUMERIC) AS price FROM orders CROSS JOIN UNNEST(JSON_QUERY_ARRAY(order_json, '$.items')) AS item
- Snowflake: SELECT order_id, f.value:sku::string AS sku, f.value:qty::number AS qty, f.value:price::number AS price FROM orders, LATERAL FLATTEN(input => order_json:items) f
- SQLite: json_each(order_json, '$.items') AS e, then json_extract(e.value, '$.sku') etc.
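Example 3 can be run end to end with the SQLite alternative (json_each plus casts); the sample row below is taken from the orders table above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_json TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (101, ?)",
    ('{"items": [{"sku": "A1", "qty": 1, "price": 60},'
     ' {"sku": "B2", "qty": 2, "price": 30}]}',),
)

# One row per item, revenue computed after casting qty and price.
rows = conn.execute(
    "SELECT o.order_id, "
    "       json_extract(e.value, '$.sku') AS sku, "
    "       CAST(json_extract(e.value, '$.qty') AS NUMERIC) * "
    "       CAST(json_extract(e.value, '$.price') AS NUMERIC) AS line_revenue "
    "FROM orders o, json_each(o.order_json, '$.items') e"
).fetchall()
print(rows)  # [(101, 'A1', 60), (101, 'B2', 60)]
```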
Example 4: Aggregate by nested key (city)
SELECT JSON_VALUE(order_json, '$.shipping.city') AS city,
COUNT(*) AS orders,
SUM(CAST(JSON_VALUE(order_json, '$.totals.subtotal') AS NUMERIC)) AS subtotal_sum
FROM orders
GROUP BY city -- some engines disallow the alias here; repeat the JSON expression if needed
ORDER BY orders DESC;
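Run against the three sample orders (JSON trimmed to the keys this query uses), the aggregate looks like this in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_json TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (101, '{"shipping": {"city": "Austin"}, "totals": {"subtotal": 120.0}}'),
        (102, '{"shipping": {"city": "Boston"}, "totals": {"subtotal": 50.0}}'),
        (103, '{"shipping": {"city": "Austin"}, "totals": {"subtotal": 80.0}}'),
    ],
)

# Group on the extracted city, casting subtotal before summing.
rows = conn.execute(
    "SELECT json_extract(order_json, '$.shipping.city') AS city, "
    "       COUNT(*) AS orders, "
    "       SUM(CAST(json_extract(order_json, '$.totals.subtotal') AS NUMERIC)) "
    "         AS subtotal_sum "
    "FROM orders GROUP BY city ORDER BY orders DESC"
).fetchall()
print(rows)  # Austin first (2 orders, subtotal 200), then Boston (1, 50)
```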
Practice: Exercises
Use the sample orders table above. Solve using your SQL engine of choice. Prefer native JSON functions, and cast types where needed.
Exercise 1 — Extract a nested field
Task: Return order_id and shipping city as plain text.
- Output columns: order_id, city
- Expected order: by order_id ascending
Hints
- Use a JSON path to $.shipping.city
- Ensure you output text, not a JSON object
Exercise 2 — Unnest and aggregate
Task: Unnest items and compute total quantity and revenue per sku.
- Output columns: sku, total_qty, revenue
- Sort by sku ascending
- Revenue = SUM(qty * price)
Hints
- Unnest items array to rows
- Cast qty and price to numeric before multiplication
Checklist before you run
- Used the engine-appropriate extraction function
- Handled arrays with UNNEST/JSON_TABLE/FLATTEN/json_each
- Cast numbers before arithmetic
- Sorted as requested
Common mistakes and self-check
- Forgetting to unquote JSON strings: If you see quotes in results or NULLs in comparisons, use the text extraction function (e.g., ->> in Postgres, JSON_VALUE in BigQuery, JSON_UNQUOTE in MySQL).
- Treating numbers as text: Cast before arithmetic; otherwise concatenation or errors may occur.
- Incorrect JSON path: Start with $. for the root, use dot notation for nested keys, and [*] for arrays.
- Dropping rows during joins: When unnesting, prefer CROSS JOIN/LATERAL; avoid INNER JOIN with conditions that filter out legitimate rows prematurely.
- Confusing json vs jsonb (Postgres): Prefer jsonb for indexing and operators; functions differ slightly.
Self-check prompts
- Can you extract a scalar, array element, and object field?
- Can you turn an array into rows and aggregate?
- Can you filter by a nested value with correct typing?
Who this is for
- Data Analysts who work with event logs, product catalogs, or API feeds
- SQL users moving beyond strictly tabular data
Prerequisites
- Comfort with SELECT, WHERE, GROUP BY
- Basic understanding of data types and casting
Learning path
- Read JSON basics (objects vs arrays)
- Practice scalar extraction
- Learn to filter by nested values
- Unnest arrays into rows
- Aggregate and join unnested data back to facts
Practical projects
- Event analytics: Parse UTM parameters from event properties and build a campaign performance table by source/medium.
- Catalog attributes: Explode product attributes JSON into a key–value table and find the most common color/size combinations.
- Error monitoring: Extract error.code and error.message from logs and produce a weekly top-errors report.
Mini challenge
Create a query that returns, for each city, the top SKU by revenue using the sample orders data. Steps:
- Unnest items
- Compute revenue per order per sku
- Aggregate per city and sku
- Use a window function (ROW_NUMBER by city order by revenue desc) to pick the top
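The steps above can be sketched in SQLite (window functions require SQLite 3.25+, which modern Python builds bundle); the sample orders are reduced to the keys the query needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_json TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (101, '{"shipping": {"city": "Austin"}, '
              '"items": [{"sku": "A1", "qty": 1, "price": 60},'
              ' {"sku": "B2", "qty": 2, "price": 30}]}'),
        (102, '{"shipping": {"city": "Boston"}, '
              '"items": [{"sku": "A1", "qty": 1, "price": 50}]}'),
        (103, '{"shipping": {"city": "Austin"}, '
              '"items": [{"sku": "C3", "qty": 4, "price": 20}]}'),
    ],
)

rows = conn.execute("""
    WITH line AS (                       -- unnest items, compute line revenue
      SELECT json_extract(o.order_json, '$.shipping.city') AS city,
             json_extract(e.value, '$.sku') AS sku,
             CAST(json_extract(e.value, '$.qty') AS NUMERIC) *
             CAST(json_extract(e.value, '$.price') AS NUMERIC) AS revenue
      FROM orders o, json_each(o.order_json, '$.items') e
    ),
    ranked AS (                          -- aggregate per city/sku, then rank
      SELECT city, sku, SUM(revenue) AS revenue,
             ROW_NUMBER() OVER (PARTITION BY city
                                ORDER BY SUM(revenue) DESC) AS rn
      FROM line
      GROUP BY city, sku
    )
    SELECT city, sku, revenue FROM ranked WHERE rn = 1 ORDER BY city
""").fetchall()
print(rows)  # [('Austin', 'C3', 80), ('Boston', 'A1', 50)]
```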
Next steps
- Learn to index JSON fields (engine-specific) for performance
- Standardize JSON parsing UDFs or views for your team
- Move to more complex nesting and arrays-of-arrays