luvv to helpDiscover the Best Free Online Tools
Topic 21 of 31

Json and Semi Structured Data

Learn Json and Semi Structured Data for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

Modern analytics data often arrives as JSON: web events, product attributes, API responses, and logs. As a Data Analyst, you will need to extract fields, filter by nested values, and convert arrays into rows for reporting.

  • Marketing: Parse campaign parameters from events JSON.
  • Product: Unnest item arrays to analyze top SKUs.
  • Support: Summarize error logs by nested error codes.

Concept explained simply

JSON is a flexible container with key–value pairs (objects) and ordered lists (arrays). SQL adds functions to read inside that container.

Mental model

Imagine a labeled shoebox (the row) with smaller labeled boxes (JSON object keys). You can open the box (extract), pick specific items (filter), and lay out all items from a bag inside (unnest arrays).

Common cross-database mapping (portable idea)
  • Select a value: use a JSON path function (e.g., shipping.city).
  • Filter by a value: apply a path function in WHERE.
  • Unnest arrays: use a table-producing function to expand elements to rows.
  • Cast types: convert text/JSON to integer, numeric, or date.

Core patterns you need

Step 1 — Extract a scalar value

-- Conceptual pattern (adjust to your SQL dialect)
SELECT order_id, extract_json_text(order_json, '$.shipping.city') AS city
FROM orders;
Engine hints
  • Postgres: order_json ->> 'shipping' not directly; use JSON path: order_json -> 'shipping' ->> 'city'
  • MySQL: JSON_UNQUOTE(JSON_EXTRACT(order_json, '$.shipping.city'))
  • BigQuery: JSON_VALUE(order_json, '$.shipping.city')
  • Snowflake: order_json:shipping:city::string
  • SQLite: json_extract(order_json, '$.shipping.city')

Step 2 — Filter by nested value

WHERE extract_json_text(order_json, '$.shipping.method') = 'air'

Step 3 — Unnest arrays to rows

-- Turn items[] array into one row per item
SELECT order_id, item.sku, item.qty, item.price
FROM orders, LATERAL unnest_json_array(order_json, '$.items') AS item;
Engine hints
  • Postgres: CROSS JOIN LATERAL jsonb_to_recordset(...)
  • MySQL: JSON_TABLE(...)
  • BigQuery: CROSS JOIN UNNEST(JSON_QUERY_ARRAY(...))
  • Snowflake: LATERAL FLATTEN(input => order_json:items)
  • SQLite: json_each(order_json, '$.items')

Step 4 — Cast to the right type

CAST(JSON_value AS NUMERIC) or ::numeric (engine-specific)

Worked examples

We will use a small orders table with a JSON column order_json.

Sample data (conceptual)
order_id | customer_id | order_json
-------- | ----------- | -------------------------------------
101      | 1           | {
           "shipping": {"city": "Austin", "method": "ground"},
           "totals": {"subtotal": 120.00, "tax": 9.60},
           "coupon": null,
           "items": [
             {"sku":"A1","qty":1,"price":60},
             {"sku":"B2","qty":2,"price":30}
           ]
         }
102      | 2           | {
           "shipping": {"city": "Boston", "method": "air"},
           "totals": {"subtotal": 50.00, "tax": 4.00},
           "coupon": "WELCOME10",
           "items": [
             {"sku":"A1","qty":1,"price":50}
           ]
         }
103      | 1           | {
           "shipping": {"city": "Austin", "method": "pickup"},
           "totals": {"subtotal": 80.00, "tax": 0.00},
           "items": [
             {"sku":"C3","qty":4,"price":20}
           ]
         }

Example 1: Extract shipping city as text

-- Portable idea
SELECT order_id,
       /* extract city as text */
       JSON_VALUE(order_json, '$.shipping.city') AS city
FROM orders;
Per engine snippets
  • Postgres: SELECT order_id, order_json -> 'shipping' ->> 'city' AS city FROM orders;
  • MySQL: SELECT order_id, JSON_UNQUOTE(JSON_EXTRACT(order_json, '$.shipping.city')) AS city FROM orders;
  • BigQuery: SELECT order_id, JSON_VALUE(order_json, '$.shipping.city') AS city FROM orders;
  • Snowflake: SELECT order_id, order_json:shipping:city::string AS city FROM orders;
  • SQLite: SELECT order_id, json_extract(order_json, '$.shipping.city') AS city FROM orders;

Example 2: Filter orders shipped by air

-- Portable idea
SELECT order_id
FROM orders
WHERE JSON_VALUE(order_json, '$.shipping.method') = 'air';

Example 3: Unnest items and compute line revenue

-- Portable idea: one row per item with computed revenue
SELECT o.order_id,
       i.sku,
       CAST(i.qty AS NUMERIC) * CAST(i.price AS NUMERIC) AS line_revenue
FROM orders o
JOIN LATERAL JSON_TABLE(o.order_json, '$.items[*]'
  COLUMNS (
    sku   VARCHAR(50) PATH '$.sku',
    qty   DECIMAL(10,2) PATH '$.qty',
    price DECIMAL(10,2) PATH '$.price'
  )
) AS i ON TRUE;  -- For engines without JSON_TABLE, see alternatives below
Alternatives for unnesting arrays
  • Postgres: CROSS JOIN LATERAL jsonb_to_recordset(order_json -> 'items') AS i(sku text, qty int, price numeric)
  • MySQL: JSON_TABLE(order_json, '$.items[*]' COLUMNS(...)) AS i
  • BigQuery: CROSS JOIN UNNEST(JSON_QUERY_ARRAY(order_json, '$.items')) AS item CROSS JOIN ( SELECT JSON_VALUE(item, '$.sku') AS sku, CAST(JSON_VALUE(item, '$.qty') AS NUMERIC) AS qty, CAST(JSON_VALUE(item, '$.price') AS NUMERIC) AS price ) i
  • Snowflake: LATERAL FLATTEN(input => order_json:items) f SELECT order_id, f.value:sku::string AS sku, f.value:qty::number AS qty, f.value:price::number AS price
  • SQLite: json_each(order_json, '$.items') AS e, then json_extract(e.value, '$.sku') etc.

Example 4: Aggregate by nested key (city)

SELECT JSON_VALUE(order_json, '$.shipping.city') AS city,
       COUNT(*) AS orders,
       SUM(CAST(JSON_VALUE(order_json, '$.totals.subtotal') AS NUMERIC)) AS subtotal_sum
FROM orders
GROUP BY city
ORDER BY orders DESC;

Practice: Exercises

Use the sample orders table above. Solve using your SQL engine of choice. Prefer native JSON functions, and cast types where needed.

Exercise 1 — Extract a nested field

Task: Return order_id and shipping city as plain text.

  • Output columns: order_id, city
  • Expected order: by order_id ascending
Hints
  • Use a JSON path to $.shipping.city
  • Ensure you output text, not a JSON object

Exercise 2 — Unnest and aggregate

Task: Unnest items and compute total quantity and revenue per sku.

  • Output columns: sku, total_qty, revenue
  • Sort by sku ascending
  • Revenue = SUM(qty * price)
Hints
  • Unnest items array to rows
  • Cast qty and price to numeric before multiplication
  • Checklist before you run:
    • Used engine-appropriate function for extraction
    • Handled arrays with UNNEST/JSON_TABLE/FLATTEN/json_each
    • Casted numbers for arithmetic
    • Sorted as requested

Common mistakes and self-check

  • Forgetting to unquote JSON strings: If you see quotes in results or NULLs in comparisons, use the text extraction function (e.g., ->> in Postgres, JSON_VALUE in BigQuery, JSON_UNQUOTE in MySQL).
  • Treating numbers as text: Cast before arithmetic; otherwise concatenation or errors may occur.
  • Incorrect JSON path: Start with $. for the root, use dot notation for nested keys, and [*] for arrays.
  • Dropping rows during joins: When unnesting, prefer CROSS JOIN/LATERAL; avoid INNER JOIN with conditions that filter out legitimate rows prematurely.
  • Confusing json vs jsonb (Postgres): Prefer jsonb for indexing and operators; functions differ slightly.
Self-check prompts
  • Can you extract a scalar, array element, and object field?
  • Can you turn an array into rows and aggregate?
  • Can you filter by a nested value with correct typing?

Who this is for

  • Data Analysts who work with event logs, product catalogs, or API feeds
  • SQL users moving beyond strictly tabular data

Prerequisites

  • Comfort with SELECT, WHERE, GROUP BY
  • Basic understanding of data types and casting

Learning path

  1. Read JSON basics (objects vs arrays)
  2. Practice scalar extraction
  3. Learn to filter by nested values
  4. Unnest arrays into rows
  5. Aggregate and join unnested data back to facts

Practical projects

  • Event analytics: Parse UTM parameters from event properties and build a campaign performance table by source/medium.
  • Catalog attributes: Explode product attributes JSON into a key–value table and find the most common color/size combinations.
  • Error monitoring: Extract error.code and error.message from logs and produce a weekly top-errors report.

Mini challenge

Create a query that returns, for each city, the top SKU by revenue using the sample orders data. Steps:

  • Unnest items
  • Compute revenue per order per sku
  • Aggregate per city and sku
  • Use a window function (ROW_NUMBER by city order by revenue desc) to pick the top

Next steps

  • Learn to index JSON fields (engine-specific) for performance
  • Standardize JSON parsing UDFs or views for your team
  • Move to more complex nesting and arrays-of-arrays

Quick Test is available to everyone. Log in to save your progress and resume later.

Practice Exercises

2 exercises to complete

Instructions

Return order_id and shipping city as text using the sample orders data. Sort by order_id ascending.

Engine tips
  • Postgres: order_json -> 'shipping' ->> 'city' AS city
  • MySQL: JSON_UNQUOTE(JSON_EXTRACT(order_json, '$.shipping.city')) AS city
  • BigQuery: JSON_VALUE(order_json, '$.shipping.city') AS city
  • Snowflake: order_json:shipping:city::string AS city
  • SQLite: json_extract(order_json, '$.shipping.city') AS city
Expected Output
order_id | city 101 | Austin 102 | Boston 103 | Austin

Json and Semi Structured Data — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Json and Semi Structured Data?

AI Assistant

Ask questions about this tool