Data Fetching From APIs Or DB

Learn data fetching from APIs and databases for free, with explanations, exercises, and a quick test tailored to Data Visualization Engineers.

Published: December 28, 2025 | Updated: December 28, 2025

Why this matters

As a Data Visualization Engineer, you turn raw data into understandable visuals. To do that, you must reliably fetch data from APIs and databases, shape it into chart-friendly structures, and handle realities like authentication, pagination, rate limits, and data quality. Typical tasks include:

  • Pulling metrics from REST APIs (e.g., product, marketing, or telemetry data) for dashboards.
  • Querying databases to produce aggregated, chart-ready datasets.
  • Scheduling incremental fetches (e.g., yesterday only) to keep visuals fresh.
  • Handling errors and retries so charts don’t break when APIs hiccup.

Who this is for

  • Data Visualization Engineers and BI developers who prepare data for dashboards.
  • Analysts who want to automate chart data pipelines using Python or JavaScript.
  • Engineers moving from manual CSV workflows to API/DB-driven visuals.

Prerequisites

  • Basic Python or JavaScript syntax (variables, loops, functions).
  • Familiarity with JSON and simple SQL SELECT queries.
  • Comfort using environment variables for secrets (API keys, DB credentials).

Concept explained simply

Fetching data is like filling a clean water bottle:

  • Source = API or Database (the faucet).
  • Transport = HTTP request or DB connection (the pipe).
  • Filter/Shape = aggregation and selection (the filter).
  • Container = your chart dataset (the bottle).

Mental model: Source → Access → Select → Shape → Validate → Deliver.

Core patterns you will use

1. Access: Authenticate. For APIs, send headers (e.g., Authorization: Bearer token). For DBs, use connection strings from environment variables.
2. Select: Use query params (API) or SQL (DB) to fetch only the fields and date ranges you need.
3. Shape: Aggregate and group data into chart-ready structures (dimensions, measures, time).
4. Paginate: Loop through pages/cursors until all data is collected.
5. Protect: Handle rate limits, timeouts, retries, and data validation.
6. Deliver: Output arrays/objects the chart expects (labels + values, or series over time).
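
Pattern 1 (Access) in practice usually means reading secrets from the environment rather than hard-coding them. A minimal sketch, assuming a placeholder API URL and the illustrative environment variable names API_TOKEN and DB_PATH:

import os
import sqlite3
import requests

# API access: token read from the environment and sent as a header
token = os.environ["API_TOKEN"]  # fails loudly if the variable is missing
headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
resp = requests.get("https://api.example.com/orders", headers=headers, timeout=15)  # placeholder URL

# DB access: the connection target also comes from the environment
db_path = os.environ.get("DB_PATH", "analytics.db")  # for other drivers this would be a full connection string
conn = sqlite3.connect(db_path)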

Worked examples (Python and JS)

Example 1 — Python: Fetch a REST API with pagination, safe retries, and chart shaping

Goal: Sum order amounts per day for January, ready for a line chart.

import os, time, requests
from datetime import date

API_URL = "https://api.example.com/orders"  # placeholder
TOKEN = os.getenv("API_TOKEN")
headers = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}
params = {"start_date": "2024-01-01", "end_date": "2024-01-31", "per_page": 200}

all_items = []
page = 1
retried_page = None  # page number we have already retried once

while True:
    try:
        resp = requests.get(API_URL, headers=headers, params={**params, "page": page}, timeout=15)
        if resp.status_code == 429:
            # Rate-limited: wait as long as Retry-After suggests (default 2s), then retry
            retry_after = int(resp.headers.get("Retry-After", 2))
            time.sleep(retry_after)
            continue
        resp.raise_for_status()
        payload = resp.json()
        items = payload.get("data", [])
        all_items.extend(items)
        if not payload.get("next_page"):  # assume server returns null/false at end
            break
        page += 1
    except requests.RequestException:
        # Transient failure: retry this page once after a short pause, then give up
        if retried_page == page:
            raise
        retried_page = page
        time.sleep(2)

# Shape for chart: daily totals
from collections import defaultdict
by_day = defaultdict(float)
for row in all_items:
    # Example row: {"order_id": 1, "ordered_at": "2024-01-03", "amount": 39.5}
    by_day[row["ordered_at"]] += float(row["amount"]) 

series = sorted((d, by_day[d]) for d in by_day)  # [("2024-01-01", 123.4), ...]

print({"labels": [d for d, _ in series], "values": [v for _, v in series]})

This output shape fits many charting libraries: labels (dates) and values (daily totals).
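
For instance, the same labels/values structure can be handed straight to a plotting library. A minimal sketch using matplotlib (assuming it is installed; the numbers below are made up):

import matplotlib.pyplot as plt

# Sample data in the shape produced above (hypothetical values)
chart_data = {"labels": ["2024-01-01", "2024-01-02", "2024-01-03"], "values": [123.4, 98.0, 140.2]}

plt.plot(chart_data["labels"], chart_data["values"], marker="o")
plt.title("Daily order totals")
plt.xlabel("Date")
plt.ylabel("Amount")
plt.tight_layout()
plt.show()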

Example 2 — JavaScript: Fetch with cursor-based pagination and backoff

Goal: Get campaign clicks grouped by channel, ready for a bar chart.

// Node.js 18+ (fetch available). Browser usage similar (watch CORS).
const API_URL = "https://api.example.com/metrics"; // placeholder
const TOKEN = process.env.API_TOKEN;

async function wait(ms) { return new Promise(r => setTimeout(r, ms)); }

async function fetchAll() {
  let cursor = null;
  const all = [];
  while (true) {
    const url = new URL(API_URL);
    url.searchParams.set("metric", "clicks");
    if (cursor) url.searchParams.set("cursor", cursor);

    const resp = await fetch(url, {
      headers: { "Authorization": `Bearer ${TOKEN}`, "Accept": "application/json" },
    });

    if (resp.status === 429) {
      // Retry-After is given in seconds; convert to milliseconds (default 2s)
      const ra = parseInt(resp.headers.get("Retry-After") || "2", 10) * 1000;
      await wait(ra);
      continue;
    }
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);

    const data = await resp.json();
    all.push(...data.data);
    cursor = data.next_cursor;
    if (!cursor) break;
    await wait(200); // polite throttling
  }
  return all;
}

(async () => {
  const rows = await fetchAll();
  // Example row: { channel: "email", clicks: 120 }
  const byChannel = rows.reduce((acc, r) => {
    acc[r.channel] = (acc[r.channel] || 0) + Number(r.clicks || 0);
    return acc;
  }, {});
  const labels = Object.keys(byChannel);
  const values = labels.map(k => byChannel[k]);
  console.log({ labels, values });
})();

Example 3 — Python + SQLite: Query and aggregate for chart-ready output

Goal: Top 3 product categories by revenue this week.

import sqlite3

# Demo setup (in-memory). Replace with file or production DB connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, category TEXT, amount REAL, ordered_at TEXT)")
cur.executemany("INSERT INTO orders VALUES (?,?,?,?)", [
    (1, "Books", 12.5, "2024-01-02"),
    (2, "Games", 55.0, "2024-01-03"),
    (3, "Books", 8.0, "2024-01-04"),
    (4, "Toys", 7.5, "2024-01-04"),
    (5, "Games", 10.0, "2024-01-05"),
])

# Parameterized query prevents SQL injection
start = "2024-01-01"
end = "2024-01-07"
cur.execute(
    """
    SELECT category, SUM(amount) AS revenue
    FROM orders
    WHERE ordered_at BETWEEN ? AND ?
    GROUP BY category
    ORDER BY revenue DESC
    LIMIT 3
    """,
    (start, end)
)
rows = cur.fetchall()
labels = [r[0] for r in rows]
values = [r[1] for r in rows]
print({"labels": labels, "values": values})

Authentication, pagination, and reliability

  • Authentication: Use environment variables for tokens/keys. Send via headers; never hard-code secrets.
  • Pagination: Use page/limit or cursor. Loop until no next page/cursor.
  • Rate limiting: Respect Retry-After. Use exponential backoff (e.g., 1s, 2s, 4s...).
  • Timeouts and retries: Set timeouts. Retry idempotent reads (GET); avoid unsafe retries on writes.
  • Caching: Cache stable reference data locally to reduce calls.
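
Putting the rate-limit and retry advice together, here is a minimal sketch of a GET helper with exponential backoff that honors Retry-After. The retry count and delays are illustrative, not prescriptive:

import time
import requests

def get_with_backoff(url, headers=None, params=None, max_retries=4, timeout=15):
    """GET with exponential backoff; retrying is safe because GET is idempotent."""
    delay = 1  # seconds: grows 1, 2, 4, 8, ...
    for attempt in range(max_retries + 1):
        try:
            resp = requests.get(url, headers=headers, params=params, timeout=timeout)
        except requests.RequestException:
            if attempt == max_retries:
                raise
            time.sleep(delay)
            delay *= 2
            continue
        if resp.status_code == 429 and attempt < max_retries:
            # Prefer the server's Retry-After (seconds); fall back to our own delay
            time.sleep(int(resp.headers.get("Retry-After", delay)))
            delay *= 2
            continue
        resp.raise_for_status()  # a final 429 or any other error surfaces here
        return resp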

Data quality checks (fast self-check)

  • Row count sanity: Are you missing pages? Compare expected vs fetched.
  • Date coverage: Min/max dates match your range?
  • Duplicates: Deduplicate by a stable key (e.g., id).
  • Nulls: Replace or drop according to chart needs.
  • Schema drift: Did fields rename? Add assertions.
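
Most of these checks can be expressed as a few assertions that run before the data reaches the chart. A sketch, assuming rows shaped like the order records used earlier (id, ordered_at, amount):

def quality_check(rows, start, end, required=("id", "ordered_at", "amount")):
    """Lightweight gate: schema, duplicates, date coverage, and row count."""
    # Schema drift: every row must carry the required fields
    for row in rows:
        missing = [field for field in required if field not in row]
        assert not missing, f"row {row.get('id')} is missing fields: {missing}"

    # Duplicates: keep one row per stable key
    deduped = list({row["id"]: row for row in rows}.values())

    # Row count sanity: an empty result usually means a fetch problem
    assert deduped, "no rows left after deduplication"

    # Date coverage: min/max should fall inside the requested range
    dates = sorted(row["ordered_at"] for row in deduped)
    assert start <= dates[0] and dates[-1] <= end, "dates fall outside the requested range"

    return deduped

# Example usage with the January orders fetched earlier:
# clean_rows = quality_check(all_items, "2024-01-01", "2024-01-31")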

Exercises (hands-on)

Note: Everyone can do the exercises and the quick test. Only logged-in users get saved progress.

Exercise 1 — Combine paginated API results and compute averages (Python or JS)

You receive paginated product data. Each page is like:

{
  "data": [
    {"id": 1, "category": "Books", "price": 10.0},
    {"id": 2, "category": "Games", "price": 50.0}
  ],
  "next_page": 2
}

Pages 2 and 3 continue similarly. Task:

  • Fetch pages 1..3 (per_page=50). Deduplicate by id.
  • Compute average price per category.
  • Output sorted by category name ascending.

Checklist:

  • Loop over pages until next_page is null/false.
  • Use a set of seen ids to drop duplicates.
  • Keep sum and count per category, then compute averages.

Exercise 2 — Safe SQL: Aggregate with parameters and prepare for a bar chart

Create an in-memory SQLite database with a sales table and compute revenue by region for a date range using parameterized SQL. Output { labels, values } arrays sorted by revenue desc.

Checklist:

  • Create table, insert sample rows.
  • Use WHERE sale_date BETWEEN ? AND ? (parameters, not string concat).
  • GROUP BY region; ORDER BY SUM(amount) DESC.
  • Map rows to labels and values arrays.

Common mistakes and how to self-check

  • Hard-coding secrets: Always use environment variables. Self-check: Search code for tokens/keys.
  • Ignoring pagination: Compare fetched count vs expected. Add loop until no next page/cursor.
  • No timeouts: Add timeouts to avoid hanging requests.
  • Retrying unsafe writes: Only retry idempotent GETs, or use idempotency keys for POST if supported.
  • Pulling too much data: Aggregate at the source; fetch only needed columns and ranges.
  • Not validating schema: Assert required fields exist before shaping.

Practical projects

  • Daily KPI fetcher: Pull daily active users from an API, aggregate, and export a chart-ready JSON file.
  • Sales by region dashboard feed: Query a DB view to produce { labels, values } for a bar chart, refreshed hourly.
  • Alert on data freshness: Script checks the latest timestamp in the dataset and logs a warning if stale.

Learning path

  • Start: Basic HTTP requests or DB SELECT with parameters.
  • Then: Pagination, rate limits, retries, and incremental loads.
  • Next: Schema validation, deduplication, and chart data modeling.
  • Finally: Scheduling, caching, and monitoring freshness.

Next steps

  • Complete the exercises below and compare with the provided solutions.
  • Take the Quick Test to confirm you grasp the fundamentals.
  • Apply the patterns to your current dashboard data sources.

Mini challenge

Implement an incremental loader: Fetch only yesterday’s data from an API, merge with an existing local file of prior days, deduplicate by id, and output a chart-ready time series. Include a simple freshness check that warns if yesterday has zero rows.

Practice Exercises

2 exercises to complete

Instructions

You have a paginated API returning products. Simulate three pages using local variables. Each item: {id, category, price}. Tasks:

  • Iterate pages 1..3, combining items.
  • Deduplicate by id (last one wins or first; specify your choice).
  • Compute average price per category.
  • Output sorted by category ascending.

Sample pages (simplified):

page1 = {"data": [{"id":1,"category":"Books","price":10.0},{"id":2,"category":"Games","price":50.0}], "next_page": 2}
page2 = {"data": [{"id":3,"category":"Books","price":15.0},{"id":2,"category":"Games","price":55.0}], "next_page": 3}
page3 = {"data": [{"id":4,"category":"Toys","price":8.0}], "next_page": null}

Expected Output
[{"category":"Books","avg_price":12.5},{"category":"Games","avg_price":52.5},{"category":"Toys","avg_price":8.0}]

Data Fetching From APIs Or DB — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

