Why this matters
As a Data Visualization Engineer, you turn raw data into understandable visuals. To do that, you must reliably fetch data from APIs and databases, shape it into chart-friendly structures, and handle realities like authentication, pagination, rate limits, and data quality. Typical tasks include:
- Pulling metrics from REST APIs (e.g., product, marketing, or telemetry data) for dashboards.
- Querying databases to produce aggregated, chart-ready datasets.
- Scheduling incremental fetches (e.g., yesterday only) to keep visuals fresh.
- Handling errors and retries so charts don’t break when APIs hiccup.
Who this is for
- Data Visualization Engineers and BI developers who prepare data for dashboards.
- Analysts who want to automate chart data pipelines using Python or JavaScript.
- Engineers moving from manual CSV workflows to API/DB-driven visuals.
Prerequisites
- Basic Python or JavaScript syntax (variables, loops, functions).
- Familiarity with JSON and simple SQL SELECT queries.
- Comfort using environment variables for secrets (API keys, DB credentials).
Concept explained simply
Fetching data is like filling a clean water bottle:
- Source = API or Database (the faucet).
- Transport = HTTP request or DB connection (the pipe).
- Filter/Shape = aggregation and selection (the filter).
- Container = your chart dataset (the bottle).
Mental model: Source → Access → Select → Shape → Validate → Deliver.
Core patterns you will use
- Authenticate with tokens read from environment variables, sent via request headers.
- Paginate with page/limit or cursors, looping until the server signals the end.
- Back off and retry on rate limits and transient failures; set timeouts.
- Shape raw rows into { labels, values } arrays that chart libraries accept.
- Validate row counts, date coverage, duplicates, and schema before charting.
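To make the mental model concrete, here is a minimal sketch of the full pipeline in Python. The source is a hard-coded list standing in for an API response; all names are illustrative, not a real API.

def access():
    # Access: in real use, an HTTP request or a DB query
    return [
        {"day": "2024-01-01", "amount": 10.0},
        {"day": "2024-01-01", "amount": 5.0},
        {"day": "2024-01-02", "amount": None},  # a data-quality wrinkle
    ]

def select(rows):
    # Select: keep only the fields the chart needs
    return [(r["day"], r["amount"]) for r in rows]

def shape(pairs):
    # Shape: aggregate to one value per label
    totals = {}
    for day, amount in pairs:
        totals[day] = totals.get(day, 0.0) + (amount or 0.0)
    return totals

def validate(totals):
    # Validate: fail fast on an empty dataset
    assert totals, "no rows fetched"
    return totals

def deliver(totals):
    # Deliver: the chart-ready container
    labels = sorted(totals)
    return {"labels": labels, "values": [totals[d] for d in labels]}

print(deliver(validate(shape(select(access())))))
# {'labels': ['2024-01-01', '2024-01-02'], 'values': [15.0, 0.0]}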
Worked examples (Python and JS)
Example 1 — Python: Fetch a REST API with pagination, safe retries, and chart shaping
Goal: Sum order amounts per day for January, ready for a line chart.
import os, time, requests
from collections import defaultdict

API_URL = "https://api.example.com/orders"  # placeholder
TOKEN = os.getenv("API_TOKEN")

headers = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}
params = {"start_date": "2024-01-01", "end_date": "2024-01-31", "per_page": 200}

all_items = []
page = 1
retries = 0
while True:
    try:
        resp = requests.get(API_URL, headers=headers, params={**params, "page": page}, timeout=15)
    except requests.RequestException:
        if retries >= 1:
            raise  # one simple retry per page, then give up
        retries += 1
        time.sleep(2)
        continue  # retry the same page
    if resp.status_code == 429:
        # Rate-limited: wait, respecting Retry-After (seconds) when present
        retry_after = int(resp.headers.get("Retry-After", 2))
        time.sleep(retry_after)
        continue
    resp.raise_for_status()
    retries = 0  # reset after a successful page
    payload = resp.json()
    all_items.extend(payload.get("data", []))
    if not payload.get("next_page"):  # assume server returns null/false at end
        break
    page += 1

# Shape for chart: daily totals
by_day = defaultdict(float)
for row in all_items:
    # Example row: {"order_id": 1, "ordered_at": "2024-01-03", "amount": 39.5}
    by_day[row["ordered_at"]] += float(row["amount"])

series = sorted(by_day.items())  # [("2024-01-01", 123.4), ...]
print({"labels": [d for d, _ in series], "values": [v for _, v in series]})
This { labels, values } shape fits most charting libraries: labels are dates, values are daily totals.
Example 2 — JavaScript: Fetch with cursor-based pagination and backoff
Goal: Get campaign clicks grouped by channel, ready for a bar chart.
// Node.js 18+ (global fetch available). Browser usage is similar (watch CORS).
const API_URL = "https://api.example.com/metrics"; // placeholder
const TOKEN = process.env.API_TOKEN;

async function wait(ms) { return new Promise(r => setTimeout(r, ms)); }

async function fetchAll() {
  let cursor = null;
  const all = [];
  while (true) {
    const url = new URL(API_URL);
    url.searchParams.set("metric", "clicks");
    if (cursor) url.searchParams.set("cursor", cursor);
    const resp = await fetch(url, {
      headers: { "Authorization": `Bearer ${TOKEN}`, "Accept": "application/json" },
    });
    if (resp.status === 429) {
      // Retry-After is in seconds; wait() takes milliseconds
      const ra = parseInt(resp.headers.get("Retry-After") || "2", 10);
      await wait(ra * 1000);
      continue;
    }
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
    const data = await resp.json();
    all.push(...data.data); // assume payload shape { data: [...], next_cursor }
    cursor = data.next_cursor;
    if (!cursor) break;
    await wait(200); // polite throttling between pages
  }
  return all;
}

(async () => {
  const rows = await fetchAll();
  // Example row: { channel: "email", clicks: 120 }
  const byChannel = rows.reduce((acc, r) => {
    acc[r.channel] = (acc[r.channel] || 0) + Number(r.clicks || 0);
    return acc;
  }, {});
  const labels = Object.keys(byChannel);
  const values = labels.map(k => byChannel[k]);
  console.log({ labels, values });
})().catch(err => { console.error(err); process.exit(1); });
Example 3 — Python + SQLite: Query and aggregate for chart-ready output
Goal: Top 3 product categories by revenue for a given week (here, the first week of January 2024).
import sqlite3

# Demo setup (in-memory). Replace with a file or production DB connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, category TEXT, amount REAL, ordered_at TEXT)")
cur.executemany("INSERT INTO orders VALUES (?,?,?,?)", [
    (1, "Books", 12.5, "2024-01-02"),
    (2, "Games", 55.0, "2024-01-03"),
    (3, "Books", 8.0, "2024-01-04"),
    (4, "Toys", 7.5, "2024-01-04"),
    (5, "Games", 10.0, "2024-01-05"),
])
# Parameterized query prevents SQL injection
start = "2024-01-01"
end = "2024-01-07"
cur.execute(
    """
    SELECT category, SUM(amount) AS revenue
    FROM orders
    WHERE ordered_at BETWEEN ? AND ?
    GROUP BY category
    ORDER BY revenue DESC
    LIMIT 3
    """,
    (start, end),
)
rows = cur.fetchall()
labels = [r[0] for r in rows]
values = [r[1] for r in rows]
print({"labels": labels, "values": values})
Authentication, pagination, and reliability
- Authentication: Use environment variables for tokens/keys. Send via headers; never hard-code secrets.
- Pagination: Use page/limit or cursor. Loop until no next page/cursor.
- Rate limiting: Respect Retry-After. Use exponential backoff (e.g., 1s, 2s, 4s...), as in the sketch after this list.
- Timeouts and retries: Set timeouts. Retry idempotent reads (GET); avoid unsafe retries on writes.
- Caching: Cache stable reference data locally to reduce calls.
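Here is a minimal Python sketch combining the backoff and retry rules above. The token name, timings, and try count are illustrative, not prescriptive.

import os, random, time
import requests

def get_with_backoff(url, params=None, max_tries=5):
    """GET with exponential backoff and jitter; honors Retry-After on 429."""
    headers = {"Authorization": f"Bearer {os.getenv('API_TOKEN', '')}"}
    for attempt in range(max_tries):
        wait = 2 ** attempt  # 1s, 2s, 4s, ...
        try:
            resp = requests.get(url, headers=headers, params=params, timeout=15)
        except requests.RequestException:
            time.sleep(wait + random.uniform(0, 0.5))  # network error: retry
            continue
        if resp.ok:
            return resp
        if resp.status_code == 429:
            # Respect Retry-After when present (assumes a seconds value)
            wait = float(resp.headers.get("Retry-After", wait))
        elif resp.status_code < 500:
            resp.raise_for_status()  # other 4xx: retrying will not help
        time.sleep(wait + random.uniform(0, 0.5))  # jitter spreads retries out
    raise RuntimeError(f"giving up after {max_tries} attempts: {url}")

# Example: resp = get_with_backoff("https://api.example.com/orders", {"page": 1})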
Data quality checks (fast self-check)
- Row count sanity: Are you missing pages? Compare expected vs fetched (see the sketch after this list).
- Date coverage: Min/max dates match your range?
- Duplicates: Deduplicate by a stable key (e.g., id).
- Nulls: Replace or drop according to chart needs.
- Schema drift: Did fields rename? Add assertions.
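A quick self-check pass in Python covering the first four items (field names like id, ordered_at, and amount mirror the earlier examples; adapt them to your payload):

def check_rows(rows, expected_min=1, key="id", date_field="ordered_at"):
    """Lightweight data-quality gate to run before shaping rows for a chart."""
    # Row count sanity: a short result often means missed pages
    assert len(rows) >= expected_min, f"too few rows: {len(rows)}"
    # Date coverage: compare min/max against the range you requested
    dates = sorted(r[date_field] for r in rows if r.get(date_field))
    print("date range:", dates[0], "to", dates[-1])
    # Duplicates: deduplicate by a stable key
    seen, deduped = set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            deduped.append(r)
    # Nulls: here we drop rows missing the value the chart plots
    return [r for r in deduped if r.get("amount") is not None]

rows = [
    {"id": 1, "ordered_at": "2024-01-02", "amount": 12.5},
    {"id": 1, "ordered_at": "2024-01-02", "amount": 12.5},  # duplicate
    {"id": 2, "ordered_at": "2024-01-03", "amount": None},  # null amount
]
print(check_rows(rows))  # keeps only the first row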
Exercises (hands-on)
Exercise 1 — Combine paginated API results and compute averages (Python or JS)
You receive paginated product data. Each page is like:
{
  "data": [
    {"id": 1, "category": "Books", "price": 10.0},
    {"id": 2, "category": "Games", "price": 50.0}
  ],
  "next_page": 2
}
Pages 2 and 3 continue similarly. Task:
- Fetch pages 1..3 (per_page=50). Deduplicate by id.
- Compute average price per category.
- Output sorted by category name ascending.
Checklist (one possible solution sketch follows):
- Loop over pages until next_page is null/false.
- Use a set of seen ids to drop duplicates.
- Keep sum and count per category, then compute averages.
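If you get stuck, here is one possible shape of the solution in Python. fetch_page is a hypothetical stand-in for your real API call, returning canned pages:

def fetch_page(page):
    # Hypothetical stand-in for the real API call; returns canned pages
    pages = {
        1: {"data": [{"id": 1, "category": "Books", "price": 10.0},
                     {"id": 2, "category": "Games", "price": 50.0}], "next_page": 2},
        2: {"data": [{"id": 2, "category": "Games", "price": 50.0},  # duplicate id
                     {"id": 3, "category": "Books", "price": 14.0}], "next_page": 3},
        3: {"data": [{"id": 4, "category": "Toys", "price": 7.5}], "next_page": None},
    }
    return pages[page]

seen, items = set(), []
page = 1
while page:  # loop until next_page is null/false
    payload = fetch_page(page)
    for row in payload["data"]:
        if row["id"] not in seen:  # deduplicate by id
            seen.add(row["id"])
            items.append(row)
    page = payload.get("next_page")

# Keep sum and count per category, then compute averages
sums, counts = {}, {}
for row in items:
    sums[row["category"]] = sums.get(row["category"], 0.0) + row["price"]
    counts[row["category"]] = counts.get(row["category"], 0) + 1
for cat in sorted(sums):  # sorted by category name ascending
    print(cat, round(sums[cat] / counts[cat], 2))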
Exercise 2 — Safe SQL: Aggregate with parameters and prepare for a bar chart
Create an in-memory SQLite database with a sales table and compute revenue by region for a date range using parameterized SQL. Output { labels, values } arrays sorted by revenue desc.
Checklist:
- Create table, insert sample rows.
- Use WHERE sale_date BETWEEN ? AND ? (parameters, not string concat).
- GROUP BY region; ORDER BY SUM(amount) DESC.
- Map rows to labels and values arrays.
Common mistakes and how to self-check
- Hard-coding secrets: Always use environment variables. Self-check: Search code for tokens/keys.
- Ignoring pagination: Compare fetched count vs expected. Add loop until no next page/cursor.
- No timeouts: Add timeouts to avoid hanging requests.
- Retrying unsafe writes: Only retry idempotent GETs, or use idempotency keys for POST if supported.
- Pulling too much data: Aggregate at the source; fetch only needed columns and ranges.
- Not validating schema: Assert required fields exist before shaping (a small helper is sketched below).
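For that last point, a tiny assertion helper goes a long way. This is a sketch; the required field names match the earlier order examples and should be adapted to your payload.

REQUIRED = {"id", "ordered_at", "amount"}

def assert_schema(rows, required=REQUIRED):
    # Fail loudly if the API renamed or dropped a field the chart depends on
    for i, row in enumerate(rows):
        missing = required - row.keys()
        assert not missing, f"row {i} is missing fields {missing}: {row}"

assert_schema([{"id": 1, "ordered_at": "2024-01-02", "amount": 39.5}])  # passes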
Practical projects
- Daily KPI fetcher: Pull daily active users from an API, aggregate, and export a chart-ready JSON file.
- Sales by region dashboard feed: Query a DB view to produce { labels, values } for a bar chart, refreshed hourly.
- Alert on data freshness: Script checks the latest timestamp in the dataset and logs a warning if stale.
Learning path
- Start: Basic HTTP requests or DB SELECT with parameters.
- Then: Pagination, rate limits, retries, and incremental loads.
- Next: Schema validation, deduplication, and chart data modeling.
- Finally: Scheduling, caching, and monitoring freshness.
Next steps
- Complete the exercises below and compare with the provided solutions.
- Take the Quick Test to confirm you grasp the fundamentals.
- Apply the patterns to your current dashboard data sources.
Mini challenge
Implement an incremental loader: Fetch only yesterday’s data from an API, merge with an existing local file of prior days, deduplicate by id, and output a chart-ready time series. Include a simple freshness check that warns if yesterday has zero rows.
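A starting skeleton, not a full solution: the history.json filename and the id field are assumptions to adapt, and the fetch itself is left as a TODO.

import datetime, json, os

def load_history(path="history.json"):
    # Prior days saved by earlier runs; empty on the first run
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)

def fetch_yesterday():
    yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    # TODO: GET your API with start_date=yesterday and end_date=yesterday
    print("fetching rows for", yesterday)
    return []  # replace with real rows

def merge(history, new_rows):
    # Deduplicate by id, letting the newest copy win
    by_id = {r["id"]: r for r in history}
    by_id.update({r["id"]: r for r in new_rows})
    return list(by_id.values())

new_rows = fetch_yesterday()
if not new_rows:
    print("WARNING: zero rows for yesterday; data may be stale")  # freshness check
merged = merge(load_history(), new_rows)
with open("history.json", "w") as f:
    json.dump(merged, f)

From merged, build the same { labels, values } time series as in Example 1.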