Why this matters
As a Data Visualization Engineer, you turn raw data into understandable visuals. To do that, you must reliably fetch data from APIs and databases, shape it into chart-friendly structures, and handle realities like authentication, pagination, rate limits, and data quality. Typical tasks include:
- Pulling metrics from REST APIs (e.g., product, marketing, or telemetry data) for dashboards.
- Querying databases to produce aggregated, chart-ready datasets.
- Scheduling incremental fetches (e.g., yesterday only) to keep visuals fresh.
- Handling errors and retries so charts don’t break when APIs hiccup.
Who this is for
- Data Visualization Engineers and BI developers who prepare data for dashboards.
- Analysts who want to automate chart data pipelines using Python or JavaScript.
- Engineers moving from manual CSV workflows to API/DB-driven visuals.
Prerequisites
- Basic Python or JavaScript syntax (variables, loops, functions).
- Familiarity with JSON and simple SQL SELECT queries.
- Comfort using environment variables for secrets (API keys, DB credentials).
Concept explained simply
Fetching data is like filling a clean water bottle:
- Source = API or Database (the faucet).
- Transport = HTTP request or DB connection (the pipe).
- Filter/Shape = aggregation and selection (the filter).
- Container = your chart dataset (the bottle).
Mental model: Source → Access → Select → Shape → Validate → Deliver.
Core patterns you will use
- Authenticate with tokens read from environment variables, sent via request headers.
- Paginate with page/limit or cursors, looping until the server signals the end.
- Back off and retry on rate limits and transient failures; set timeouts.
- Shape raw rows into { labels, values } arrays that chart libraries accept.
- Validate row counts, date coverage, duplicates, and schema before charting.
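To make the mental model concrete, here is a minimal sketch of the full pipeline in Python. The source is a hard-coded list standing in for an API response; all names are illustrative, not a real API.

def access():
    # Access: in real use, an HTTP request or a DB query
    return [
        {"day": "2024-01-01", "amount": 10.0},
        {"day": "2024-01-01", "amount": 5.0},
        {"day": "2024-01-02", "amount": None},  # a data-quality wrinkle
    ]

def select(rows):
    # Select: keep only the fields the chart needs
    return [(r["day"], r["amount"]) for r in rows]

def shape(pairs):
    # Shape: aggregate to one value per label
    totals = {}
    for day, amount in pairs:
        totals[day] = totals.get(day, 0.0) + (amount or 0.0)
    return totals

def validate(totals):
    # Validate: fail fast on an empty dataset
    assert totals, "no rows fetched"
    return totals

def deliver(totals):
    # Deliver: the chart-ready container
    labels = sorted(totals)
    return {"labels": labels, "values": [totals[d] for d in labels]}

print(deliver(validate(shape(select(access())))))
# {'labels': ['2024-01-01', '2024-01-02'], 'values': [15.0, 0.0]}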
Worked examples (Python and JS)
Example 1 — Python: Fetch a REST API with pagination, safe retries, and chart shaping
Goal: Sum order amounts per day for January, ready for a line chart.
import os, time, requests
from collections import defaultdict

API_URL = "https://api.example.com/orders"  # placeholder
TOKEN = os.getenv("API_TOKEN")

headers = {"Authorization": f"Bearer {TOKEN}", "Accept": "application/json"}
params = {"start_date": "2024-01-01", "end_date": "2024-01-31", "per_page": 200}

all_items = []
page = 1
retries = 0
while True:
    try:
        resp = requests.get(API_URL, headers=headers, params={**params, "page": page}, timeout=15)
    except requests.RequestException:
        if retries >= 1:
            raise  # one simple retry per page, then give up
        retries += 1
        time.sleep(2)
        continue  # retry the same page
    if resp.status_code == 429:
        # Rate-limited: wait, respecting Retry-After (seconds) when present
        retry_after = int(resp.headers.get("Retry-After", 2))
        time.sleep(retry_after)
        continue
    resp.raise_for_status()
    retries = 0  # reset after a successful page
    payload = resp.json()
    all_items.extend(payload.get("data", []))
    if not payload.get("next_page"):  # assume server returns null/false at end
        break
    page += 1

# Shape for chart: daily totals
by_day = defaultdict(float)
for row in all_items:
    # Example row: {"order_id": 1, "ordered_at": "2024-01-03", "amount": 39.5}
    by_day[row["ordered_at"]] += float(row["amount"])

series = sorted(by_day.items())  # [("2024-01-01", 123.4), ...]
print({"labels": [d for d, _ in series], "values": [v for _, v in series]})
This { labels, values } shape fits most charting libraries: labels are dates, values are daily totals.
Example 2 — JavaScript: Fetch with cursor-based pagination and backoff
Goal: Get campaign clicks grouped by channel, ready for a bar chart.
// Node.js 18+ (global fetch available). Browser usage is similar (watch CORS).
const API_URL = "https://api.example.com/metrics"; // placeholder
const TOKEN = process.env.API_TOKEN;

async function wait(ms) { return new Promise(r => setTimeout(r, ms)); }

async function fetchAll() {
  let cursor = null;
  const all = [];
  while (true) {
    const url = new URL(API_URL);
    url.searchParams.set("metric", "clicks");
    if (cursor) url.searchParams.set("cursor", cursor);
    const resp = await fetch(url, {
      headers: { "Authorization": `Bearer ${TOKEN}`, "Accept": "application/json" },
    });
    if (resp.status === 429) {
      // Retry-After is in seconds; wait() takes milliseconds
      const ra = parseInt(resp.headers.get("Retry-After") || "2", 10);
      await wait(ra * 1000);
      continue;
    }
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
    const data = await resp.json();
    all.push(...data.data); // assume payload shape { data: [...], next_cursor }
    cursor = data.next_cursor;
    if (!cursor) break;
    await wait(200); // polite throttling between pages
  }
  return all;
}

(async () => {
  const rows = await fetchAll();
  // Example row: { channel: "email", clicks: 120 }
  const byChannel = rows.reduce((acc, r) => {
    acc[r.channel] = (acc[r.channel] || 0) + Number(r.clicks || 0);
    return acc;
  }, {});
  const labels = Object.keys(byChannel);
  const values = labels.map(k => byChannel[k]);
  console.log({ labels, values });
})().catch(err => { console.error(err); process.exit(1); });
Example 3 — Python + SQLite: Query and aggregate for chart-ready output
Goal: Top 3 product categories by revenue for a given week (here, the first week of January 2024).
import sqlite3

# Demo setup (in-memory). Replace with a file or production DB connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, category TEXT, amount REAL, ordered_at TEXT)")
cur.executemany("INSERT INTO orders VALUES (?,?,?,?)", [
    (1, "Books", 12.5, "2024-01-02"),
    (2, "Games", 55.0, "2024-01-03"),
    (3, "Books", 8.0, "2024-01-04"),
    (4, "Toys", 7.5, "2024-01-04"),
    (5, "Games", 10.0, "2024-01-05"),
])
# Parameterized query prevents SQL injection
start = "2024-01-01"
end = "2024-01-07"
cur.execute(
    """
    SELECT category, SUM(amount) AS revenue
    FROM orders
    WHERE ordered_at BETWEEN ? AND ?
    GROUP BY category
    ORDER BY revenue DESC
    LIMIT 3
    """,
    (start, end),
)
rows = cur.fetchall()
labels = [r[0] for r in rows]
values = [r[1] for r in rows]
print({"labels": labels, "values": values})
Authentication, pagination, and reliability
- Authentication: Use environment variables for tokens/keys. Send via headers; never hard-code secrets.
- Pagination: Use page/limit or cursor. Loop until no next page/cursor.
- Rate limiting: Respect Retry-After. Use exponential backoff (e.g., 1s, 2s, 4s...), as in the sketch after this list.
- Timeouts and retries: Set timeouts. Retry idempotent reads (GET); avoid unsafe retries on writes.
- Caching: Cache stable reference data locally to reduce calls.
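Here is a minimal Python sketch combining the backoff and retry rules above. The token name, timings, and try count are illustrative, not prescriptive.

import os, random, time
import requests

def get_with_backoff(url, params=None, max_tries=5):
    """GET with exponential backoff and jitter; honors Retry-After on 429."""
    headers = {"Authorization": f"Bearer {os.getenv('API_TOKEN', '')}"}
    for attempt in range(max_tries):
        wait = 2 ** attempt  # 1s, 2s, 4s, ...
        try:
            resp = requests.get(url, headers=headers, params=params, timeout=15)
        except requests.RequestException:
            time.sleep(wait + random.uniform(0, 0.5))  # network error: retry
            continue
        if resp.ok:
            return resp
        if resp.status_code == 429:
            # Respect Retry-After when present (assumes a seconds value)
            wait = float(resp.headers.get("Retry-After", wait))
        elif resp.status_code < 500:
            resp.raise_for_status()  # other 4xx: retrying will not help
        time.sleep(wait + random.uniform(0, 0.5))  # jitter spreads retries out
    raise RuntimeError(f"giving up after {max_tries} attempts: {url}")

# Example: resp = get_with_backoff("https://api.example.com/orders", {"page": 1})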
Data quality checks (fast self-check)
- Row count sanity: Are you missing pages? Compare expected vs fetched (see the sketch after this list).
- Date coverage: Min/max dates match your range?
- Duplicates: Deduplicate by a stable key (e.g., id).
- Nulls: Replace or drop according to chart needs.
- Schema drift: Did fields rename? Add assertions.
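A quick self-check pass in Python covering the first four items (field names like id, ordered_at, and amount mirror the earlier examples; adapt them to your payload):

def check_rows(rows, expected_min=1, key="id", date_field="ordered_at"):
    """Lightweight data-quality gate to run before shaping rows for a chart."""
    # Row count sanity: a short result often means missed pages
    assert len(rows) >= expected_min, f"too few rows: {len(rows)}"
    # Date coverage: compare min/max against the range you requested
    dates = sorted(r[date_field] for r in rows if r.get(date_field))
    print("date range:", dates[0], "to", dates[-1])
    # Duplicates: deduplicate by a stable key
    seen, deduped = set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            deduped.append(r)
    # Nulls: here we drop rows missing the value the chart plots
    return [r for r in deduped if r.get("amount") is not None]

rows = [
    {"id": 1, "ordered_at": "2024-01-02", "amount": 12.5},
    {"id": 1, "ordered_at": "2024-01-02", "amount": 12.5},  # duplicate
    {"id": 2, "ordered_at": "2024-01-03", "amount": None},  # null amount
]
print(check_rows(rows))  # keeps only the first row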
Exercises (hands-on)
Exercise 1 — Combine paginated API results and compute averages (Python or JS)
You receive paginated product data. Each page is like:
{
  "data": [
    {"id": 1, "category": "Books", "price": 10.0},
    {"id": 2, "category": "Games", "price": 50.0}
  ],
  "next_page": 2
}
Pages 2 and 3 continue similarly. Task:
- Fetch pages 1..3 (per_page=50). Deduplicate by id.
- Compute average price per category.
- Output sorted by category name ascending.
Checklist (one possible solution sketch follows):
- Loop over pages until next_page is null/false.
- Use a set of seen ids to drop duplicates.
- Keep sum and count per category, then compute averages.
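If you get stuck, here is one possible shape of the solution in Python. fetch_page is a hypothetical stand-in for your real API call, returning canned pages:

def fetch_page(page):
    # Hypothetical stand-in for the real API call; returns canned pages
    pages = {
        1: {"data": [{"id": 1, "category": "Books", "price": 10.0},
                     {"id": 2, "category": "Games", "price": 50.0}], "next_page": 2},
        2: {"data": [{"id": 2, "category": "Games", "price": 50.0},  # duplicate id
                     {"id": 3, "category": "Books", "price": 14.0}], "next_page": 3},
        3: {"data": [{"id": 4, "category": "Toys", "price": 7.5}], "next_page": None},
    }
    return pages[page]

seen, items = set(), []
page = 1
while page:  # loop until next_page is null/false
    payload = fetch_page(page)
    for row in payload["data"]:
        if row["id"] not in seen:  # deduplicate by id
            seen.add(row["id"])
            items.append(row)
    page = payload.get("next_page")

# Keep sum and count per category, then compute averages
sums, counts = {}, {}
for row in items:
    sums[row["category"]] = sums.get(row["category"], 0.0) + row["price"]
    counts[row["category"]] = counts.get(row["category"], 0) + 1
for cat in sorted(sums):  # sorted by category name ascending
    print(cat, round(sums[cat] / counts[cat], 2))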
Exercise 2 — Safe SQL: Aggregate with parameters and prepare for a bar chart
Create an in-memory SQLite database with a sales table and compute revenue by region for a date range using parameterized SQL. Output { labels, values } arrays sorted by revenue desc.
Checklist:
- Create table, insert sample rows.
- Use WHERE sale_date BETWEEN ? AND ? (parameters, not string concat).
- GROUP BY region; ORDER BY SUM(amount) DESC.
- Map rows to labels and values arrays.
Common mistakes and how to self-check
- Hard-coding secrets: Always use environment variables. Self-check: Search code for tokens/keys.
- Ignoring pagination: Compare fetched count vs expected. Add loop until no next page/cursor.
- No timeouts: Add timeouts to avoid hanging requests.
- Retrying unsafe writes: Only retry idempotent GETs, or use idempotency keys for POST if supported.
- Pulling too much data: Aggregate at the source; fetch only needed columns and ranges.
- Not validating schema: Assert required fields exist before shaping (a small helper is sketched below).
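For that last point, a tiny assertion helper goes a long way. This is a sketch; the required field names match the earlier order examples and should be adapted to your payload.

REQUIRED = {"id", "ordered_at", "amount"}

def assert_schema(rows, required=REQUIRED):
    # Fail loudly if the API renamed or dropped a field the chart depends on
    for i, row in enumerate(rows):
        missing = required - row.keys()
        assert not missing, f"row {i} is missing fields {missing}: {row}"

assert_schema([{"id": 1, "ordered_at": "2024-01-02", "amount": 39.5}])  # passes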
Practical projects
- Daily KPI fetcher: Pull daily active users from an API, aggregate, and export a chart-ready JSON file.
- Sales by region dashboard feed: Query a DB view to produce { labels, values } for a bar chart, refreshed hourly.
- Alert on data freshness: Script checks the latest timestamp in the dataset and logs a warning if stale.
Learning path
- Start: Basic HTTP requests or DB SELECT with parameters.
- Then: Pagination, rate limits, retries, and incremental loads.
- Next: Schema validation, deduplication, and chart data modeling.
- Finally: Scheduling, caching, and monitoring freshness.
Next steps
- Complete the exercises below and compare with the provided solutions.
- Take the Quick Test to confirm you grasp the fundamentals.
- Apply the patterns to your current dashboard data sources.
Mini challenge
Implement an incremental loader: Fetch only yesterday’s data from an API, merge with an existing local file of prior days, deduplicate by id, and output a chart-ready time series. Include a simple freshness check that warns if yesterday has zero rows.
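A starting skeleton, not a full solution: the history.json filename and the id field are assumptions to adapt, and the fetch itself is left as a TODO.

import datetime, json, os

def load_history(path="history.json"):
    # Prior days saved by earlier runs; empty on the first run
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)

def fetch_yesterday():
    yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    # TODO: GET your API with start_date=yesterday and end_date=yesterday
    print("fetching rows for", yesterday)
    return []  # replace with real rows

def merge(history, new_rows):
    # Deduplicate by id, letting the newest copy win
    by_id = {r["id"]: r for r in history}
    by_id.update({r["id"]: r for r in new_rows})
    return list(by_id.values())

new_rows = fetch_yesterday()
if not new_rows:
    print("WARNING: zero rows for yesterday; data may be stale")  # freshness check
merged = merge(load_history(), new_rows)
with open("history.json", "w") as f:
    json.dump(merged, f)

From merged, build the same { labels, values } time series as in Example 1.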