Why this matters
As a Data Visualization Engineer, you will find that most bugs come from small changes: a chart option that breaks a build, a SQL tweak that returns empty rows, or a dependency update that slows dashboards. Continuous Integration (CI) runs builds and checks automatically on every change, so you catch issues early, keep dashboards reliable, and merge with confidence.
- Prevent broken dashboards by failing builds when unit tests or linting fail.
- Catch data issues with quick SQL checks before deploying.
- Share preview artifacts (static builds or screenshots) so reviewers see changes without running your project locally.
Concept explained simply
CI is a robot that tests every change you push. It runs a script (pipeline) with steps like install, lint, test, build, and publish artifacts. If any step returns a non-zero exit code, the pipeline fails and blocks the merge.
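For example, a step like the following fails the whole pipeline when its command exits non-zero (the file path is illustrative):
- name: Fail if build output is missing
  run: test -s dist/index.html   # exits 1 (non-zero) if the file is absent or empty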
Mental model
Picture a guarded door. Each guard checks something: style, tests, data quality, and build. Only when all guards nod can your code pass through to the main branch.
Mini glossary
- Pipeline: The full set of automated steps.
- Job/Step: A single task (e.g., run tests).
- Trigger: What starts the pipeline (push, pull request, schedule).
- Artifact: Files produced by the build (e.g., static site zip, screenshots) for reviewers.
- Cache: Reused dependencies to speed up runs.
- Matrix: Run the same job across versions or environments.
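For example, a matrix in GitHub-Actions-style syntax runs the same job once per listed version (versions are illustrative):
strategy:
  matrix:
    node: [18, 20]   # the job runs once with Node 18 and once with Node 20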
Worked examples
Example 1 — Lint and test a mixed Python + JS viz repo
Goal: Fail fast on style and tests. The YAML below runs Python linting and tests, then Node build checks.
# .ci/ci.yml (example syntax similar to many CI providers)
name: ci
on:
  - push
  - pull_request
jobs:
  lint_and_test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        # GitHub Actions syntax; other providers check out the repository
        # automatically or offer an equivalent built-in step.
        uses: actions/checkout@v4
      - name: Setup Python
        run: |
          python -V
          python -m pip install --upgrade pip
          pip install ruff pytest
      - name: Python lint
        run: ruff check .
      - name: Python tests
        run: pytest -q
      - name: Setup Node
        run: |
          node -v
          npm ci
      - name: JS lint
        run: npx eslint .
      - name: Build (dry check)
        run: npm run build --if-present
Result: If ruff, pytest, or eslint fails, the pipeline fails and blocks merge.
Example 2 — Add a fast SQL data quality check
Goal: Ensure the dataset feeding a chart is non-empty and has valid dates before building.
# Add to your CI steps
- name: Data checks
  run: |
    python - <<'PY'
    import sys, sqlite3
    # Example uses SQLite for demonstration; adapt to your warehouse client.
    db = sqlite3.connect('data.db')
    cur = db.cursor()
    cur.execute("SELECT COUNT(*) FROM sales_facts;")
    n = cur.fetchone()[0]
    if n <= 0:
        print("FAIL: sales_facts is empty")
        sys.exit(1)
    cur.execute("SELECT COUNT(*) FROM sales_facts WHERE order_date IS NULL;")
    nulls = cur.fetchone()[0]
    if nulls > 0:
        print("FAIL: order_date has NULLs")
        sys.exit(1)
    print("Data checks passed")
    PY
Result: CI fails if the table is empty or dates are null, stopping broken visualizations early.
Example 3 — Build a static preview and keep it as an artifact
Goal: Reviewers can download a zipped preview without running anything locally.
# Add to your CI steps after a successful build
- name: Package preview
  run: |
    mkdir -p preview
    cp -r dist/* preview/   # no "|| true" here: a missing build should fail loudly
    tar -czf preview.tgz -C preview .
# Upload artifacts (provider-specific). The step below is a placeholder.
- name: Upload preview artifact
  run: echo "Upload preview.tgz as a build artifact via your CI provider's step"
Result: Reviewers download preview.tgz from the CI run to inspect changes.
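On GitHub Actions, for example, the upload is a built-in action (the artifact name is arbitrary; other providers have equivalents):
- name: Upload preview artifact
  uses: actions/upload-artifact@v4
  with:
    name: preview
    path: preview.tgz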
Step-by-step: set up a minimal CI
- Define triggers: Run on pull requests and on pushes to your feature branches.
- Add fast checks first: Linting and unit tests. Keep them under 2 minutes.
- Add data checks: Simple SQL assertions (non-empty, no nulls in key columns, reasonable ranges).
- Build preview: Produce a static build or artifact for reviewers.
- Make failures loud: Ensure each check exits non-zero on failure.
- Speed it up: Cache dependencies, skip unchanged paths, and parallelize jobs when possible.
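For example, dependency caching in GitHub-Actions-style syntax keys the cache to your lockfile (the path and key are illustrative):
- name: Cache npm dependencies
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ hashFiles('package-lock.json') }}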
Path filters idea
Run JS linting only when files in src/ change; run SQL checks only when sql/ or models/ change. This keeps CI fast.
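In GitHub-Actions-style syntax, workflow-level path filters look like this (directory names are illustrative):
on:
  pull_request:
    paths:
      - 'src/**'
      - 'sql/**'
      - 'models/**'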
Exercises you can do now
These mirror the tasks in the Exercises section below. Do them in a fresh branch so you can open a pull request and watch CI run.
- Exercise 1: Create a pipeline that lints Python and JS, runs tests, and fails on errors.
- Exercise 2: Add a SQL data check step that fails when a critical table is empty or a key column is null.
Exercise checklist
- CI triggers on pull requests and pushes.
- Lint, test, data checks, and build steps are present.
- Failures block merge (non-zero exit code).
- Preview artifact is saved after a successful build.
- At least one optimization (cache or path filters) is applied.
Common mistakes and how to self-check
- Letting warnings pass: Configure linters/tests to fail on critical warnings. Self-check: Intentionally introduce a linter error; does CI fail?
- Skipping data checks: Visuals can look fine with empty datasets. Self-check: Temporarily point a chart to an empty table; does CI catch it?
- Unpinned dependencies: Unexpected breaks after updates. Self-check: Pin versions in requirements and lockfiles; re-run CI.
- Slow pipelines: Everything runs every time. Self-check: Add path filters and cache; compare run times.
- Secrets in logs: Printing keys in build output is risky. Self-check: Search logs for secrets; mask or avoid echoing sensitive values.
- No artifacts: Reviewers cannot see changes. Self-check: Confirm a preview artifact exists for each successful run.
Practical projects
- Project A: Convert a manual visualization repo into CI-driven: add linting, tests, SQL checks, and preview artifacts.
- Project B: Add a matrix job to test your visualization component across two Node versions and two screen widths (for responsive checks using headless screenshots).
- Project C: Introduce a nightly scheduled run that rebuilds charts and runs expanded data checks (longer tests) while keeping PR pipelines fast.
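In GitHub-Actions-style syntax, a nightly trigger is a cron schedule (the time is illustrative):
on:
  schedule:
    - cron: '0 3 * * *'   # rebuild and run expanded checks daily at 03:00 UTC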
Mini challenge
You changed a chart component and the SQL powering it. Design a pipeline that:
- Runs JS lint and unit tests only when src/ changes.
- Runs data checks only when sql/ or models/ change.
- Builds a preview artifact if both sets of checks pass.
- Blocks merge if any check fails.
Hint
Use path filters for conditional jobs and make build depend on successful completion of checks.
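One possible skeleton in GitHub-Actions-style syntax (job names, paths, and the check script are illustrative; per-job path filtering is provider-specific and may need a change-detection step):
jobs:
  js_checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npx eslint . && npm test
  data_checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/data_checks.py   # hypothetical check script
  build_preview:
    needs: [js_checks, data_checks]   # runs only if both check jobs succeed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run build --if-present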
Who this is for
- Data Visualization Engineers who commit code to shared repos.
- Analytics Engineers implementing tests for dashboards or semantic layers.
- Anyone who wants automated quality gates on data-powered visuals.
Prerequisites
- Basic Git workflow (branch, commit, pull request).
- Ability to run your project locally (install, test, build).
- Familiarity with a linter and test runner in your stack.
Learning path
- Before: Git basics, branching strategies, local linting/testing.
- Now: CI basics for builds and checks (this subskill).
- Next: CI optimization (caching, path filters, parallel jobs) and CD previews or staging deploys.
Next steps
- Harden your checks: add thresholds (e.g., minimum row counts, freshness windows); see the sketch after this list.
- Capture visual regressions with screenshot comparisons on critical charts.
- Introduce a deployment preview to a non-production environment after passing checks.
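A freshness check could extend the pattern from Example 2; the table, window, and threshold below are illustrative:
- name: Freshness check
  run: |
    python - <<'PY'
    # Sketch only: adapt the table, time window, and threshold to your warehouse.
    import sys, sqlite3
    db = sqlite3.connect('data.db')
    cur = db.cursor()
    cur.execute("SELECT COUNT(*) FROM sales_facts WHERE order_date >= date('now', '-1 day');")
    recent = cur.fetchone()[0]
    if recent < 100:   # minimum row count for the last 24 hours
        print(f"FAIL: only {recent} rows in the last day")
        sys.exit(1)
    print("Freshness check passed")
    PY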