Why this matters
Scales and axes turn raw numbers into readable visuals. As a Data Visualization Engineer, you will:
- Map data ranges to pixel space so bars, lines, and marks render correctly across devices.
- Choose scale types (linear, log, time, categorical) that reflect how audiences should compare values.
- Format axes (ticks, labels, units) to reduce cognitive load and avoid misleading readers.
- Handle edge cases (outliers, negative values, sparse time data) without distorting the story.
Real tasks you will face:
- Design a revenue dashboard where small monthly changes are visible but bars remain comparable.
- Plot metrics spanning several orders of magnitude without compressing smaller values.
- Build a categorical bar chart with dozens of labels that must remain legible.
Concept explained simply
A scale is a function that maps a data value (domain) to a visual value (range). Axes are visual guides (lines, ticks, labels) that show how to read that mapping.
Mental model
Think of a scale as a translator: it takes a number, date, or category and returns a position, size, or color. The axis is the subtitle that explains the translation to your audience.
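To make the translator idea concrete, here is a minimal sketch of a linear scale written as a plain function. The name `linear_scale` and the 400 px plot width are illustrative assumptions, not the API of any particular charting library.

```python
def linear_scale(domain, range_):
    """Return a function that maps values in `domain` to positions in `range_`."""
    d0, d1 = domain
    r0, r1 = range_
    if d1 == d0:
        raise ValueError("domain must have nonzero width")

    def scale(value):
        t = (value - d0) / (d1 - d0)  # normalized position: 0 at d0, 1 at d1
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical chart: data from 0 to 100 drawn on a 400 px wide plot area.
x = linear_scale((0, 100), (0, 400))
print(x(0), x(25), x(100))  # 0.0 100.0 400.0
```

Passing a reversed range such as `(300, 0)` flips the direction, which is how a y scale is usually set up when pixel coordinates grow downward.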
Common scale types at a glance
- Linear: equal steps in data become equal steps on screen (counts, dollars).
- Log: equal ratios become equal steps (10, 100, 1000). Great for orders of magnitude.
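- Power/Sqrt: compresses large values more gently than log and still maps zero to zero, so data containing 0 stays plottable.
- Time: dates map to positions; spacing follows elapsed time, so irregular intervals are drawn correctly.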
- Band/Point (categorical): allocates equal space per category; bands support padding for bars.
- Quantize/Quantile (for color): convert continuous data to buckets.
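As a rough illustration of how the continuous scale types differ, the sketch below places the values 10, 100, and 1000 on a normalized 0 to 1 axis under linear, sqrt, and log transforms. The helper `normalized` is a hypothetical name introduced only for this example.

```python
import math


def normalized(value, lo, hi, transform):
    """Position of `value` in [0, 1] after applying `transform` to the data."""
    t_lo, t_hi = transform(lo), transform(hi)
    return (transform(value) - t_lo) / (t_hi - t_lo)


values = [10, 100, 1000]
transforms = [("linear", lambda v: v), ("sqrt", math.sqrt), ("log", math.log10)]

for name, fn in transforms:
    positions = [round(normalized(v, 10, 1000, fn), 3) for v in values]
    print(name, positions)
```

On the linear axis the value 100 sits under a tenth of the way along, on the sqrt axis about a quarter of the way, and on the log axis at the midpoint, because 10 to 100 and 100 to 1000 are the same ratio.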
How to choose a scale
- Identify data type: categorical, numeric, or time.
- Identify comparison type: additive differences (linear), multiplicative ratios (log), rankings (band/point).
- Check constraints: Is a zero baseline needed (bar charts)? Are time intervals irregular (time scale)? Are there outliers that would compress the rest of the data?
- Decide domain strategy: fixed (consistent dashboards) or dynamic (auto-fit a single view).
- Refine aesthetics: nice rounding, tick count, label format, padding, gridlines.
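The first two steps of this checklist can be sketched as a small lookup. The function `choose_scale` and its string labels are assumptions made for illustration; a real project would still apply the constraint and domain checks by hand.

```python
def choose_scale(data_type, comparison="difference"):
    """Suggest a scale type from the data type and the comparison readers make."""
    if data_type == "categorical":
        return "band"
    if data_type == "time":
        return "time"
    if data_type == "numeric":
        # Multiplicative comparisons read better on a log scale.
        return "log" if comparison == "ratio" else "linear"
    raise ValueError(f"unknown data type: {data_type!r}")


print(choose_scale("numeric", "ratio"))  # log
print(choose_scale("categorical"))       # band
```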
Zero baseline rule of thumb
- Bar/area charts encode magnitude by area/length: start at zero to avoid exaggeration.
- Line/point charts show change/shape: zero is optional; prioritize variability.
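A quick calculation shows why a truncated baseline exaggerates bars. The sketch below compares the drawn length ratio of two hypothetical values, 18,000 and 24,000, with the axis starting at 0 and at 15,000; `apparent_ratio` is an illustrative helper name.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


true_ratio = apparent_ratio(18_000, 24_000)         # axis starts at 0
truncated = apparent_ratio(18_000, 24_000, 15_000)  # axis starts at 15k
print(round(true_ratio, 2), round(truncated, 2))    # 1.33 3.0
```

With the truncated axis the taller bar looks three times as long even though the value is only a third larger.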
Worked examples
1) Monthly revenue bar chart (USD)
- Data: 18k, 22k, 21k, 24k, 20k.
- Scale choice: x = band (months), y = linear.
- Domain: y from 0 to slightly above max (e.g., 0–25k) to keep bars comparable.
- Ticks: every 5k; format "$25k".
- Gridlines: horizontal on y to aid reading.
- Why: bars require zero baseline; nice ticks reduce mental math.
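The revenue example could be sketched as follows. This is a simplified band layout written from scratch, not the exact formula of any library; the month names, the 500 by 300 px plot size, and the helper names `band_scale` and `format_usd_k` are assumptions.

```python
def band_scale(categories, range_start, range_end, padding=0.1):
    """Return ({category: left edge}, bandwidth); padding is a fraction of each step."""
    step = (range_end - range_start) / len(categories)
    bandwidth = step * (1 - padding)
    offset = step * padding / 2  # centers each bar inside its step
    starts = {c: range_start + i * step + offset for i, c in enumerate(categories)}
    return starts, bandwidth


def format_usd_k(value):
    """Format a whole-dollar amount as thousands, e.g. 25000 -> '$25k'."""
    return f"${value // 1000}k"


months = ["Jan", "Feb", "Mar", "Apr", "May"]  # hypothetical labels
revenue = [18_000, 22_000, 21_000, 24_000, 20_000]

starts, bandwidth = band_scale(months, 0, 500, padding=0.2)

y_max = 25_000      # zero-based domain with headroom above the 24k maximum
plot_height = 300
bar_heights = [v / y_max * plot_height for v in revenue]

ticks = list(range(0, y_max + 1, 5_000))
print([format_usd_k(t) for t in ticks])
# ['$0k', '$5k', '$10k', '$15k', '$20k', '$25k']
```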
2) Log scale for file sizes
- Data: 5 KB, 120 KB, 3 MB, 150 MB, 2 GB.
- Scale choice: y = log10; x = categorical (file names).
- Domain: from 1 KB to 2 GB; ticks at powers of 10 (1KB, 10KB, 100KB, 1MB, 10MB, 100MB, 1GB).
- Formatting: show units; clarify in axis label: "Size (log scale)".
- Why: values span orders of magnitude; log preserves multiplicative comparisons.
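A minimal sketch of the log mapping for these file sizes is shown below. It assumes decimal units (1 KB = 1000 bytes) and a 300 px tall plot; `log_scale` is an illustrative helper, not a library call.

```python
import math


def log_scale(domain, range_):
    """Return a function mapping positive values to positions on a log10 axis."""
    d0, d1 = domain
    if d0 <= 0 or d1 <= 0:
        raise ValueError("log scales need a strictly positive domain")
    l0, l1 = math.log10(d0), math.log10(d1)
    r0, r1 = range_

    def scale(value):
        return r0 + (math.log10(value) - l0) / (l1 - l0) * (r1 - r0)

    return scale


KB = 1_000  # assumption: decimal units
sizes = {
    "5 KB": 5 * KB,
    "120 KB": 120 * KB,
    "3 MB": 3 * KB**2,
    "150 MB": 150 * KB**2,
    "2 GB": 2 * KB**3,
}

y = log_scale((1 * KB, 2 * KB**3), (0, 300))
for label, size_bytes in sizes.items():
    print(label, round(y(size_bytes)))

# Power-of-10 ticks from 1 KB up to 1 GB (seven ticks).
ticks = [KB * 10**k for k in range(7)]
```

Each factor of 10 occupies the same height on this axis, which is what keeps the 5 KB file visible next to the 2 GB one.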
3) Time series of temperature (°C)
- Data: daily temperatures over a month.
- Scale choice: x = time; y = linear (no need to start at 0).
- Domain: y from min-2 to max+2 for breathing room.
- Ticks: x weekly ticks (monthly ticks would leave only one or two labels for a single month); y every 2–5 °C.
- Why: line chart emphasizes shape and change; zero baseline can hide meaningful variation.
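The temperature example could be set up roughly like this. The readings, the March 2024 date range, and the 600 px width are made-up values for illustration, and `time_scale` is a hypothetical helper.

```python
from datetime import date, timedelta


def time_scale(start, end, r0, r1):
    """Return a function mapping dates between start and end to positions r0..r1."""
    span_days = (end - start).days
    if span_days <= 0:
        raise ValueError("end must be after start")

    def scale(day):
        return r0 + (day - start).days / span_days * (r1 - r0)

    return scale


temps = [12.5, 14.0, 9.5, 17.0]  # hypothetical daily readings in °C

# Pad the y domain by 2 °C on each side instead of forcing a zero baseline.
y_domain = (min(temps) - 2, max(temps) + 2)

start, end = date(2024, 3, 1), date(2024, 3, 31)
x = time_scale(start, end, 0, 600)
print(x(date(2024, 3, 16)))  # 300.0

# Weekly ticks: Mar 1, 8, 15, 22, 29.
weekly_ticks = [start + timedelta(days=7 * k) for k in range(5)]
```

Because positions come from elapsed days, a missing reading leaves a proportional gap rather than shifting later points to the left.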
Useful scale and axis controls
- nice: extends the domain ends to round numbers (e.g., a maximum of 23 becomes 25).
- clamp: pins out-of-range values to the range ends to avoid overflow.
- padding (band): space between bars for readability.
- tick count or interval: controls density; prefer 4–8 visible ticks for clarity.
- tick format: add units, thousands separators, percent signs.
- gridlines: use lightly for alignment; avoid heavy clutter.
- reverse: useful for rankings where 1 is at the top.
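The nice and clamp controls can be approximated in a few lines. This is one common approach (snap the tick step to 1, 2, or 5 times a power of ten), written as an assumption about how such rounding typically works rather than a copy of any library's algorithm.

```python
import math


def nice_step(span, target_ticks=5):
    """Pick a step of 1, 2, or 5 times a power of ten close to span / target_ticks."""
    raw = span / target_ticks
    power = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 5, 10):
        if raw <= multiple * power:
            return multiple * power
    return 10 * power  # fallback for floating-point edge cases


def nice_domain(lo, hi, target_ticks=5):
    """Extend (lo, hi) outward to the nearest multiples of a nice step."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = nice_step(hi - lo, target_ticks)
    return math.floor(lo / step) * step, math.ceil(hi / step) * step


def clamp(value, lo, hi):
    """Pin value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


print(nice_domain(0, 23))  # (0, 25)
print(clamp(130, 0, 100))  # 100
```

Clamping hides how far an outlier exceeds the range, so it is worth annotating any clamped marks.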
Exercises
Mirror exercises are also listed below as a separate section with solutions.
Exercise 1: Choose scales and ticks for a bar chart
Dataset: Monthly conversions = [0, 4, 12, 8, 15, 22, 18]. Build a bar chart showing months on the x-axis and conversions on the y-axis.
- Pick x and y scale types.
- State the y domain (include whether zero is included) and a sensible nice domain.
- Propose tick values and label format.
Solution
x scale: band (categorical months). y scale: linear.
y domain: start at 0 (bars) and extend slightly above the max. Raw domain [0, 22]; nice domain [0, 25].
Ticks: 0, 5, 10, 15, 20, 25. Labels: plain integers ("0", "5", "10", ...) with no unit suffix, since conversions are counts; name the unit in the axis title instead.
Reasoning: Bars need a zero baseline; extending to 25 adds headroom so the tallest bar does not touch the top of the plot area.
Self-check checklist
- x uses band for discrete months, not linear.
- y starts at 0 because it's a bar chart.
- Ticks are readable (about 4–8) and evenly spaced.
- Labels use a consistent numeric format.
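The checklist above can also be expressed as a small validation function. The name `check_bar_chart` and the string `"band"` are illustrative assumptions about how a chart configuration might be represented.

```python
def check_bar_chart(x_scale_type, y_domain, ticks):
    """Return a list of checklist violations; an empty list means all checks pass."""
    problems = []
    if x_scale_type != "band":
        problems.append("x should use a band scale for discrete months")
    if y_domain[0] != 0:
        problems.append("y domain should start at 0 for a bar chart")
    if not 4 <= len(ticks) <= 8:
        problems.append("aim for about 4 to 8 ticks")
    steps = {b - a for a, b in zip(ticks, ticks[1:])}
    if len(steps) > 1:
        problems.append("ticks should be evenly spaced")
    return problems


print(check_bar_chart("band", (0, 25), [0, 5, 10, 15, 20, 25]))  # []
print(check_bar_chart("linear", (4, 22), [4, 10, 22]))
```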
Common mistakes and how to self-check
- Using linear instead of band for categories. Self-check: Are the x values discrete labels with no meaningful numeric distance between them? If yes, use band/point.
- Omitting zero baseline on bars. Self-check: Does area/length encode value? If yes, include zero.
- Using log scales without labeling. Self-check: Axis label reads "(log scale)" and ticks show powers/ratios.
- Overcrowded ticks. Self-check: Count visible tick labels; aim for 4–8.
- Inconsistent units. Self-check: Axis label and tick format include units where relevant.
- Dual y-axes causing confusion. Self-check: If two scales are needed, prefer small multiples or normalized indices unless the audience is expert.
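As one way to avoid dual y-axes, series with different units can be rebased to a shared index so they fit on a single scale. The sketch below indexes each series to 100 at its first point; the sales and order-value numbers are invented for illustration.

```python
def index_to_base(series, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    first = series[0]
    if first == 0:
        raise ValueError("cannot index a series that starts at zero")
    return [value * base / first for value in series]


monthly_sales = [200, 220, 250]      # hypothetical, in thousands of USD
avg_order_value = [40.0, 42.0, 38.0]  # hypothetical, in USD

print(index_to_base(monthly_sales))   # [100.0, 110.0, 125.0]
print(index_to_base(avg_order_value))
```

Both indexed series now share one linear y-axis labeled along the lines of "Index (first month = 100)", at the cost of hiding absolute values.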
Practical projects
- Sales dashboard: Monthly sales (bar), cumulative sales (line), and average order value (line). Configure separate scales appropriately without dual y-axes; consider small multiples.
- Log-scale exploration: Visualize API response times (ms to minutes). Add clear log labeling and power-of-10 ticks.
- Categorical ranking: Top 30 products by margin using a horizontal bar chart. Optimize band padding, label truncation, and tick formatting for currency and percent.
Who this is for
- Aspiring Data Visualization Engineers building clear, accurate charts.
- BI developers adding custom visuals to dashboards.
- Analysts who want reliable, readable plots for stakeholders.
Prerequisites
- Basic chart literacy (bar, line, scatter).
- Comfort with numeric types and dates.
- Basic understanding of encoding channels (position, length, color).
Learning path
- Identify data type and comparison goal (difference vs ratio).
- Pick scale type (linear, log, time, band/point).
- Set domain (fixed vs auto), apply nice and clamp as needed.
- Configure ticks, gridlines, and label formats.
- Validate with sample data and edge cases (zeros, negatives, outliers).
- Peer review: check for misleading baselines or crowded labels.
Next steps
- Apply the checklist on one of your existing charts and improve its scales.
- Take the quick test to confirm understanding.
- Start a practical project and iterate with stakeholder feedback.
Mini challenge
Scenario: You must plot daily active users (DAU) for the last 90 days and a distribution of session durations (seconds) that ranges from 2s to 3600s.
- Which scales do you choose for DAU over time and for the session duration distribution?
- Where do you include zero and why?
- Which tick strategy keeps labels readable?
Sample approach
- DAU: x = time, y = linear; zero not required (line chart). Use weekly ticks on x; y ticks at nice rounded values.
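- Session durations: x = log with log-spaced histogram bins, or x = linear with a sqrt y-scale if the long tail must stay visible; label x ticks at 1s, 10s, 1m, 10m, 1h. The count axis starts at zero because bars encode counts; a log x-axis cannot include zero, so start it at 1s.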
- Include units in axis labels and keep 5–7 ticks visible.
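The log-binned option for session durations could be sketched as below. The choice of 10 bins and the helper names `log_bins` and `format_duration` are assumptions for illustration.

```python
import math


def log_bins(lo, hi, n_bins):
    """Return n_bins + 1 bin edges spaced evenly in log10 between lo and hi."""
    if lo <= 0 or hi <= lo:
        raise ValueError("need 0 < lo < hi")
    l0, l1 = math.log10(lo), math.log10(hi)
    return [10 ** (l0 + (l1 - l0) * i / n_bins) for i in range(n_bins + 1)]


def format_duration(seconds):
    """Label a whole number of seconds as s, m, or h."""
    if seconds >= 3600:
        return f"{seconds // 3600}h"
    if seconds >= 60:
        return f"{seconds // 60}m"
    return f"{seconds}s"


edges = log_bins(2, 3600, 10)  # durations range from 2s to 3600s

tick_values = [1, 10, 60, 600, 3600]
print([format_duration(t) for t in tick_values])
# ['1s', '10s', '1m', '10m', '1h']
```

Each bin covers the same ratio of durations, so short sessions get fine-grained bins while the long tail is grouped into wider ones.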