Why this matters
As a Data Visualization Engineer, your charts are only as good as the data behind them. Real-world data arrives messy: wrong types, totals mixed in with detail rows, missing values, and wide tables that don’t fit your viz. Wrangling and transformation let you clean, reshape, and compute the exact metrics your visuals rely on (e.g., stacked shares, rolling averages, YoY deltas).
- Prepare “long” (tidy) data for libraries like D3, Plotly, or BI tools.
- Compute chart-ready fields: percentages, bins, ranks, moving averages.
- Join dimensions (e.g., product → category) to enrich context.
Who this is for
Anyone building charts or dashboards in Python or JavaScript who needs consistent, clean, and well-shaped data feeding visual components.
Prerequisites
- Basic Python (lists, dicts) or JS (arrays, objects).
- Comfort with reading arrays/tables and simple arithmetic.
- Optional: pandas basics (Python) or array methods (JS: map/filter/reduce).
Concept explained simply
Data wrangling is turning whatever data you receive into the exact shape your visualization needs. You remove junk, convert types, reshape columns/rows, combine tables, and compute derived values.
Mental model: a tidy data pipeline
- Input: raw data (wide, messy, mixed types)
- Clean: remove totals, fix types, handle missing values
- Reshape: wide ↔ long, pivot, unpivot
- Enrich: joins (lookups), derived columns
- Aggregate: group and summarize
- Output: chart-ready dataset
Tip: Long vs wide
Long (tidy) format is best for most charts: each row is one observation with variables like category, date, and value. Wide tables (months as columns) are great for humans, not for code-driven visuals.
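The difference is easiest to see side by side; a minimal pandas sketch that unpivots one wide row into tidy long rows:

```python
import pandas as pd

# One wide row: months as columns (human-friendly, awkward for charting code).
wide = pd.DataFrame([{"product": "A", "Jan": 10, "Feb": 12}])

# melt unpivots to long: one row per product-month-value observation.
long = wide.melt(id_vars=["product"], var_name="month", value_name="value")
print(long.to_dict(orient="records"))
```

Each long row now carries exactly one observation, which is the shape most charting libraries iterate over.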
Core operations you will use
- Select/rename columns to a consistent schema
- Filter rows (e.g., remove subtotals, nulls, outliers)
- Type conversion: strings → numbers/dates/booleans
- Reshape: pivot/unpivot (wide ↔ long)
- Group and aggregate: sum, mean, count, rolling windows
- Joins: left, inner to add categories/metadata
- Derived columns: percentages, bins, ranks, deltas
- Sort and limit for top-N charts
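Several of these operations often chain together in one short pipeline. A sketch with hypothetical region data, assuming pandas:

```python
import pandas as pd

# Hypothetical raw rows, including a subtotal row and string-typed numbers.
raw = [
    {"region": "Total", "sales": "100"},  # aggregate row to drop
    {"region": "North", "sales": "60"},
    {"region": "South", "sales": "25"},
    {"region": "West", "sales": "15"},
]
df = pd.DataFrame(raw)
top2 = (
    df[df["region"] != "Total"]                           # filter out aggregate rows
      .assign(sales=lambda d: pd.to_numeric(d["sales"]))  # type conversion
      .sort_values("sales", ascending=False)              # sort...
      .head(2)                                            # ...and limit for a top-N chart
)
print(top2.to_dict(orient="records"))
```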
Worked examples
Example 1: Clean and unpivot monthly sales
Goal: remove a total row, convert text numbers, and unpivot months so each row is product-month-value.
Raw (wide)
[
  {"product":"A","Jan":"10","Feb":"12","Mar":null},
  {"product":"B","Jan":"NA","Feb":"7","Mar":"9"},
  {"product":"Total","Jan":"10","Feb":"19","Mar":"9"}
]
Python (pandas)
import pandas as pd
data = [
    {"product":"A","Jan":"10","Feb":"12","Mar":None},
    {"product":"B","Jan":"NA","Feb":"7","Mar":"9"},
    {"product":"Total","Jan":"10","Feb":"19","Mar":"9"}
]
df = pd.DataFrame(data)
df = df[df["product"] != "Total"].replace("NA", pd.NA)
for m in ["Jan","Feb","Mar"]:
    df[m] = pd.to_numeric(df[m], errors="coerce")
long_df = df.melt(id_vars=["product"], var_name="month", value_name="value").dropna(subset=["value"])
print(long_df.to_dict(orient="records"))
JavaScript (arrays)
const data = [
  {product:"A",Jan:"10",Feb:"12",Mar:null},
  {product:"B",Jan:"NA",Feb:"7",Mar:"9"},
  {product:"Total",Jan:"10",Feb:"19",Mar:"9"}
];
const months = ["Jan","Feb","Mar"];
const cleaned = data.filter(r => r.product !== "Total").map(r => {
  const o = { product: r.product };
  months.forEach(m => {
    const v = r[m];
    o[m] = (v === null || v === "NA" || v === "") ? null : Number(v);
  });
  return o;
});
const long = [];
cleaned.forEach(r => months.forEach(m => {
  const v = r[m];
  if (v != null && !Number.isNaN(v)) long.push({product:r.product, month:m, value:v});
}));
console.log(long);
Example 2: Join categories and compute stacked shares
Goal: join sales to product categories, then compute per-month category share for a stacked 100% bar chart.
sales_long = [
  {"product":"A","month":"Jan","value":10},
  {"product":"A","month":"Feb","value":12},
  {"product":"B","month":"Feb","value":7},
  {"product":"B","month":"Mar","value":9},
  {"product":"C","month":"Feb","value":5}
]
products = [
  {"product":"A","category":"Gadgets"},
  {"product":"B","category":"Gadgets"},
  {"product":"C","category":"Accessories"}
]
Python (pandas)
import pandas as pd
sales = [
    {"product":"A","month":"Jan","value":10},
    {"product":"A","month":"Feb","value":12},
    {"product":"B","month":"Feb","value":7},
    {"product":"B","month":"Mar","value":9},
    {"product":"C","month":"Feb","value":5}
]
products = [
    {"product":"A","category":"Gadgets"},
    {"product":"B","category":"Gadgets"},
    {"product":"C","category":"Accessories"}
]
df = pd.DataFrame(sales).merge(pd.DataFrame(products), on="product", how="left")
cat_month = df.groupby(["month","category"], as_index=False)["value"].sum().rename(columns={"value":"category_total"})
month_total = df.groupby("month", as_index=False)["value"].sum().rename(columns={"value":"month_total"})
res = cat_month.merge(month_total, on="month")
res["share"] = (res["category_total"] / res["month_total"]).round(2)
print(res.sort_values(["month","category"]).to_dict(orient="records"))
JavaScript (arrays)
const sales = [
  {product:"A",month:"Jan",value:10},
  {product:"A",month:"Feb",value:12},
  {product:"B",month:"Feb",value:7},
  {product:"B",month:"Mar",value:9},
  {product:"C",month:"Feb",value:5}
];
const products = [
  {product:"A",category:"Gadgets"},
  {product:"B",category:"Gadgets"},
  {product:"C",category:"Accessories"}
];
const prodMap = Object.fromEntries(products.map(p => [p.product, p.category]));
const withCat = sales.map(s => ({...s, category: prodMap[s.product] || null}));
const monthTotals = withCat.reduce((acc, r) => {
  acc[r.month] = (acc[r.month] || 0) + r.value; return acc;
}, {});
const catMonthTotals = {};
withCat.forEach(r => {
  const key = r.month+"|"+r.category;
  catMonthTotals[key] = (catMonthTotals[key] || 0) + r.value;
});
const res = Object.entries(catMonthTotals).map(([k, ct]) => {
  const [month, category] = k.split("|");
  const mt = monthTotals[month];
  return {month, category, category_total: ct, month_total: mt, share: Math.round((ct/mt)*100)/100};
});
console.log(res.sort((a,b)=> a.month.localeCompare(b.month) || a.category.localeCompare(b.category)));
Example 3: Prepare rolling average for a smooth line chart
Goal: compute a 3-period rolling mean to reduce noise in a time series.
ts = [
  {"date":"2024-01-01","value":10},
  {"date":"2024-01-02","value":16},
  {"date":"2024-01-03","value":9},
  {"date":"2024-01-04","value":13},
  {"date":"2024-01-05","value":12}
]
Python (pandas)
import pandas as pd
s = pd.DataFrame(ts)
s["date"] = pd.to_datetime(s["date"])
s = s.sort_values("date")
s["roll3"] = s["value"].rolling(window=3, min_periods=1).mean().round(2)
print(s.to_dict(orient="records"))
JavaScript (arrays)
const s = ts.slice().sort((a,b)=> new Date(a.date)-new Date(b.date));
const out = s.map((row, i, arr) => {
  const start = Math.max(0, i-2);
  const win = arr.slice(start, i+1);
  const avg = Math.round((win.reduce((sum,r)=>sum+r.value,0)/win.length)*100)/100;
  return {...row, roll3: avg};
});
console.log(out);
Exercises
Try these in Python or JavaScript, then check your work against the expected output.
Exercise 1 — Clean, reshape, and summarize
Input:
[
  {"product":"A","Jan":"10","Feb":"12","Mar":null},
  {"product":"B","Jan":"NA","Feb":"7","Mar":"9"},
  {"product":"Total","Jan":"10","Feb":"19","Mar":"9"}
]
Tasks:
- Remove the “Total” row.
- Convert month columns to numbers; treat "NA"/null as missing.
- Unpivot to long (product, month, value), drop missing.
- Compute total per product and sort descending.
Hints
- Filter rows where product != "Total".
- Pandas: melt; JS: build long array with nested loops.
- Replace "NA" with null/NaN then to_numeric/Number.
Expected output:
[
  {"product":"A","total":22},
  {"product":"B","total":16}
]
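If you want to check your approach afterward, here is one possible pandas solution (not the only way to do it):

```python
import pandas as pd

rows = [
    {"product": "A", "Jan": "10", "Feb": "12", "Mar": None},
    {"product": "B", "Jan": "NA", "Feb": "7", "Mar": "9"},
    {"product": "Total", "Jan": "10", "Feb": "19", "Mar": "9"},
]
df = pd.DataFrame(rows)
df = df[df["product"] != "Total"]  # drop the aggregate row
long_df = df.melt(id_vars=["product"], var_name="month", value_name="value")
long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")  # "NA"/null -> NaN
long_df = long_df.dropna(subset=["value"])
totals = (long_df.groupby("product", as_index=False)["value"].sum()
                 .rename(columns={"value": "total"})
                 .sort_values("total", ascending=False))
totals["total"] = totals["total"].astype(int)  # clean ints for output
print(totals.to_dict(orient="records"))
```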
Exercise 2 — Join and compute category share
Inputs:
sales_long = [
  {"product":"A","month":"Jan","value":10},
  {"product":"A","month":"Feb","value":12},
  {"product":"B","month":"Feb","value":7},
  {"product":"B","month":"Mar","value":9},
  {"product":"C","month":"Feb","value":5}
]
products = [
  {"product":"A","category":"Gadgets"},
  {"product":"B","category":"Gadgets"},
  {"product":"C","category":"Accessories"}
]
Tasks:
- Left join sales to products on product.
- Compute category_total per (month, category).
- Compute month_total per month and category share = category_total / month_total (round to 2 decimals).
Hints
- Compute month totals once; reuse for share.
- Watch division by zero; guard if a month has no rows.
Expected output:
[
  {"month":"Jan","category":"Gadgets","category_total":10,"month_total":10,"share":1},
  {"month":"Feb","category":"Accessories","category_total":5,"month_total":24,"share":0.21},
  {"month":"Feb","category":"Gadgets","category_total":19,"month_total":24,"share":0.79},
  {"month":"Mar","category":"Gadgets","category_total":9,"month_total":9,"share":1}
]
Self-check checklist
- No aggregate rows (like “Total”) remain in detail-level data.
- Numeric columns are actual numbers, not strings.
- Long format used where charts expect series/category/value rows.
- Aggregations and shares recompute correctly if inputs change.
Common mistakes and how to self-check
- Mixing totals with detail rows: filter them out before aggregation.
- Forgetting type conversion: strings won’t sum or sort numerically.
- Silent missing values: standardize "NA", "", null to actual missing and decide drop/impute.
- Wrong denominator for percentages: ensure you divide by the correct group total (e.g., per month for stacked bars).
- Incorrect join: use left join when you must keep all fact rows; verify key uniqueness.
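One way to catch a bad join early, assuming pandas: merge's validate parameter raises a MergeError if the lookup key is not unique, instead of silently duplicating fact rows.

```python
import pandas as pd

sales = pd.DataFrame([{"product": "A", "value": 10}])
products = pd.DataFrame([{"product": "A", "category": "Gadgets"}])

# validate="m:1" asserts many sales rows map to at most one lookup row;
# a duplicated key on the products side would raise pd.errors.MergeError.
joined = sales.merge(products, on="product", how="left", validate="m:1")
print(joined.to_dict(orient="records"))
```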
Quick self-test
Pick one chart you made recently. Identify its required grain (e.g., month x category). Can you produce exactly one row per grain with the values the chart uses? If not, adjust your pipeline.
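A quick programmatic version of that check, sketched with pandas and a month x category grain (substitute your own key columns):

```python
import pandas as pd

# Chart-ready rows; the chart's grain is one row per (month, category).
df = pd.DataFrame([
    {"month": "Jan", "category": "Gadgets", "value": 10},
    {"month": "Feb", "category": "Gadgets", "value": 19},
    {"month": "Feb", "category": "Accessories", "value": 5},
])

# Exactly one row per grain: no (month, category) pair may repeat.
dupes = df.duplicated(subset=["month", "category"]).sum()
assert dupes == 0, f"{dupes} rows violate the month x category grain"
print("grain check passed")
```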
Practical projects
- Sales dashboard: ingest CSVs, clean, compute stacked category shares, top-5 products, rolling trend.
- Web analytics: session data → daily active users, bounce rate, 7-day rolling averages.
- Survey results: unpivot multiple-choice columns, compute response distributions and NPS by segment.
Learning path
- Load data safely: validate columns and types.
- Clean: remove totals, standardize missing, convert types.
- Reshape: wide ↔ long as charts require.
- Enrich: joins for categories, geos, segments.
- Aggregate: groupby and window functions for metrics.
- Validate: spot-check sums, counts, and sample rows.
- Export: supply consistent schema to visualization layer.
Next steps
- Finish the exercises above.
- Then take the Quick Test at the end of this page to check understanding.
- Apply these steps to your next visualization—save a reusable transformation script.
Mini challenge
Given daily signups by source in wide format (columns Mon–Sun), remove totals, unpivot to long, join a lookup table mapping sources to channels, and compute channel share per day for a stacked percent bar. Bonus: add a 3-day rolling mean.
Starter data
signups_wide = [
  {"source":"Email","Mon":"12","Tue":"10","Wed":"8"},
  {"source":"Ads","Mon":"20","Tue":"18","Wed":"22"},
  {"source":"Total","Mon":"32","Tue":"28","Wed":"30"}
]
lookup = [
  {"source":"Email","channel":"Owned"},
  {"source":"Ads","channel":"Paid"}
]