Who this is for
Business Analysts who already write basic SQL and need to deliver clean CSV/Excel/JSON files to stakeholders, dashboards, or downstream tools.
Prerequisites
- Basic SELECT, WHERE, GROUP BY, ORDER BY
- Comfort with your SQL client or command-line tool
- Access to a sandbox or read-only database
Why this matters
Executives, finance, and marketing teams often need a file they can open, filter, and pivot. Clean exports prevent rework and speed decisions. Typical tasks:
- Send a monthly KPI CSV with consistent columns and date formats
- Provide a pivot-friendly file for Excel
- Export a sample dataset for data science or QA
- Share a JSON extract for a small integration
Concept explained simply
Exporting is turning your query into a file others can use. You choose a format (CSV, TSV, Excel, JSON), set options (delimiter, header, encoding), clean columns (names, types, dates), and write the file.
Mental model
- Pipe: Query -> Format -> Clean -> Export -> Deliver
- Think “contract”: the file is a contract of columns, types, and meanings. Keep it stable.
Quick checklist before exporting
- Headers present and human-readable
- Dates/times in correct timezone and format
- Numbers rounded/typed correctly (no thousand separators)
- Delimiter consistent (CSV or TSV) and quotes for text
- NULLs represented as empty cells (unless otherwise agreed)
- UTF-8 encoding to avoid garbled characters
- No sensitive data unless authorized
Common formats and key settings
- CSV: Most universal for Excel/BI. Use comma delimiter, double quotes for text, header row, UTF-8.
- TSV: Uses tab delimiter; safer for text containing commas.
- Excel (XLSX): Nice for styling/pivots; some tools save as XLSX, but CSV is often simpler and lighter.
- JSON: Good for integrations or nested data.
Important settings:
- Delimiter: comma for CSV, tab for TSV
- Header row: include column names
- Quoting: quote text fields to protect delimiters
- NULLs: usually empty cells
- Dates/times: specify timezone; use ISO-like formats (YYYY-MM-DD, YYYY-MM-DD HH:MM:SS)
- Encoding: UTF-8
- Rows: be mindful of Excel row limits; split files if needed
Worked examples
Example 1: PostgreSQL — CSV with header
Goal: Export monthly revenue by region as CSV with a header.
-- Query (PostgreSQL)
\copy (
SELECT date_trunc('month', o.order_date)::date AS month,
c.region,
ROUND(SUM(o.total_usd)::numeric, 2) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY 1, 2
ORDER BY 1, 2
) TO 'monthly_revenue_by_region.csv' CSV HEADERNotes: \copy runs from your client and writes to your local file. Ensure you have permission to write to the target folder.
Example 2: MySQL/MariaDB — TSV to avoid comma issues
Goal: Export order details where product names may contain commas. Use TSV.
SELECT DATE_FORMAT(order_date, '%Y-%m-01') AS month,
region,
ROUND(SUM(total_usd), 2) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE status = 'completed'
GROUP BY 1,2
ORDER BY 1,2
INTO OUTFILE '/tmp/monthly_revenue_by_region.tsv'
FIELDS TERMINATED BY '\t' ENCLOSED BY '"'
LINES TERMINATED BY '\n';Notes: The OUTFILE path must be writable by the server and allowed by configuration. Adjust path as needed.
Example 3: SQL Server — Export via bcp (CSV)
Goal: Export top customers to CSV.
-- In T-SQL
WITH revenue AS (
SELECT c.CustomerID,
c.Region,
SUM(o.TotalUSD) AS RevenueUSD
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
WHERE o.Status = 'completed'
GROUP BY c.CustomerID, c.Region
)
SELECT TOP 100 WITH TIES CustomerID, Region, RevenueUSD
FROM revenue
ORDER BY RevenueUSD DESC;Then use command-line bcp (run from your machine):
bcp "SELECT TOP 100 WITH TIES CustomerID, Region, RevenueUSD FROM MyDB.dbo.Revenue ORDER BY RevenueUSD DESC" queryout top_customers.csv -c -t, -S SERVERNAME -d MyDB -TOptions: -c (char), -t, sets comma delimiter, -T uses trusted connection. Adjust credentials as needed.
Example 4: SQLite — CSV via shell
.headers on
.mode csv
.output orders_sample.csv
SELECT order_id, customer_id, date(order_date) AS order_date, total_usd
FROM orders
WHERE order_date >= '2024-01-01'
ORDER BY order_date
LIMIT 1000;
.output stdoutExample 5: JSON export (PostgreSQL)
\copy (
SELECT COALESCE(json_agg(t), '[]')
FROM (
SELECT o.id AS order_id,
to_char(o.order_date, 'YYYY-MM-DD') AS order_date,
c.region,
ROUND(o.total_usd::numeric, 2) AS total_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
ORDER BY o.order_date
) t
) TO 'orders.json'Data cleaning before export
- Rename columns with clear aliases: SELECT total_usd AS revenue_usd
- Format dates explicitly: to_char/order by date functions
- Handle NULLs: COALESCE(value, 0) or COALESCE(text, '') as agreed
- Trim text: TRIM() to remove accidental spaces
- Round numbers: ROUND(value, 2)
- Order rows predictably: ORDER BY columns that matter
Step-by-step cleaning recipe
- Select only needed columns
- Alias columns to stakeholder-friendly names
- COALESCE and ROUND numerics
- Normalize dates/times and timezone
- Sort rows and limit if sampling
- Export with headers, UTF-8
Privacy and governance
- Export the minimum needed columns
- Exclude PII unless explicitly approved
- Use aggregated data for broad sharing
- Add a brief note with date range, timezone, and caveats
- Store files in approved locations only
Exercises
Do these in a safe environment. Solutions are provided for major databases.
Exercise 1 — Monthly revenue by region (CSV)
Tables: orders(id, customer_id, order_date, status, total_usd), customers(id, region)
- Task: Export a CSV with columns: month (YYYY-MM-01), region, revenue_usd.
- Rules: Only completed orders; include header; UTF-8; two decimal places; sorted by month, region.
Expected output (first lines):
month,region,revenue_usd
2024-01-01,EMEA,102345.27
2024-01-01,NAMER,99321.40
...Hints
- Use date truncation per DB (DATE_TRUNC, DATE_FORMAT)
- ROUND/CAST to 2 decimals
- PostgreSQL: \copy ... CSV HEADER
Show solution
-- PostgreSQL
\copy (
SELECT date_trunc('month', o.order_date)::date AS month,
c.region,
ROUND(SUM(o.total_usd)::numeric, 2) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY 1, 2
ORDER BY 1, 2
) TO 'monthly_revenue_by_region.csv' CSV HEADER;
-- MySQL/MariaDB (CSV)
SELECT DATE_FORMAT(order_date, '%Y-%m-01') AS month,
c.region,
ROUND(SUM(total_usd), 2) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE status = 'completed'
GROUP BY 1,2
ORDER BY 1,2
INTO OUTFILE '/tmp/monthly_revenue_by_region.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
-- SQL Server (bcp with queryout)
WITH revenue AS (
SELECT CONVERT(date, DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1)) AS month,
c.Region,
ROUND(SUM(o.TotalUSD), 2) AS revenue_usd
FROM dbo.Orders o
JOIN dbo.Customers c ON c.Id = o.CustomerId
WHERE o.Status = 'completed'
GROUP BY CONVERT(date, DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1)), c.Region
)
SELECT month, Region, revenue_usd FROM revenue ORDER BY month, Region;
-- Then export with bcp:
-- bcp "SELECT month, Region, revenue_usd FROM MyDB.dbo.revenue ORDER BY month, Region" queryout monthly_revenue_by_region.csv -c -t, -S SERVER -d MyDB -TExercise 2 — Top 100 customers as JSON
Tables: customers(id, company_name, region), orders(id, customer_id, status, total_usd)
- Task: Export a JSON array of top 100 customers by total_usd (completed only). Keys: customer_id, company_name, region, revenue_usd.
- Rules: revenue as number (2 decimals), sorted descending.
Expected output (snippet):
[
{"customer_id": 42, "company_name": "Acme Corp", "region": "EMEA", "revenue_usd": 120345.25},
{"customer_id": 7, "company_name": "BlueSky", "region": "APAC", "revenue_usd": 118930.10}
]Hints
- Aggregate first, then limit 100
- Use JSON functions or aggregate to JSON array
Show solution
-- PostgreSQL
\copy (
WITH revenue AS (
SELECT c.id AS customer_id,
c.company_name,
c.region,
ROUND(SUM(o.total_usd)::numeric, 2) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.id, c.company_name, c.region
)
SELECT COALESCE(json_agg(t), '[]')
FROM (
SELECT * FROM revenue ORDER BY revenue_usd DESC, customer_id LIMIT 100
) t
) TO 'top_customers.json';
-- MySQL/MariaDB
SELECT JSON_ARRAYAGG(JSON_OBJECT(
'customer_id', customer_id,
'company_name', company_name,
'region', region,
'revenue_usd', revenue_usd
)) AS json_data
FROM (
SELECT c.id AS customer_id, c.company_name, c.region,
ROUND(SUM(o.total_usd), 2) AS revenue_usd
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.id, c.company_name, c.region
ORDER BY revenue_usd DESC, customer_id
LIMIT 100
) t
INTO OUTFILE '/tmp/top_customers.json';Exercise handoff checklist
- Filename, date, and version are clear
- Header row present and accurate
- Delimiter and quotes verified
- Timezone documented in a note or filename if applicable
- Spot-checked 5–10 random rows
Common mistakes and self-check
- Missing headers: Stakeholders cannot map columns. Fix: add HEADER/headers on.
- Wrong delimiter: Commas inside text break columns. Fix: quote text or use TSV.
- Timezone mismatch: Dates don’t align with dashboards. Fix: convert and document.
- Excel auto-format: IDs become scientific notation. Fix: keep as text, avoid thousand separators, warn recipients.
- Unrounded metrics: Tiny differences across reports. Fix: ROUND consistently.
- PII leakage: Extra columns slipped in. Fix: select only required columns; review before export.
- Encoding issues: Accents look garbled. Fix: export UTF-8.
Self-check routine (2 minutes)
- Open in a plain-text editor: inspect header, delimiter, and a few rows
- Open in Excel/Sheets: verify columns align; sort and filter
- Re-run query count vs. rows exported (if applicable)
- Confirm date range and timezone match request
Mini challenge
Create a TSV file for Marketing with columns: campaign_id, campaign_name, start_date (YYYY-MM-DD), end_date (YYYY-MM-DD), leads, cost_usd, cpl_usd (cost per lead, 2 decimals). Requirements: header, tab-delimited, UTF-8, sorted by start_date, and no campaigns with zero leads.
Hints
- Compute cpl_usd as cost_usd / NULLIF(leads, 0)
- Filter WHERE leads > 0
- Use a date format function if needed
Suggested approach
-- Example (PostgreSQL idea)
\copy (
SELECT campaign_id,
campaign_name,
to_char(start_date, 'YYYY-MM-DD') AS start_date,
to_char(end_date, 'YYYY-MM-DD') AS end_date,
leads,
ROUND(cost_usd::numeric, 2) AS cost_usd,
ROUND((cost_usd / NULLIF(leads,0))::numeric, 2) AS cpl_usd
FROM campaigns
WHERE leads > 0
ORDER BY start_date
) TO 'marketing_campaigns.tsv' WITH (FORMAT csv, DELIMITER E'\t', HEADER true)Practical projects
- Monthly KPI Pack: Automated CSVs for revenue, new users, churn; consistent headers and formats
- Data Sharing Bundle: A sanitized dataset (CSV) plus a JSON dictionary describing each column
- Pivot-Ready Sales File: One row per day per region per product, tested to open cleanly in Excel
Learning path
- Data validation queries: reconcile exports with source counts
- Window functions for cumulative metrics before export
- Scheduling exports with scripts or scheduled jobs
- Intro to BI tools for visualization after export
Next steps
- Take the quick test below to confirm understanding
- Practice on real queries you run weekly—turn them into clean, repeatable exports
Note: The quick test is available to everyone; only logged-in users have their progress saved.