Why SQL matters for Marketing Analysts
SQL lets you pull, shape, and validate marketing data without waiting on others. You can answer questions like: Which channel drives the highest LTV? Are last week’s campaigns profitable after CAC? Where are tracking gaps? With SQL, you go directly to the source tables, join ad platforms with CRM, and prepare attribution-ready datasets for dashboards and decision-making.
Typical sources you’ll query
- Ad platforms: spend, clicks, impressions, campaign metadata
- Web analytics: sessions, utm_ parameters, landing pages
- CRM: leads, deals/opportunities, revenue events
- Reference tables: channels map, campaign naming rules, currencies
Dialect note
Examples use ANSI-like SQL with functions such as DATE_TRUNC and INTERVAL. If you use BigQuery, Snowflake, Redshift, or Postgres, syntax will be very similar. Adjust minor differences (e.g., DATETIME vs TIMESTAMP, DATE_TRUNC vs DATE_TRUNC('month', ...)).
What you will be able to do
- Pull spend, clicks, impressions, and conversions by date, channel, and campaign
- Join ad platform data with CRM to measure CAC, ROAS, and LTV
- Build segmentation and cohort tables for acquisition analysis
- Prepare attribution-ready datasets (with windows and touchpoints)
- Run data quality checks for tracking and naming conventions
Who this is for
- Marketing Analysts who need reliable performance answers from raw data
- Marketers transitioning to data-informed decision-making
- Early-career analysts preparing for hands-on dashboards and experimentation
Prerequisites
- Comfort with basic math (ratios, percentages, averages)
- Familiarity with marketing terms: campaign, channel, UTM, CAC, LTV, ROAS
- Very basic SQL concepts (tables, rows, columns). If brand new, skim SELECT, WHERE, GROUP BY first.
Learning path
- Understand your marketing schema
- Locate ad, web analytics, and CRM tables
- List keys to join (e.g., click_id, gclid/fbclid, email, session_id)
- Note date columns and time zones
- Pull core performance metrics
- SELECT date, channel, SUM(spend), SUM(clicks), SUM(impressions)
- Compute CTR, CPC, CPM carefully (avoid integer division)
- Join marketing to CRM
- Connect leads/opportunities to campaigns
- Calculate CAC and revenue per cohort or channel
- Prepare attribution-ready datasets
- Create touchpoint views and sessionization
- Apply lookback windows (e.g., 7-day post-click)
- Cohorts, LTV, and segmentation
- Build first-touch acquisition cohorts
- Track LTV and retention over time
- Data quality checks
- Detect missing UTM fields, duplicates, negative spend
- Validate joins and row counts
Worked examples (copy, run, adapt)
1) Daily channel performance: spend, clicks, CTR, CPC, CPM
-- Inputs: ad_perf(day, channel, campaign, spend, clicks, impressions)
SELECT
day,
channel,
SUM(spend) AS spend,
SUM(clicks) AS clicks,
SUM(impressions) AS impressions,
CASE WHEN SUM(impressions) > 0 THEN SUM(clicks)::decimal / SUM(impressions) ELSE 0 END AS ctr,
CASE WHEN SUM(clicks) > 0 THEN SUM(spend) / SUM(clicks) ELSE NULL END AS cpc,
CASE WHEN SUM(impressions) > 0 THEN (SUM(spend) * 1000.0) / SUM(impressions) ELSE NULL END AS cpm
FROM ad_perf
WHERE day BETWEEN DATE '2025-01-01' AND DATE '2025-01-31'
GROUP BY day, channel
ORDER BY day, channel;Why it works
Aggregate first, then compute derived metrics from the aggregates to avoid row-level bias and integer division.
2) Join ad clicks with CRM leads
-- Inputs: ad_clicks(click_id, click_time, channel, campaign), crm_leads(lead_id, click_id, email, created_at)
SELECT
l.lead_id,
l.email,
l.created_at,
c.channel,
c.campaign
FROM crm_leads l
LEFT JOIN ad_clicks c
ON l.click_id = c.click_id; -- keep all leads, even if click is missingWhy LEFT JOIN?
You want all leads present, even when click_id is blank or unmatched, to spot attribution gaps.
3) Attribution-ready: conversions within 7 days after click
-- Inputs: ad_clicks(click_id, user_id, click_time, channel, campaign)
-- conversions(user_id, order_id, order_time, revenue)
WITH touch AS (
SELECT
c.user_id,
c.channel,
c.campaign,
c.click_time,
v.order_id,
v.order_time,
v.revenue
FROM ad_clicks c
JOIN conversions v
ON v.user_id = c.user_id
AND v.order_time BETWEEN c.click_time AND c.click_time + INTERVAL '7 days'
)
SELECT
DATE_TRUNC('day', click_time) AS click_day,
channel,
campaign,
COUNT(DISTINCT order_id) AS conv,
SUM(revenue) AS revenue
FROM touch
GROUP BY 1,2,3
ORDER BY 1,2,3;Notes
- Adjust the window based on your attribution policy
- If multiple clicks occur, you may need ROW_NUMBER to assign credit
4) Campaign and channel segmentation
-- Inputs: ad_perf(day, campaign, channel, device, country, spend, clicks, conversions)
SELECT
channel,
device,
country,
SUM(spend) AS spend,
SUM(clicks) AS clicks,
SUM(conversions) AS conversions,
CASE WHEN SUM(clicks) > 0 THEN SUM(spend) / SUM(clicks) END AS cpc
FROM ad_perf
WHERE day >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY channel, device, country
ORDER BY spend DESC;5) Acquisition cohort table
-- Inputs: users(user_id, first_touch_at), revenue_events(user_id, event_time, amount)
WITH cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', first_touch_at) AS cohort_month
FROM users
), revenue AS (
SELECT
r.user_id,
DATE_TRUNC('month', r.event_time) AS revenue_month,
SUM(r.amount) AS revenue
FROM revenue_events r
GROUP BY 1,2
)
SELECT
c.cohort_month,
r.revenue_month,
EXTRACT(MONTH FROM AGE(r.revenue_month, c.cohort_month)) AS months_since_acq,
SUM(r.revenue) AS revenue
FROM cohorts c
JOIN revenue r USING (user_id)
GROUP BY 1,2,3
ORDER BY 1,2;How to read it
Each row shows how much revenue was generated by a cohort in each subsequent month. Use it to estimate LTV curves.
6) CAC and ROAS from joined data
-- Inputs: ad_perf(day, channel, campaign, spend)
-- crm_leads(lead_id, created_at, channel, campaign)
-- deals(lead_id, closed_at, revenue)
WITH spend AS (
SELECT
DATE_TRUNC('month', day) AS month,
channel,
campaign,
SUM(spend) AS spend
FROM ad_perf
GROUP BY 1,2,3
), acq AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
channel,
campaign,
COUNT(*) AS leads
FROM crm_leads
GROUP BY 1,2,3
), rev AS (
SELECT
DATE_TRUNC('month', d.closed_at) AS month,
l.channel,
l.campaign,
SUM(d.revenue) AS revenue
FROM deals d
JOIN crm_leads l USING (lead_id)
GROUP BY 1,2,3
)
SELECT
COALESCE(s.month, a.month, r.month) AS month,
COALESCE(s.channel, a.channel, r.channel) AS channel,
COALESCE(s.campaign, a.campaign, r.campaign) AS campaign,
s.spend,
a.leads,
r.revenue,
CASE WHEN a.leads > 0 AND s.spend IS NOT NULL THEN s.spend / a.leads END AS cac,
CASE WHEN s.spend > 0 AND r.revenue IS NOT NULL THEN r.revenue / s.spend END AS roas
FROM spend s
FULL OUTER JOIN acq a USING (month, channel, campaign)
FULL OUTER JOIN rev r USING (month, channel, campaign)
ORDER BY month, channel, campaign;Tip
FULL OUTER JOIN helps surface mismatches across sources (e.g., spend without leads).
Drills and exercises
- [ ] Write a query to return last 14 days of spend, clicks, and CTR by channel
- [ ] Join leads to clicks and count how many leads have NULL click_id
- [ ] Build a 7-day attribution table and compare conversion counts to 1-day
- [ ] Create monthly cohorts and compute revenue in months 0–3
- [ ] Write checks to find negative spend or impressions = 0 with clicks > 0
- [ ] Calculate CAC per campaign and sort by worst to best
Common mistakes and debugging tips
Integer division and wrong aggregates
Compute ratios from aggregated sums, not from row-level values. Cast to decimal or multiply by 1.0 to avoid truncation.
Incorrect join direction
If analyzing leads, keep crm_leads on the left with LEFT JOIN to ad data. Reversing can silently drop rows.
Time zone drift
Clicks and orders may be stored in different time zones. Normalize with AT TIME ZONE or convert to UTC before windowing.
Messy campaign naming
Create a mapping table to standardize channel and campaign names before grouping; otherwise you split metrics across variants.
Leaky attribution windows
Always bound your window (e.g., BETWEEN click_time AND click_time + INTERVAL '7 days') and decide how to handle multiple clicks.
Mini project: Marketing Performance Pack
Goal: produce a small, reliable dataset and summary views the team can use immediately.
- Source audit
- List tables and keys (click_id, user_id, email)
- Note date columns and time zones
- Core metrics view
- daily_channel_perf(day, channel, spend, clicks, impressions, ctr, cpc, cpm)
- Lead join
- lead_with_source(lead_id, created_at, channel, campaign, has_click)
- Attribution view
- touch_7d with click-to-order matches and revenue
- Cohort table
- cohort_revenue(cohort_month, revenue_month, months_since_acq, revenue)
- Data quality checks
- missing_utms, duplicate_leads, negative_spend
Deliverables checklist
- [ ] SQL scripts saved and commented
- [ ] Views created or CTEs reproducible
- [ ] A short readme describing logic and assumptions
Subskills
- Marketing Data Schemas Understanding
- Pulling Spend Clicks Impressions Conversions
- Joining Ad Platform Data With CRM
- Campaign And Channel Segmentation Queries
- Attribution Ready Data Prep
- Cohort Tables For Acquisition
- LTV And CAC Queries
- Data Quality Checks For Tracking
Practical projects
- Channel scorecard: weekly update of spend, CAC, ROAS by channel and device
- Cohort tracker: monthly cohort LTV curves with retention notes
- Attribution comparison: compare 1-day, 7-day, and 28-day windows to show sensitivity
Next steps
- Learn dashboarding (e.g., your team’s BI tool) to visualize SQL outputs
- Add experimentation basics (A/B test metrics, guardrails)
- Automate with scheduled queries or templated views for repeatability
- Optional: light Python/Notebooks to enrich SQL outputs with modeling