How to learn Cleaning Data In Sheets for Spreadsheet Proficiency in Marketing Analyst for free

Why this matters

Marketing Analysts constantly join exports from ad platforms, web analytics, and CRM. Raw data arrives messy: mixed case, extra spaces, inconsistent dates, duplicate leads, and irregular campaign names. Cleaning quickly and correctly means:

Trustworthy conversion rates, CAC, and ROAS.
Consistent channel and campaign naming for rollups.
Accurate attribution and cohort analyses.
Faster reporting with fewer manual fixes.

Concept explained simply

Data cleaning is turning inconsistent inputs into consistent, analysis-ready columns without changing the meaning. You standardize text, convert types (text to dates/numbers), handle missing values, fix categories, and remove duplicates.

Mental model

See each column as a contract: one meaning, one format.
Use repeatable formulas so new data cleans itself.
Move left-to-right: detect → standardize → validate.

Common cleaning functions at a glance

Whitespace/characters: TRIM, CLEAN, SUBSTITUTE, REGEXREPLACE
Case/format: LOWER, UPPER, PROPER, TEXT
Split/merge: SPLIT (or TEXTSPLIT), TEXTJOIN
Dates/numbers: DATEVALUE, VALUE, NUMBERVALUE
Lookup/mapping: VLOOKUP/XLOOKUP, IF, SWITCH
Distinct/filter: UNIQUE, FILTER, SORT, REMOVE DUPLICATES (menu)

Step-by-step cleaning workflow

Copy raw data to a Raw sheet; do not edit it directly.
Build a Clean sheet with formulas referencing Raw.
Standardize text: =LOWER(TRIM(A2)), remove noise with REGEXREPLACE.
Parse and cast types: convert to valid dates and numbers.
Map categories via lookup tables (e.g., channel groupings).
Remove/flag duplicates and impossible values.
Add validation columns (e.g., ISBLANK, ISNUMBER) for self-check.

Worked examples

1) Standardize UTM campaign and medium

Goal: convert varied inputs into consistent fields.

Raw!A: medium            Raw!B: campaign
" CPC  "               "Black Friday - US"
"cpc"                   "black-friday_us"
"Paid Social"           "BlackFriday US"
"(not set)"             "BlackFriday-US"
"CPC"                   "BF us"

In Clean!C (medium_std):

=LET(m,LOWER(TRIM(Raw!A2)),IF(m="(not set)","",IF(OR(m="cpc",m="ppc"),"cpc",IF(m="paid social","paid_social",m))))

In Clean!D (campaign_key):

=LOWER(REGEXREPLACE(TRIM(Raw!B2),"[^a-z0-9]+","_"))

Optional: remove leading/trailing underscores:

=LOWER(REGEXREPLACE(REGEXREPLACE(TRIM(Raw!B2),"[^a-z0-9]+","_"),"^_+|_+$",""))

2) Convert mixed date strings to real dates

Raw!C: date_text
"2024/11/30"
"11-30-2024"
"30 Nov 2024"
"2024.11.30"
"Nov-30-24"

Clean!E (date_value):

=DATEVALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(Raw!C2,".","/"),"-","/"),"  "," "))

This normalizes separators to "/" so DATEVALUE can parse. Format Clean!E as Date.

If some rows still fail

Wrap with an IFERROR fallback, e.g. try parsing textual months:

=IFERROR(DATEVALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(Raw!C2,".","/"),"-","/"),"  "," ")),IFERROR(VALUE(Raw!C2),""))

3) Convert currency-like text to numbers

Raw!D: amount_text
"$1,200.50"
"€ 900,00"
"1 500"
"USD2,345"
"£0.99"

Clean!F (amount_num):

=LET(t,REGEXREPLACE(Raw!D2,"[^0-9,\.\-]",""), dec, IF(REGEXMATCH(t,",\d{2}$"),",","."), grp, IF(dec=",",".",","), NUMBERVALUE(t, dec, grp))

This removes currency letters/spaces, detects decimal separator, and converts to a number.

4) De-duplicate leads by email, keep latest

Sort Raw by Date desc, then keep the first occurrence of each email.

Formula approach in Sheets (array):

=LET(s,SORT(Raw!A2:D, Raw!D2:D, FALSE), 
 emails, INDEX(s,,2), 
 keep, MATCH(emails, emails, 0)=ROW(emails)-MIN(ROW(emails))+1, 
 FILTER(s, keep))

Explanation: sort by date desc, then keep the first time each email appears.

Common mistakes and how to self-check

Leaving hidden spaces: fix with TRIM and confirm using LEN(value).
Breaking joins due to case differences: always LOWER both sides before lookups.
Dates stored as text: check with ISNUMBER(date_cell) after DATEVALUE.
Over-cleaning: removing useful characters (e.g., hyphens in SKUs). Define allowed chars per column.
Accidental duplicates after sort: re-check with COUNTIF(email_range, email)=1.

Self-check checklist

All key ID columns pass uniqueness checks.
All date columns return TRUE for ISNUMBER.
No leading/trailing spaces: LEN(x)=LEN(TRIM(x)).
Category values match a controlled list (via validation or lookup).
Aggregations before vs. after cleaning differ only where expected.

Exercises

These mirror the tasks below the lesson. Do them in a new Sheet. Aim to solve using formulas so your steps are repeatable.

Exercise 1: Standardize UTM fields

Input (paste into A1:B6):

medium	campaign
" CPC  "	"Black Friday - US"
"cpc"	"black-friday_us"
"Paid Social"	"BlackFriday US"
"(not set)"	"BlackFriday-US"
"CPC"	"BF us"

Create columns C (medium_std) and D (campaign_key).
medium_std rules: trim + lower; map "ppc" and any form of "cpc" to "cpc"; map "paid social" to "paid_social"; blank for "(not set)".
campaign_key: trim, lower, replace any non-alphanumeric with a single underscore, remove leading/trailing underscores.

Exercise 2: Parse dates and amounts

Input (paste into A1:B6):

date_text	amount_text
2024/11/30	$1,200.50
11-30-2024	€ 900,00
30 Nov 2024	1 500
2024.11.30	USD2,345
Nov-30-24	£0.99

Create columns C (date_value) and D (amount_num).
Convert all date_text to true dates.
Convert all amount_text to numeric values regardless of currency symbol or thousands delimiter.

Exercise checklist

All standardized mediums are one of: cpc, paid_social, or blank when not set.
All campaign keys are lowercase with underscores only.
ISNUMBER(C2:C) and ISNUMBER(D2:D) return TRUE.
No leading/trailing underscores in campaign keys.

Practical projects

Campaign naming cleaner: Input raw UTM exports; output standardized medium/source/campaign, with a lookup tab for mappings.
Leads deduper: Merge form and CRM exports, keep latest per email, flag conflicts (e.g., different country recorded).
Revenue prep sheet: Convert currency-like text to numbers, align to a unified currency column after conversion (assume conversion rates are handled elsewhere), and validate totals.

Mini challenge

Given product codes in A2:A, like "sku-123_a", " SKU 123 A ", "SKU-123A!!", create a formula that outputs standardized keys like "sku_123a" (lowercase, underscores, alphanumerics only). Hint: combine LOWER, TRIM, and REGEXREPLACE.

One-line answer

=LOWER(REGEXREPLACE(TRIM(A2),"[^a-z0-9]+","_"))

Who this is for

Marketing Analysts cleaning exports from ads, analytics, and CRM.
Anyone building recurring dashboards who wants consistent inputs.

Prerequisites

Basic spreadsheet navigation and formula entry.
Comfort with references (A1 and absolute references), sorting, and filtering.

Learning path

Master text cleanup (TRIM, CLEAN, LOWER, SUBSTITUTE).
Learn regex basics for REGEXREPLACE/REGEXEXTRACT.
Parse types: DATEVALUE, VALUE, NUMBERVALUE.
Category mapping with lookups and validation.
Deduping and QA checks at the end of your pipeline.

Common pitfalls

Mixing manual edits with formulas. Keep a separate Clean sheet.
One-off fixes. Prefer formula patterns you can reuse.
Ambiguous date formats. Normalize separators before parsing.

Next steps

Turn your cleaning steps into a documented template.
Add data validation drop-downs to prevent new messes.
Practice on a new export each week to build speed.

Test availability note: The quick test is available to everyone; only logged-in users get saved progress.

Menu

Cleaning Data In Sheets

Table of Contents