luvv to helpDiscover the Best Free Online Tools
Topic 1 of 8

Cleaning Data In Sheets

Learn Cleaning Data In Sheets for free with explanations, exercises, and a quick test (for Marketing Analyst).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

Marketing Analysts constantly join exports from ad platforms, web analytics, and CRM. Raw data arrives messy: mixed case, extra spaces, inconsistent dates, duplicate leads, and irregular campaign names. Cleaning quickly and correctly means:

  • Trustworthy conversion rates, CAC, and ROAS.
  • Consistent channel and campaign naming for rollups.
  • Accurate attribution and cohort analyses.
  • Faster reporting with fewer manual fixes.

Concept explained simply

Data cleaning is turning inconsistent inputs into consistent, analysis-ready columns without changing the meaning. You standardize text, convert types (text to dates/numbers), handle missing values, fix categories, and remove duplicates.

Mental model

  • See each column as a contract: one meaning, one format.
  • Use repeatable formulas so new data cleans itself.
  • Move left-to-right: detect → standardize → validate.
Common cleaning functions at a glance
  • Whitespace/characters: TRIM, CLEAN, SUBSTITUTE, REGEXREPLACE
  • Case/format: LOWER, UPPER, PROPER, TEXT
  • Split/merge: SPLIT (or TEXTSPLIT), TEXTJOIN
  • Dates/numbers: DATEVALUE, VALUE, NUMBERVALUE
  • Lookup/mapping: VLOOKUP/XLOOKUP, IF, SWITCH
  • Distinct/filter: UNIQUE, FILTER, SORT, REMOVE DUPLICATES (menu)

Step-by-step cleaning workflow

  1. Copy raw data to a Raw sheet; do not edit it directly.
  2. Build a Clean sheet with formulas referencing Raw.
  3. Standardize text: =LOWER(TRIM(A2)), remove noise with REGEXREPLACE.
  4. Parse and cast types: convert to valid dates and numbers.
  5. Map categories via lookup tables (e.g., channel groupings).
  6. Remove/flag duplicates and impossible values.
  7. Add validation columns (e.g., ISBLANK, ISNUMBER) for self-check.

Worked examples

1) Standardize UTM campaign and medium

Goal: convert varied inputs into consistent fields.

Raw!A: medium            Raw!B: campaign
" CPC  "               "Black Friday - US"
"cpc"                   "black-friday_us"
"Paid Social"           "BlackFriday US"
"(not set)"             "BlackFriday-US"
"CPC"                   "BF us"

In Clean!C (medium_std):

=LET(m,LOWER(TRIM(Raw!A2)),IF(m="(not set)","",IF(OR(m="cpc",m="ppc"),"cpc",IF(m="paid social","paid_social",m))))

In Clean!D (campaign_key):

=LOWER(REGEXREPLACE(TRIM(Raw!B2),"[^a-z0-9]+","_"))

Optional: remove leading/trailing underscores:

=LOWER(REGEXREPLACE(REGEXREPLACE(TRIM(Raw!B2),"[^a-z0-9]+","_"),"^_+|_+$",""))

2) Convert mixed date strings to real dates

Raw!C: date_text
"2024/11/30"
"11-30-2024"
"30 Nov 2024"
"2024.11.30"
"Nov-30-24"

Clean!E (date_value):

=DATEVALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(Raw!C2,".","/"),"-","/"),"  "," "))

This normalizes separators to "/" so DATEVALUE can parse. Format Clean!E as Date.

If some rows still fail

Wrap with an IFERROR fallback, e.g. try parsing textual months:

=IFERROR(DATEVALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(Raw!C2,".","/"),"-","/"),"  "," ")),IFERROR(VALUE(Raw!C2),""))

3) Convert currency-like text to numbers

Raw!D: amount_text
"$1,200.50"
"€ 900,00"
"1 500"
"USD2,345"
"£0.99"

Clean!F (amount_num):

=LET(t,REGEXREPLACE(Raw!D2,"[^0-9,\.\-]",""), dec, IF(REGEXMATCH(t,",\d{2}$"),",","."), grp, IF(dec=",",".",","), NUMBERVALUE(t, dec, grp))

This removes currency letters/spaces, detects decimal separator, and converts to a number.

4) De-duplicate leads by email, keep latest

Sort Raw by Date desc, then keep the first occurrence of each email.

Formula approach in Sheets (array):

=LET(s,SORT(Raw!A2:D, Raw!D2:D, FALSE), 
 emails, INDEX(s,,2), 
 keep, MATCH(emails, emails, 0)=ROW(emails)-MIN(ROW(emails))+1, 
 FILTER(s, keep))

Explanation: sort by date desc, then keep the first time each email appears.

Common mistakes and how to self-check

  • Leaving hidden spaces: fix with TRIM and confirm using LEN(value).
  • Breaking joins due to case differences: always LOWER both sides before lookups.
  • Dates stored as text: check with ISNUMBER(date_cell) after DATEVALUE.
  • Over-cleaning: removing useful characters (e.g., hyphens in SKUs). Define allowed chars per column.
  • Accidental duplicates after sort: re-check with COUNTIF(email_range, email)=1.
Self-check checklist
  • All key ID columns pass uniqueness checks.
  • All date columns return TRUE for ISNUMBER.
  • No leading/trailing spaces: LEN(x)=LEN(TRIM(x)).
  • Category values match a controlled list (via validation or lookup).
  • Aggregations before vs. after cleaning differ only where expected.

Exercises

These mirror the tasks below the lesson. Do them in a new Sheet. Aim to solve using formulas so your steps are repeatable.

Exercise 1: Standardize UTM fields

Input (paste into A1:B6):

medium	campaign
" CPC  "	"Black Friday - US"
"cpc"	"black-friday_us"
"Paid Social"	"BlackFriday US"
"(not set)"	"BlackFriday-US"
"CPC"	"BF us"
  • Create columns C (medium_std) and D (campaign_key).
  • medium_std rules: trim + lower; map "ppc" and any form of "cpc" to "cpc"; map "paid social" to "paid_social"; blank for "(not set)".
  • campaign_key: trim, lower, replace any non-alphanumeric with a single underscore, remove leading/trailing underscores.

Exercise 2: Parse dates and amounts

Input (paste into A1:B6):

date_text	amount_text
2024/11/30	$1,200.50
11-30-2024	€ 900,00
30 Nov 2024	1 500
2024.11.30	USD2,345
Nov-30-24	£0.99
  • Create columns C (date_value) and D (amount_num).
  • Convert all date_text to true dates.
  • Convert all amount_text to numeric values regardless of currency symbol or thousands delimiter.
Exercise checklist
  • All standardized mediums are one of: cpc, paid_social, or blank when not set.
  • All campaign keys are lowercase with underscores only.
  • ISNUMBER(C2:C) and ISNUMBER(D2:D) return TRUE.
  • No leading/trailing underscores in campaign keys.

Practical projects

  • Campaign naming cleaner: Input raw UTM exports; output standardized medium/source/campaign, with a lookup tab for mappings.
  • Leads deduper: Merge form and CRM exports, keep latest per email, flag conflicts (e.g., different country recorded).
  • Revenue prep sheet: Convert currency-like text to numbers, align to a unified currency column after conversion (assume conversion rates are handled elsewhere), and validate totals.

Mini challenge

Given product codes in A2:A, like "sku-123_a", " SKU 123 A ", "SKU-123A!!", create a formula that outputs standardized keys like "sku_123a" (lowercase, underscores, alphanumerics only). Hint: combine LOWER, TRIM, and REGEXREPLACE.

One-line answer
=LOWER(REGEXREPLACE(TRIM(A2),"[^a-z0-9]+","_"))

Who this is for

  • Marketing Analysts cleaning exports from ads, analytics, and CRM.
  • Anyone building recurring dashboards who wants consistent inputs.

Prerequisites

  • Basic spreadsheet navigation and formula entry.
  • Comfort with references (A1 and absolute references), sorting, and filtering.

Learning path

  1. Master text cleanup (TRIM, CLEAN, LOWER, SUBSTITUTE).
  2. Learn regex basics for REGEXREPLACE/REGEXEXTRACT.
  3. Parse types: DATEVALUE, VALUE, NUMBERVALUE.
  4. Category mapping with lookups and validation.
  5. Deduping and QA checks at the end of your pipeline.

Common pitfalls

  • Mixing manual edits with formulas. Keep a separate Clean sheet.
  • One-off fixes. Prefer formula patterns you can reuse.
  • Ambiguous date formats. Normalize separators before parsing.

Next steps

  • Turn your cleaning steps into a documented template.
  • Add data validation drop-downs to prevent new messes.
  • Practice on a new export each week to build speed.

Test availability note: The quick test is available to everyone; only logged-in users get saved progress.

Practice Exercises

2 exercises to complete

Instructions

Paste this into A1:B6:

medium	campaign
" CPC  "	"Black Friday - US"
"cpc"	"black-friday_us"
"Paid Social"	"BlackFriday US"
"(not set)"	"BlackFriday-US"
"CPC"	"BF us"
  • In C2, create medium_std: trim + lower; map any "cpc"/"ppc" to "cpc", "paid social" to "paid_social"; blank for "(not set)".
  • In D2, create campaign_key: trim, lower, replace non-alphanumerics with underscores, remove leading/trailing underscores.
  • Fill down and check no leading/trailing underscores remain.
Expected Output
Column C contains only cpc, paid_social, or blank for the (not set) row. Column D values look like black_friday_us and bf_us (lowercase underscores only).

Cleaning Data In Sheets — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Cleaning Data In Sheets?

AI Assistant

Ask questions about this tool