luvv to helpDiscover the Best Free Online Tools
Topic 4 of 14

Data Cleaning in Sheets

Learn Data Cleaning in Sheets for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

Example 4 — Deduplicate by keeping the latest per ID

Goal: for each OrderID, keep the latest date row.

OrderID in A2:A, Date in B2:B, CustID in C2:C

One-cell solution:

=SORTN( 
  SORT({A2:A, C2:C, ARRAYFORMULA(IFERROR(DATEVALUE(SUBSTITUTE(B2:B,"/","-"))))}, 1, TRUE, 3, FALSE), 
  9^9, 2, 1, TRUE 
)
  • SORT by OrderID asc and Date desc (third column).
  • SORTN keeps first occurrence per OrderID (column 1).

Step-by-step cleaning workflow

1) Create a staging sheet

  • Copy raw data to a new tab "stg_raw" (keep raw untouched).
  • Add helper columns for parsing and flags.

2) Normalize text

  • Remove stray characters: =ARRAYFORMULA(REGEXREPLACE(range,"[^\w '\-]",""))
  • Whitespace/case: =ARRAYFORMULA(PROPER(TRIM(range)))

3) Parse dates/numbers

  • Dates: =ARRAYFORMULA(IFERROR(DATEVALUE(SUBSTITUTE(date_col,"/","-")),""))
  • Numbers: =ARRAYFORMULA(VALUE(SUBSTITUTE(num_col,",","")))

4) Validate and deduplicate

  • Duplicate flag: =ARRAYFORMULA(COUNTIF(key_col,key_col)>1)
  • Keep latest per key: use SORT + SORTN as in Example 4.

5) Map categories

  • Create a mapping table (old → new).
  • Apply: =ARRAYFORMULA(IFERROR(VLOOKUP(old_col, map!A:B, 2, FALSE), old_col))

6) Output clean table

  • Select only clean columns in final order: =QUERY(clean_range, "select Col1, Col3, Col5", 0)
  • Freeze header, set formats, protect ranges if needed.
Reusable checklist
  • Raw preserved
  • Dates parse as numbers (check by changing format)
  • No blank keys
  • No dup keys (unless expected)
  • Categories match controlled list
  • All formulas error-handled (IFERROR)

Exercises

Do these directly in a new sheet. Mirror of the practice tasks below. Your outputs should match the expected results provided.

Exercise 1 — Clean names into a standardized column

Dataset (A1:A6):

A
Name
"  aNA   gomez  "
"LEE  CHEN!!!"
"o'connor"
" mary-jane  smith "
"Jo  (temp)  "

Task: In B2, write a single array formula to produce cleaned names with these rules: trim spaces, remove non-letters except spaces, apostrophes, and hyphens, and set Proper Case. Copy down via array (one formula only).

Expected result in B2:B6:

Ana Gomez
Lee Chen
O'Connor
Mary-Jane Smith
Jo Temp
Hint

Combine REGEXREPLACE, TRIM, and PROPER inside ARRAYFORMULA.

Exercise 2 — Standardize dates and deduplicate orders

Dataset (A1:C7):

OrderID | DateText     | CustomerID
1001    | 2024/07/01   | C-01
1002    | 1-7-2024     | C-02
1001    | Jul 03, 2024 | C-01
1003    | 2024-07-02   | C-03
1002    | 2024/07/05   | C-02

Tasks:

  • Create D2:D as StandardDate (true date values) from DateText using one array formula.
  • Produce a deduplicated table with the latest row per OrderID (keep columns: OrderID, CustomerID, StandardDate).

Expected result (latest per OrderID):

1001 | C-01 | 2024-07-03
1002 | C-02 | 2024-07-05
1003 | C-03 | 2024-07-02
Hint

First convert text to dates with DATEVALUE and SUBSTITUTE. Then use SORT + SORTN to keep the latest per ID.

Common mistakes and self-check

  • Mistake: Using Find & Replace to fix case and spaces manually. Fix: Use formulas (TRIM, PROPER) so updates auto-apply.
  • Mistake: Storing dates as text. Fix: Ensure the cell formats change when you switch to a number format; otherwise parse with DATEVALUE.
  • Mistake: VLOOKUP returning wrong matches. Fix: Use exact match (FALSE) and clean keys on both sides.
  • Mistake: Losing duplicates unintentionally. Fix: Flag dups with COUNTIF first; document dedup criteria.
  • Mistake: Unhandled errors break the table. Fix: Wrap with IFERROR to keep outputs stable.
Self-check routine
  • Switch date column to Plain text then back to Date—do values survive?
  • COUNTBLANK on key columns equals 0?
  • COUNTIF(key, key) maximum equals 1 after dedup?
  • Random spot-check 5 rows against raw.

Mini challenge

You receive a CSV export with product names like " ultra-phone X (Black) ", categories like "MOB, phone", and prices with commas "1,299.00". Build a staging sheet that outputs:

  • Clean ProductName: remove parentheses content, trim, Proper case.
  • Category: standardized to one of {"Phone","Tablet","Accessory"} via mapping.
  • Price: numeric.
Starter hints
  • Remove parentheses: =REGEXREPLACE(A2,"\s*\(.*\)","")
  • Price: =VALUE(SUBSTITUTE(price,",",""))
  • Map: =IFERROR(VLOOKUP(rawCat,map!A:B,2,FALSE),"Other")

Who this is for

  • Aspiring and junior Data Analysts.
  • Anyone cleaning CSV/Excel/Sheets exports for reporting.

Prerequisites

  • Basic spreadsheet navigation and formulas.
  • Comfort with ranges and absolute references.

Learning path

  • Start: TRIM, PROPER, SPLIT, TEXTJOIN.
  • Next: DATEVALUE, VALUE, error handling with IFERROR.
  • Then: REGEXREPLACE/REGEXEXTRACT for pattern work.
  • Finally: Deduping with SORT/SORTN, joining with VLOOKUP or INDEX/MATCH.

Practical projects

  • Clean a 1,000-row lead list: names, emails, phone numbers, dedupe by email.
  • Unify sales transactions from two marketplaces; standardize dates and categories; reconcile SKUs via a mapping tab.
  • Create a repeatable cleaning template with staging and output tabs for a monthly report.

Next steps

  • Build your personal cleaning template with sections for text normalization, date parsing, dedup, and mapping.
  • Practice on messy public datasets (exports you already have) and time yourself to improve speed.
  • Take the quick test below to confirm understanding.

Practice Exercises

2 exercises to complete

Instructions

Dataset (A1:A6):

A
Name
"  aNA   gomez  "
"LEE  CHEN!!!"
"o'connor"
" mary-jane  smith "
"Jo  (temp)  "

Task: In B2, write a single array formula to produce cleaned names with these rules: trim spaces, remove non-letters except spaces, apostrophes, and hyphens, and set Proper Case. Use one formula that fills down automatically.

Expected Output
Ana Gomez Lee Chen O'Connor Mary-Jane Smith Jo Temp

Data Cleaning in Sheets — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Data Cleaning in Sheets?

AI Assistant

Ask questions about this tool