Why this matters
Data rarely arrives clean. As a Data Analyst, you will standardize names, generate emails, split product codes, extract IDs from URLs, and clean survey text. Text functions help you do this fast and reproducibly without manual edits.
- Marketing lists: Clean names, fix casing, and create
first.last@company.com. - Operations: Split SKUs into category, item number, and color.
- Web analytics: Extract campaign IDs from long URLs.
- Finance: Normalize vendor names and remove extra spaces.
Concept explained simply
Text functions treat each cell as a string (a sequence of characters). You can count characters, grab parts, find positions, replace characters, and change case. Combine them to build powerful cleaning pipelines.
Mental model
- Locate first, then extract: Use
FIND/SEARCHto get positions. Then useLEFT/MID/RIGHT. - Clean before compare:
TRIM+CLEAN+ consistent case (LOWER/UPPER/PROPER). - Replace, then split: Use
SUBSTITUTEorSPLITwhen possible; fall back toMIDwith positions. - Join smartly:
TEXTJOINcan ignore blanks;&orCONCATis simple for two items.
Commonly used functions (quick reference)
- Clean/case:
TRIM,CLEAN,UPPER,LOWER,PROPER - Find/length:
FIND(case-sensitive),SEARCH(case-insensitive),LEN - Extract:
LEFT,RIGHT,MID - Replace/split/join:
SUBSTITUTE,REPLACE,SPLIT(Sheets),TEXTSPLIT(new Excel),TEXTJOIN,CONCAT,& - Format/convert:
TEXT,VALUE
Worked examples
Example 1: Clean a full name
Cell A2: " mARY annE o'NEIL "
- Remove extra spaces:
TRIM(A2) - Fix non-printables (if any):
CLEAN(TRIM(A2)) - Proper case:
PROPER(CLEAN(TRIM(A2)))→ Mary Anne O'Neil
Mental check: Use LEN before and after cleaning to confirm space reduction.
Example 2: Extract domain from an email
Cell B2: alex.cho@contoso-analytics.com
- Position of
@:FIND("@",B2) - Domain:
MID(B2, FIND("@",B2)+1, LEN(B2)-FIND("@",B2))→ contoso-analytics.com
Note: Use SEARCH instead of FIND if case-insensitivity matters.
Example 3: Split SKU parts
Cell C2: CAT-0142-BLUE
- Category:
LEFT(C2, FIND("-",C2)-1)→ CAT - Item number:
MID(C2, FIND("-",C2)+1, FIND("-",C2, FIND("-",C2)+1) - FIND("-",C2) - 1)→ 0142 - Color:
RIGHT(C2, LEN(C2) - FIND("-",C2, FIND("-",C2)+1))→ BLUE
Tip: The third argument in FIND lets you search from a position to locate the second hyphen.
Who this is for
- Aspiring and junior Data Analysts who clean datasets in Excel or Google Sheets.
- Anyone preparing datasets for BI tools or basic modeling.
Prerequisites
- Basic spreadsheet navigation and formulas (
=to start, referencing cells). - Comfort with relative/absolute references (e.g.,
A2vs$A$2).
Exercises and practice tasks
Try these hands-on tasks. Use the hints if you get stuck. Solutions are collapsible.
Exercise 1: Clean and parse product codes
Set up a sheet with columns: A: Raw_SKU. Sample rows:
cat-001 blueCAT-002-greenCat-003 RED
Goals:
- In B: Clean_SKU → remove extra spaces and standardize to uppercase hyphenated form (e.g.,
CAT-001-BLUE). - In C: Category → text before first
-. - In D: Number → middle part (3 digits).
- In E: Color → last part, uppercase.
Hints
- Start with
TRIMandUPPER. - Use
SUBSTITUTEto normalize separators (spaces to hyphens). - Find hyphen positions with
FINDand extract viaLEFT/MID/RIGHT.
Show solution (summary)
In B2: UPPER(SUBSTITUTE(TRIM(SUBSTITUTE(A2," ","-")),"--","-")) (may add nested SUBSTITUTE to collapse multiple dashes)
In C2: LEFT(B2, FIND("-",B2)-1)
In D2: MID(B2, FIND("-",B2)+1, FIND("-",B2, FIND("-",B2)+1) - FIND("-",B2) - 1)
In E2: RIGHT(B2, LEN(B2) - FIND("-",B2, FIND("-",B2)+1))
Exercise 2: Generate emails from names
Columns: A: First, B: Last. Example: " Ana-Maria " and " O'Connor "
Goal: In C: Email → first.last@yourco.com all lowercase, remove spaces/apostrophes/hyphens in the email parts.
Hints
- Clean names with
TRIM. - Lowercase with
LOWER. - Strip punctuation via nested
SUBSTITUTE. - Join with
TEXTJOINor&.
Show solution (summary)
Helper for first (optional in D2): LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A2),"'",""),"-","")," ",""))
Helper for last (optional in E2): same approach on B2.
All-in-one (in C2): LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A2),"'",""),"-","")," ","")) & "." & LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(B2),"'",""),"-","")," ","")) & "@yourco.com"
Self-check checklist
- LEN decreased after cleaning when expected.
- No double separators (e.g.,
--or..). - Case is consistent (all lower or proper case).
- Extraction formulas still work if cell content slightly varies (try 1–2 variations).
Common mistakes and how to self-check
- Using
FINDwhen case-insensitivity is needed. Fix: UseSEARCH. - Forgetting to
TRIMbefore comparisons. Fix: Wrap inputs withTRIMand, if needed,CLEAN. - Hardcoding positions in
MID. Fix: Compute positions withFIND/SEARCHso formulas generalize. - Nesting many
SUBSTITUTEcalls without order. Fix: Remove broader issues first (spaces), then specific punctuation. - Joining text that includes blanks. Fix: Use
TEXTJOINwithignore_empty=TRUE(where available).
Practical projects
- Customer list cleanup: Standardize name casing, remove extra spaces, and generate emails. Deliver a cleaned sheet and a summary of rules used.
- SKU dictionary: Parse 200+ SKUs into components (category, model, color). Validate with a pivot table of categories and colors.
- UTM parser: From a list of URLs, extract source, medium, and campaign into separate columns using
FIND/MIDorSPLIT/TEXTSPLIT.
Learning path
Practice
TRIM, CLEAN, PROPER, LOWER, UPPER.Use
FIND/SEARCH with LEN, then LEFT/MID/RIGHT.Apply
SUBSTITUTE, REPLACE, SPLIT/TEXTSPLIT.Use
TEXTJOIN, CONCAT, &, and TEXT/VALUE for conversions.Build reusable cleaning formulas and test on new data.
Mini challenge
You receive a column Campaign with values like Q1_2025-SEM-Google_US and Q2_2025-Email-Newsletter_UK. Create formulas that output:
- Quarter (e.g.,
Q1_2025) - Channel (e.g.,
SEMorEmail) - Source (e.g.,
GoogleorNewsletter) - Country (e.g.,
USorUK)
Constraint: Assume separators are underscores and hyphens as shown, but lengths vary. Aim to use FIND/SEARCH + LEFT/MID/RIGHT so it generalizes.
One possible approach (peek)
Quarter: up to first - → LEFT(A2, FIND("-",A2)-1)
Country: after last _ → if available, RIGHT(A2, LEN(A2)-FIND("_",A2, FIND("_",A2)+1)) or use a helper to find the last underscore.
Next steps
- Apply these functions to a real dataset you own (contacts, products, or logs).
- Document your cleaning rules in a separate sheet tab so others can reuse them.
- Take the quick test to check retention. The test is available to everyone; only logged-in users get saved progress.