luvv to helpDiscover the Best Free Online Tools
Topic 4 of 12

Data Type Fixes

Learn Data Type Fixes for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

Data type fixes turn messy strings into usable numbers, dates, booleans, and categories. As a Data Analyst, you will regularly:

  • Calculate KPIs from currency fields stored as text (e.g., "$1,234.50").
  • Merge datasets where dates use different formats (e.g., 31-12-2024 vs 12/31/2024).
  • Filter on boolean flags expressed as "Y/N", "yes/no", or 1/0.
  • Group by standardized categories and codes (e.g., country codes).

Correct types make calculations correct, joins reliable, and dashboards stable.

Concept explained simply

A data type describes how a value behaves: numbers add up, dates know months and years, booleans are true/false, and categories have a fixed set of labels. If the type is wrong, tools treat the value as plain text, and math/logic will fail or be inaccurate.

Mental model: the casting gate

Imagine a gate that checks every value and either converts it cleanly or rejects it. Your job is to design rules so most values pass the gate and rejected values are flagged for review. Rules include trimming spaces, removing symbols, standardizing formats, and choosing the right type.

Common types and how to fix

Numbers (currency, percentages, units)
  • Remove non-numeric symbols: currency ($, €, £), commas, spaces, percentage signs, unit strings.
  • Handle negatives: both leading '-' and parentheses (e.g., (123.45) -> -123.45).
  • Normalize decimal separators (dot vs comma) using locale-aware parsing or replacements.
  • Cast with a safe function (e.g., TRY_ casts) and review any values that fail.
Examples
Excel/Sheets:
1) Clean currency:
=VALUE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"$",""),",","")," ",""))

2) Parentheses negatives:
=IF(LEFT(A2,1)="(",-VALUE(SUBSTITUTE(SUBSTITUTE(MID(A2,2,LEN(A2)-2),",",""),"$","")),
     VALUE(SUBSTITUTE(SUBSTITUTE(A2,",",""),"$","")))
Python (pandas):
import re
s = df['price_str'].str.replace(r"[\$,\s]","", regex=True)
s = s.str.replace(r"\((.*)\)", r"-\1", regex=True)
df['price'] = pd.to_numeric(s, errors='coerce')
PostgreSQL:
-- Remove $ and commas, handle parentheses
SELECT CASE WHEN price_str ~ '^\(.*\)$'
            THEN -1 * (REPLACE(REPLACE(substr(price_str,2,length(price_str)-2),',',''),'$',''))::numeric
            ELSE (REPLACE(REPLACE(price_str,',',''),'$',''))::numeric
       END AS price
FROM t;
Dates and times
  • Unify format: explicitly define the pattern (e.g., YYYY-MM-DD).
  • Resolve ambiguity: decide whether day comes first (day-first vs month-first).
  • Handle time zones by converting to a standard (often UTC) when needed.
  • Flag impossible dates (e.g., 31/02/2024) for review.
Examples
Excel/Sheets:
=DATEVALUE(TEXT(A2,"yyyy-mm-dd"))  -- after first normalizing text to a consistent pattern
-- Or: Data > Split/Text-to-Columns with a defined date format
Python (pandas):
df['dt'] = pd.to_datetime(df['date_str'], errors='coerce', dayfirst=True)
# or provide exact format when known
df['dt_iso'] = pd.to_datetime(df['iso_str'], format='%Y-%m-%d', errors='coerce')
PostgreSQL:
SELECT to_date(date_str, 'DD-MM-YYYY') AS dt
FROM t;
-- For timestamps: to_timestamp('31-12-2024 23:59','DD-MM-YYYY HH24:MI')
Booleans
  • Map common variants to True/False: 'y','yes','true','1' -> true; 'n','no','false','0' -> false.
  • Treat blanks or unexpected values as NULL/Unknown and review.
Examples
Excel/Sheets:
=IF(OR(LOWER(A2)="y",LOWER(A2)="yes",A2="1",LOWER(A2)="true"),TRUE,
 IF(OR(LOWER(A2)="n",LOWER(A2)="no",A2="0",LOWER(A2)="false"),FALSE,NA()))
Python (pandas):
map_true = {'y':True,'yes':True,'true':True,'1':True}
map_false = {'n':False,'no':False,'false':False,'0':False}
low = df['flag'].astype(str).str.strip().str.lower()
df['is_active'] = low.map({**map_true, **map_false})
PostgreSQL:
SELECT CASE WHEN lower(flag) IN ('y','yes','true','1') THEN true
            WHEN lower(flag) IN ('n','no','false','0') THEN false
            ELSE NULL END AS is_active
FROM t;
Categories (enums, codes)
  • Standardize labels and casing (e.g., 'US', 'U.S.', 'United States' -> 'US').
  • Use a reference list/dictionary; log unknowns for review.
Examples
Excel/Sheets:
=SWITCH(UPPER(SUBSTITUTE(A2,".","")),
 "US","US","UNITED STATES","US","USA","US",
 "UK","UK","UNITED KINGDOM","UK",
 A2)  -- Fallback to original
Python (pandas):
canon = {'us':'US','u.s.':'US','united states':'US','usa':'US'}
val = df['country'].str.replace('.','', regex=False).str.strip().str.lower()
df['country_std'] = val.map(canon).fillna(df['country'])
PostgreSQL:
SELECT CASE WHEN replace(lower(country),'.','') IN ('us','united states','usa') THEN 'US'
            WHEN replace(lower(country),'.','') IN ('uk','united kingdom') THEN 'UK'
            ELSE country END AS country_std
FROM t;

Worked examples

Example 1: Currency string to number

Input: "$1,234.50", "(99.95)", " € 5 000 ". Goal: numeric values 1234.50, -99.95, 5000.00.

Show walkthrough
  1. Trim spaces.
  2. Detect parentheses to mark negatives.
  3. Remove currency symbols and separators.
  4. Cast to number; review any failures.
p = df['price_str'].str.strip()
p = p.str.replace(r"\((.*)\)", r"-\1", regex=True)
p = p.str.replace(r"[^0-9\-\.]","", regex=True)
df['price'] = pd.to_numeric(p, errors='coerce')

Example 2: Mixed date formats

Input: ["31-12-2024", "12/31/2024", "2024-12-31"]. Goal: one date column.

Show walkthrough
  1. Try explicit formats in order.
  2. Prefer day-first when region requires it.
  3. Coerce failures to NULL for review.
date = pd.to_datetime(df['d1'], format='%d-%m-%Y', errors='coerce')
date = date.fillna(pd.to_datetime(df['d1'], format='%m/%d/%Y', errors='coerce'))
date = date.fillna(pd.to_datetime(df['d1'], errors='coerce'))
df['date'] = date

Example 3: Y/N to boolean

Input: ['Y','N','Yes','', '0','1','true','False'] -> True/False/NULL.

Show walkthrough
m = {'y':True,'yes':True,'1':True,'true':True,
     'n':False,'no':False,'0':False,'false':False}
low = df['flag'].astype(str).str.strip().str.lower()
df['flag_bool'] = low.map(m)

Practical steps (do this on every dataset)

  1. Profile columns: detect current types, unique values, min/max, nulls.
  2. Decide target type: number, date, boolean, category, or string.
  3. Write conversion rules: trims, symbol removal, mapping dictionaries, explicit formats.
  4. Apply safe casts: functions that return NULL on failure (e.g., errors='coerce', TRY_CAST).
  5. Quantify fallout: % of values that failed conversion. Investigate top offenders.
  6. Lock in: store cleaned columns with clear names (e.g., price_num, signup_date).

Exercises

These mirror the tasks below. Do them in your preferred tool (Excel/Sheets, SQL, or Python).

  • Exercise 1: Clean and cast currency strings to numbers.
  • Exercise 2: Parse mixed-format dates into one date column.
  • Exercise 3: Normalize boolean responses.

Checklist for your solutions

  • Spaces, symbols, and separators removed consistently.
  • Negatives handled, including parentheses.
  • Explicit date formats provided where possible.
  • Unknown boolean values set to NULL/NA (not silently converted).
  • Failure rate measured and reported.

Common mistakes and how to self-check

  • Mistake: Assuming all commas are thousands separators. Self-check: Count digits between separators; verify locale. Consider replacing only if pattern matches.
  • Mistake: Auto-parsing dates without specifying day-first. Self-check: Spot-check records where day and month both ≤ 12.
  • Mistake: Forcing booleans for every value. Self-check: Keep unknowns as NULL and report them.
  • Mistake: Losing negatives in parentheses. Self-check: Test with a sample that includes (123.45).
  • Mistake: Overwriting raw data. Self-check: Always create a new column for the converted value.

Practical projects

  • E-commerce snapshot: Convert price, discount %, and order_date to correct types; compute revenue and daily order counts.
  • Survey cleanup: Normalize yes/no/maybe flags; calculate completion rates by segment.
  • HR dataset: Standardize hire_date, termination_date, and employment_status; compute tenure and active headcount over time.

Learning path

  • Before this: Missing values handling and basic text cleaning (trim, lower, remove duplicates).
  • This subskill: Systematic conversion to numeric, date, boolean, and category types with validation.
  • Next: Data validation rules, outlier detection, and constraints to prevent future type drift.

Who this is for

  • Aspiring and junior Data Analysts cleaning CSVs, spreadsheets, or database tables.
  • Anyone preparing datasets for BI tools or machine learning features.

Prerequisites

  • Basic spreadsheet formulas or beginner knowledge of SQL/Python.
  • Comfort with simple string operations (trim, replace) and date formats.

Mini challenge

Given these values, produce 4 typed columns: price_num, signup_date, opted_in, country_std.

price_str: ["$1,290.00", "(75)", "EUR 5 100"]
date_str:  ["31-12-2024", "12/30/2024", "2024-12-29"]
opt_in:    ["Y", "no", "1"]
country:   ["U.S.", "United States", "UK"]

Rules: handle parentheses negatives, mixed dates, boolean mapping, and US/UK standardization. Report any failed parses.

Next steps

  • Apply these rules to one of your real datasets. Log the % of failed conversions.
  • Create small helper functions (or spreadsheet templates) you can reuse for numbers, dates, and booleans.
  • Take the quick test below to confirm mastery. Note: Everyone can take the test; only logged-in users will see saved progress.

Practice Exercises

3 exercises to complete

Instructions

Input values:

["$1,234.50", "(99.95)", " € 5 000 ", "CAD1,005", "N/A"]

Tasks:

  • Create price_num where currency symbols, spaces, and commas are removed.
  • Convert parentheses to negative numbers.
  • Cast to numeric; set invalid or missing entries to NULL/NA.
  • Report the failure rate (% of values that became NULL/NA).
Expected Output
price_num: [1234.50, -99.95, 5000.00, 1005.00, NULL]; failure_rate ≈ 1/5 = 20%

Data Type Fixes — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Data Type Fixes?

AI Assistant

Ask questions about this tool