luvv to helpDiscover the Best Free Online Tools
Topic 1 of 31

Data Types and Casting

Learn Data Types and Casting for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Who this is for

Current and aspiring Data Analysts who write SQL to clean data, join tables, calculate metrics, and build reliable dashboards.

Prerequisites

  • Basic SQL: SELECT, WHERE, GROUP BY.
  • Comfort reading simple schemas and sample rows.

Why this matters

  • Joins fail or become slow if key columns have mismatched types (e.g., VARCHAR vs INT).
  • Revenue and rates must use precise numeric types; the wrong type causes rounding errors.
  • Dates and timestamps require correct parsing and time-aware casting to get accurate trends.
  • Dashboards depend on consistent, typed data—casting makes messy inputs analytics-ready.

Concept explained simply

Data types (the essentials)

  • Integers: SMALLINT, INT/INTEGER, BIGINT.
  • Exact decimals: DECIMAL/NUMERIC(p,s) for money and precise rates.
  • Approximate decimals: FLOAT/REAL/DOUBLE—fast but can introduce tiny precision errors.
  • Text: CHAR(n), VARCHAR(n), TEXT.
  • Dates/Times: DATE, TIME, TIMESTAMP (with/without time zone depending on dialect).
  • Boolean: TRUE/FALSE (some systems use 1/0 under the hood).
  • NULL: the absence of a value; it propagates through most operations.

Casting (changing the container)

  • Implicit casting: the database automatically converts types when it can. Risk: surprises and slower queries.
  • Explicit casting: you tell the database exactly what type you want using CAST(expr AS type). Clear and safer.
-- Generic, portable syntax
CAST('123' AS INT)
CAST('2024-06-30' AS DATE)
CAST(9.99 AS DECIMAL(12,2))
Dialect notes (helpful, but you can stick to CAST)
  • PostgreSQL: expr::type (e.g., '123'::int).
  • SQL Server: CONVERT(type, expr) and TRY_CONVERT; also TRY_CAST.
  • BigQuery: SAFE_CAST(expr AS type) returns NULL instead of error.
  • Date parsing formats vary: to_date (Postgres), PARSE_DATE (BigQuery), CONVERT with style codes (SQL Server).

Mental model

Think of each type as a container shaped for specific data. Casting is pouring data into a new container. If the shapes fit, you keep all the data. If not, you might spill (truncate), round off, or the pour fails (error). Choose the right container before doing math or joins.

Safety patterns

  • Prefer explicit CAST for clarity and reproducibility.
  • Use fixed-precision DECIMAL/NUMERIC for currency and precise rates.
  • Guard risky casts with safe variants (TRY_CAST/TRY_CONVERT/SAFE_CAST) or validate strings first.
  • Normalize messy strings: REPLACE/REGEXP_REPLACE to remove symbols/commas before casting.
  • Handle NULLs with COALESCE, and avoid division by zero with NULLIF.
-- Safe division example
SUM(amount) / NULLIF(COUNT(*), 0)           -- Avoids divide-by-zero

Worked examples

1) Clean currency strings and sum

Goal: Sum amounts stored as text like "$1,200.50".

SELECT
  SUM(CAST(REPLACE(REPLACE(amount_text, '$', ''), ',', '') AS DECIMAL(12,2))) AS total_amount
FROM sales_raw;

Why: Removing symbols/commas prevents cast errors and ensures exact decimal math.

2) Join when key types differ

Goal: events.user_id_text is VARCHAR, users.user_id is INT.

SELECT e.event_id, u.user_id
FROM events e
JOIN users u
  ON CAST(e.user_id_text AS INT) = u.user_id;
Performance tip

If users is large and indexed on user_id, avoid casting it. Cast the smaller side (events) or stage a cleaned key column first.

3) Prevent integer division and rounding surprises

-- Bad: integer division (returns 2)
SELECT 5/2 AS result;

-- Good: cast one operand to decimal (returns 2.50)
SELECT CAST(5 AS DECIMAL(10,2)) / 2 AS result;

4) Map text to boolean

SELECT
  CASE WHEN UPPER(active_text) IN ('TRUE','T','1','Y','YES') THEN TRUE
       WHEN UPPER(active_text) IN ('FALSE','F','0','N','NO') THEN FALSE
       ELSE NULL -- unknown
  END AS active
FROM accounts;

Exercises

Do these in order. Use the built-in Quick Test afterward to check your understanding.

Exercise 1 — Clean and sum currency strings

Table: sales_raw(id INT, amount_text VARCHAR).

Sample rows
id | amount_text
1  | $1,200.50
2  | $300
3  | 1,000

Task: Convert amount_text to DECIMAL and return the total as total_amount with 2 decimals.

  • Strip out $ and commas.
  • Cast to DECIMAL(12,2).
  • SUM the values.

Expected output: 2500.50

Exercise 2 — Join across mismatched types

Tables: users(user_id INT), events(user_id_text VARCHAR).

Sample rows
users
-----
101
102
103

events
------
user_id_text
'101'
'104'
'102'

Task: Count how many events match an existing user by casting user_id_text and joining. Return a single row: matched_event_count.

Expected output: 2

  • [ ] I ran both exercises and got the expected outputs.
  • [ ] My casts are explicit and placed where they won’t break indexes on big tables.
  • [ ] I handled symbols/commas and thought about NULL values.

Common mistakes and self-check

  • Mismatched join types causing slow scans or missed matches. Self-check: EXPLAIN your join; ensure only one side is cast if needed.
  • Using FLOAT for currency leading to tiny rounding errors. Self-check: switch to DECIMAL and compare totals.
  • Integer division when you expected decimals. Self-check: cast one operand to DECIMAL before dividing.
  • Locale/date formats ignored. Self-check: verify parsed dates by selecting MIN/MAX and a few samples.
  • String truncation when casting to CHAR(n). Self-check: ensure the target length fits the longest value.
  • Overflow (e.g., casting big numbers to SMALLINT). Self-check: validate ranges with MAX/LENGTH checks first.

Practical projects

  • Build a staging view that cleans a CSV import: normalize IDs, strip symbols, cast to proper types, and expose typed columns to analysts.
  • Create a revenue ledger table using DECIMAL with enforced scale; compare totals vs the same calculation using FLOAT to see differences.
  • Audit invalid rows: produce a report of inputs that fail casting (e.g., non-numeric prices), with counts and example values.

Learning path

  1. Identify column purposes and choose types: INT vs DECIMAL vs TEXT vs DATE/TIMESTAMP.
  2. Practice explicit casting and safe variants (TRY_/SAFE_). Add COALESCE/NULLIF guards.
  3. Master string cleaning: REPLACE/REGEXP_REPLACE before casting.
  4. Handle numeric precision: prefer DECIMAL for currency; cast before division/averages.
  5. Apply in joins and aggregations; test performance and correctness with EXPLAIN and sample checks.

Mini challenge

Given orders_raw(order_id, price_text, created_at_text) where price_text looks like "$2,345.60" and created_at_text like "2024-02-15 13:45:10":

  • Clean price_text to DECIMAL(12,2).
  • Cast created_at_text to TIMESTAMP.
  • Return monthly revenue (YYYY-MM) and total_price sorted ascending.
Hint

Use nested REPLACE on price_text, CAST to DECIMAL, CAST created_at_text to TIMESTAMP, then GROUP BY year-month extracted from the timestamp.

Next steps

  • Strengthen your joins with consistent typing.
  • Move to aggregations and window functions where numeric precision matters.
  • Practice date/time grouping and time zone handling in your SQL dialect.

Quick Test

Take the quick test below to check mastery. Available to everyone; sign in to save your progress.

Practice Exercises

2 exercises to complete

Instructions

Table: sales_raw(id INT, amount_text VARCHAR).

Convert amount_text like "$1,200.50" to DECIMAL and return the total as total_amount with 2 decimals.

  • Strip out $ and commas.
  • Cast to DECIMAL(12,2).
  • SUM the values.
Expected Output
2500.50

Data Types and Casting — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Data Types and Casting?

AI Assistant

Ask questions about this tool