luvv to helpDiscover the Best Free Online Tools
Topic 9 of 12

Data Consistency Checks

Learn Data Consistency Checks for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

As a Data Analyst, your insights are only as trustworthy as your data. Consistency checks ensure values agree across rows, columns, and systems. You will use them to:

  • Verify IDs align across tables (orders match customers).
  • Standardize formats (dates, units, categories) so joins and aggregations work reliably.
  • Catch impossible or contradictory facts (ship_date before order_date).
  • Prevent duplicate counting in dashboards and experiments.

Concept explained simply

Data consistency means the same thing is represented the same way, everywhere it appears. Think of it as making sure all players follow the same rulebook.

What consistency is not

It is not the same as accuracy or completeness. A value can be consistently wrong across records. Consistency helps you trust joins, grouping, and trend lines, but you still need validation against reality for accuracy.

Mental model: Contracts and guards

Each column has a contract (data type, allowed values, format). Each row has guards (cross-field rules like order_date ≤ ship_date). Each join has references (foreign keys must exist). Checks enforce contracts, guards, and references.

Core consistency checks you’ll use often

Worked examples

Example 1: Email and date consistency

Input:
name,email,signup_date
Ana, ANA.SMITH@example.com ,2024/03/12
  • Normalize email: trim + lowercase → ana.smith@example.com
  • Normalize date to ISO (YYYY-MM-DD): 2024-03-12
Output:
name,email,signup_date
Ana,ana.smith@example.com,2024-03-12

Example 2: Unit standardization

Input:
weight,unit
150,lb
  • Convert lb → kg (Ă— 0.453592), keep 1 decimal place for consistency
Output:
weight_kg
68.0
Why rounding matters

Choose a precision policy (e.g., 1 decimal) and apply it everywhere so rollups and equality checks behave predictably.

Example 3: Cross-field guard

Input:
order_id,order_date,ship_date
101,2024-05-02,2024-05-01
  • Check order_date ≤ ship_date → fails. Flag or swap dates only if policy allows corrections; otherwise quarantine row.

Example 4: Referential integrity

orders.customer_id must exist in customers.id

Left-join orders to customers and list orders with missing customer matches. These are inconsistent references and must be resolved.

Hands-on exercises

These mirror the exercises below. Complete them here, then compare with the provided solutions. The Quick Test is available to everyone; log in if you want your progress saved.

Exercise 1: Canonicalize a customer extract (CSV)

Goal: Standardize formats, convert units, and flag inconsistencies.

Dataset
id,name,email,signup_date,country,weight,weight_unit,order_date,ship_date
1,Ana Smith, ANA.SMITH@example.com ,2024/03/12,US,150,lb,2024-03-14,2024-03-13
2,John Doe,john.doe@example,12-04-2024,United States,68,kg,2024-04-12,2024-04-20
3,Maria Chen,maria.chen@example.com,2024-05-02,CA,72,KG,2024-05-06,2024-05-06
4,John Doe,john.doe@example.com,2024-04-12,US,150,lb,2024-04-12,2024-04-13
  1. Trim/lowercase emails; mark invalid formats.
  2. Normalize dates to YYYY-MM-DD.
  3. Normalize country to ISO2 (US, CA).
  4. Convert all weights to kg with one decimal in a new column weight_kg.
  5. Flag rows where order_date > ship_date.
Peek expected output (high level)

All dates in ISO, countries as US/CA, weight_kg present, id=1 flagged for ship before order, id=2 flagged invalid email.

Exercise 2: Write SQL checks for an orders system

Schema & sample
customers(id,email,country)
(1,'ana.smith@example.com','US'),
(2,'john.doe@example.com','US')

orders(id,customer_id,external_order_id,order_date,ship_date,total,weight,weight_unit)
(101,1,'A-001','2024-04-01','2024-04-03',120.00,150,'lb'),
(102,2,'A-002','2024-04-05','2024-04-04', 80.00,68,'kg'),
(103,3,'A-003','2024-04-07','2024-04-08', 25.00, 1,'stone'),
(104,2,'A-002','2024-04-05','2024-04-06', 80.00,68,'kg')
  1. Orphan orders: orders with customer_id not in customers.
  2. Invalid ship sequence: ship_date < order_date.
  3. Invalid units: weight_unit not in ('kg','lb').
  4. Duplicate external_order_id.
Tip

Use NOT EXISTS for orphans, simple WHERE for sequences/units, and GROUP BY HAVING COUNT(*)>1 for duplicates.

Common mistakes and self-checks

  • Mixing formats: Some rows still have dd-mm-yyyy while others are ISO. Self-check: count distinct date patterns; expect 1.
  • Converting units but not rounding: Causes join/key mismatch later. Self-check: verify numeric scale and precision.
  • Fixing referential issues by deleting data: Prefer staging/quarantine plus remediation, not silent deletion.
  • Over-aggressive deduplication: Merging near-duplicates without a reversible key. Self-check: preserve original ids and a dedupe reason.
  • Silent category drift: New categories slip in (e.g., "U.S."). Self-check: compare distinct categories against reference list daily/weekly.
Quick self-audit checklist
  • One date format across all date columns
  • All units standardized and precision policy applied
  • No orphan foreign keys
  • Business keys have no duplicates
  • All normalization rules documented

Practical projects

  • Standardize a multi-country customer dataset: dates, phone formats, and country codes; document the rules.
  • Build a "consistency report" SQL view listing violations per rule with counts and sample rows.
  • Create a reproducible deduplication pipeline that tags merges and keeps a log of decisions.

Who this is for

Aspiring and practicing Data Analysts who prepare data for dashboards, analyses, and experiments.

Prerequisites

  • Comfort with basic SQL (SELECT, JOIN, GROUP BY)
  • Familiarity with CSVs and tabular data
  • Optional: pandas or spreadsheet skills

Learning path

  1. Master type and format normalization (dates, numbers, text)
  2. Apply cross-field and referential checks
  3. Standardize units and categories
  4. Implement deduplication rules
  5. Automate reporting of violations

Next steps

  • Do the exercises below and compare to solutions.
  • Take the Quick Test to confirm understanding. Everyone can take it; log in to save progress.
  • Apply these checks to a real dataset at work or in a portfolio project.

Mini challenge

You receive product weights in g, kg, and lb, and shipment dates in dd/mm/yyyy and yyyy-mm-dd. In 15 minutes, outline the exact rules you will enforce, the conversion factors you will use, the final formats, and how you will flag violations—then implement on 10 sample rows.

Practice Exercises

2 exercises to complete

Instructions

Standardize formats, convert units, and flag inconsistencies.

  1. Trim and lowercase emails; mark invalid emails as invalid_email=true.
  2. Normalize dates (signup_date, order_date, ship_date) to YYYY-MM-DD.
  3. Normalize country to ISO2 (United States → US).
  4. Create weight_kg: if weight_unit=lb, multiply by 0.453592; if kg or KG, keep value. Round to 1 decimal.
  5. Create flag date_inconsistent=true where order_date > ship_date.
Dataset
id,name,email,signup_date,country,weight,weight_unit,order_date,ship_date
1,Ana Smith, ANA.SMITH@example.com ,2024/03/12,US,150,lb,2024-03-14,2024-03-13
2,John Doe,john.doe@example,12-04-2024,United States,68,kg,2024-04-12,2024-04-20
3,Maria Chen,maria.chen@example.com,2024-05-02,CA,72,KG,2024-05-06,2024-05-06
4,John Doe,john.doe@example.com,2024-04-12,US,150,lb,2024-04-12,2024-04-13
Expected Output
All dates in ISO format; countries as US/CA; added columns: weight_kg (1 decimal), invalid_email (bool), date_inconsistent (bool). Row 1 date_inconsistent=true; Row 2 invalid_email=true; weights converted for lb rows to ~68.0 kg.

Data Consistency Checks — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Data Consistency Checks?

AI Assistant

Ask questions about this tool