
Train Validation Test Splits

Learn Train Validation Test Splits for free with explanations, exercises, and a quick test (for Data Scientists).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

A disciplined three-way split is what keeps your reported performance honest: each set has exactly one job.

  • Train set: used to fit model parameters.
  • Validation set: used to tune choices (hyperparameters, thresholds, features). You may reuse it during development.
  • Test set: used once at the end to report final performance. Keep it fully untouched during model development.

Choosing a split strategy

Pick the strategy that matches your data and goal (a short code sketch follows this list):

  • Classic holdout (i.i.d. data, enough samples): 70/15/15 or 60/20/20 train/val/test.
  • Imbalanced classification: use stratified splits so class ratios are similar across sets.
  • Grouped data (e.g., multiple rows per user/patient): use group-aware splitting to keep all samples from a group in the same set.
  • Time series: split by time. Validation/test must be later than train. Use rolling/expanding window CV if possible.
  • Small datasets: prefer k-fold cross-validation (e.g., 5-fold) for model selection; keep a small final test if feasible.
  • Hyperparameter tuning: use cross-validation on the train set or a dedicated validation set; for rigorous estimates, use nested cross-validation.
  • Feature preprocessing: fit scalers/encoders only on train; apply to val/test to avoid leakage.
  • Multiple models: compare on the same validation protocol; only evaluate the final choice on test.
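
As a concrete reference, here is a minimal scikit-learn sketch of the three most common split mechanics; the DataFrame df and its "y", "user_id", and "date" columns are illustrative names, not tied to any dataset above.

  from sklearn.model_selection import train_test_split, GroupShuffleSplit, TimeSeriesSplit

  # Classic stratified holdout: 70% train, 15% validation, 15% test.
  train_df, temp_df = train_test_split(df, test_size=0.30, stratify=df["y"], random_state=42)
  val_df, test_df = train_test_split(temp_df, test_size=0.50, stratify=temp_df["y"], random_state=42)

  # Group-aware split: every row of a user lands on exactly one side.
  gss = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=42)
  train_idx, holdout_idx = next(gss.split(df, groups=df["user_id"]))

  # Time series: ordered folds where validation is always later than train.
  tscv = TimeSeriesSplit(n_splits=5)
  for tr_idx, va_idx in tscv.split(df.sort_values("date")):
      pass  # fit on rows tr_idx, validate on rows va_idx
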
Rules of thumb
  • Keep the test set sacred: evaluate on it once, at the end.
  • If classes are imbalanced or events are rare, stratify; if rows share a group (user, patient, session), use group-aware splits.
  • For time series, never shuffle across time.
  • When in doubt, simulate your real-world scenario in your split.

Worked examples

Example 1 — Balanced tabular classification (i.i.d.)

Data: 50,000 rows, 10 numeric features, balanced classes.

  • Split: 70% train, 15% validation, 15% test (stratification optional, since classes are balanced).
  • Flow: Fit model on train; tune hyperparameters on validation; once finalized, retrain on train+validation; evaluate once on test.
  • Why: Plenty of data and i.i.d. rows make a simple holdout reliable; the full flow is sketched below.
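
A minimal sketch of this flow, assuming scikit-learn and a feature matrix X with labels y already in memory; the gradient-boosting model and the learning-rate grid are illustrative choices, not prescribed by the example.

  import numpy as np
  from sklearn.ensemble import GradientBoostingClassifier
  from sklearn.metrics import accuracy_score
  from sklearn.model_selection import train_test_split

  # 70/15/15 holdout: first take 30% off, then split that 30% half-and-half.
  X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
  X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

  # Tune on validation: keep the setting with the best validation score.
  best_lr, best_acc = None, -1.0
  for lr in (0.03, 0.1, 0.3):
      model = GradientBoostingClassifier(learning_rate=lr, random_state=0).fit(X_train, y_train)
      acc = accuracy_score(y_val, model.predict(X_val))
      if acc > best_acc:
          best_lr, best_acc = lr, acc

  # Finalize: retrain on train+validation with the chosen setting, evaluate once on test.
  final = GradientBoostingClassifier(learning_rate=best_lr, random_state=0)
  final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
  print("test accuracy:", accuracy_score(y_test, final.predict(X_test)))
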
Example 2 — Imbalanced fraud detection

Data: 200,000 transactions, 0.5% fraud.

  • Split: Stratified 70/15/15 to keep positive rate stable.
  • Training: Use class weights or focal loss; tune the decision threshold on validation to optimize F1 or recall at a fixed precision (sketched below).
  • Sanity checks: No user overlap leakage; no target leakage via post-transaction features.
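
A sketch of the stratified split and the threshold tuning, assuming scikit-learn and arrays X, y; logistic regression with class weights stands in for whatever model you actually use, and the threshold grid is illustrative.

  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import f1_score
  from sklearn.model_selection import train_test_split

  # Stratified 70/15/15 keeps the 0.5% fraud rate roughly stable in every set.
  X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, stratify=y, random_state=0)
  X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)

  # Class weights compensate for the imbalance during training.
  clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)

  # Tune the decision threshold on validation only; the test set stays untouched.
  val_scores = clf.predict_proba(X_val)[:, 1]
  best_t = max(np.linspace(0.05, 0.95, 19), key=lambda t: f1_score(y_val, val_scores >= t))
  print("chosen threshold:", round(best_t, 2))
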
Example 3 — Time series demand forecasting

Data: Daily sales 2019-01-01 to 2024-06-30.

  • Split by time: Train = 2019-01-01 to 2023-12-31; Validation = 2024-01-01 to 2024-03-31; Test = 2024-04-01 to 2024-06-30.
  • Rolling CV: Multiple validation windows (e.g., Jan, Feb, Mar 2024) to choose hyperparameters robustly.
  • Leakage guard: Only use features available up to prediction time (lagged features, moving averages fitted on past only).
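
A pandas sketch of the time-based split and a leakage-safe lag feature; a DataFrame df with a datetime "date" column and a numeric "sales" column is assumed.

  import pandas as pd

  df = df.sort_values("date")

  # Lag features are built from past values only, so no future information leaks in.
  df["sales_lag_7"] = df["sales"].shift(7)
  df["sales_ma_28"] = df["sales"].shift(1).rolling(28).mean()

  # Split strictly by time: validation and test are always later than train.
  train = df[df["date"] <= "2023-12-31"]
  val = df[(df["date"] >= "2024-01-01") & (df["date"] <= "2024-03-31")]
  test = df[(df["date"] >= "2024-04-01") & (df["date"] <= "2024-06-30")]
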
Example 4 — Grouped medical data

Data: 12,000 records from 2,000 patients, multiple visits each.

  • Split: Group-aware by patient IDs so each patient appears in exactly one set.
  • Ratios: 70/15/15 by patients, not by rows.
  • Reason: Prevent the model from seeing the same patient in train and validation, which would inflate metrics.
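
A sketch using scikit-learn's GroupShuffleSplit, assuming a DataFrame df with a "patient_id" column; the 70/15/15 ratios are approximate because they are taken over patients, not rows.

  from sklearn.model_selection import GroupShuffleSplit

  # First carve off ~30% of patients, then split that holdout in half, again by patient.
  gss = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=0)
  train_idx, holdout_idx = next(gss.split(df, groups=df["patient_id"]))
  train_df, holdout_df = df.iloc[train_idx], df.iloc[holdout_idx]

  gss2 = GroupShuffleSplit(n_splits=1, test_size=0.50, random_state=0)
  val_idx, test_idx = next(gss2.split(holdout_df, groups=holdout_df["patient_id"]))
  val_df, test_df = holdout_df.iloc[val_idx], holdout_df.iloc[test_idx]

  # Sanity check: no patient appears in more than one set.
  assert set(train_df["patient_id"]).isdisjoint(val_df["patient_id"])
  assert set(train_df["patient_id"]).isdisjoint(test_df["patient_id"])
  assert set(val_df["patient_id"]).isdisjoint(test_df["patient_id"])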

Step-by-step: implement your split

  1. Define evaluation goal. What metric and deployment scenario? Classification threshold? Forecast horizon?
  2. Identify data structure. i.i.d., time-ordered, groups, imbalance.
  3. Choose split method. Holdout, stratified, group, time-based, k-fold, or nested CV.
  4. Prevent leakage. Fit preprocessing on train only. Keep temporal order. Isolate groups.
  5. Tune and select. Use validation (or CV) to choose hyperparameters and thresholds.
  6. Finalize. Retrain on train+validation with chosen settings; evaluate once on test; record metrics and confidence intervals where possible.
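
Steps 4 to 6 in one minimal sketch, assuming scikit-learn and arrays X, y; putting the scaler inside a Pipeline means it is refit on the training folds only, so cross-validation never leaks validation data into preprocessing.

  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import GridSearchCV, train_test_split
  from sklearn.pipeline import Pipeline
  from sklearn.preprocessing import StandardScaler

  # Hold out the test set first; everything else is train+validation material.
  X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.15, stratify=y, random_state=0)

  # The Pipeline refits the scaler inside each CV fold, preventing preprocessing leakage.
  pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
  search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="roc_auc")
  search.fit(X_trval, y_trval)  # tune and select via cross-validation on train+validation

  # GridSearchCV refits the best pipeline on all of train+validation; evaluate once on test.
  print("test ROC-AUC:", search.score(X_test, y_test))
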
Mini checklist before you evaluate
  • Test set untouched until final evaluation
  • Preprocessing fitted only on train
  • Correct split type for data structure (i.i.d./time/group)
  • Metrics match business goal
  • Threshold/feature selection decided using validation only

Exercises

Do these to lock in understanding. The same tasks are listed below with solutions you can reveal.

Exercise 1: Design a split plan for imbalanced credit defaults (ID: ex1)

Goal: Propose ratios, split type, tuning approach, and leakage checks.

  • Data: 500,000 customers, 2% default rate, multiple loans per customer.
  • Target metric: Recall at 10% false positive rate.
Hints
  • Imbalance + multiple loans per customer → stratification and grouping.
  • Threshold tuning belongs on validation.

Exercise 2: Time-based split for energy forecasting (ID: ex2)

Goal: Create a rolling validation plan for a 7-day ahead forecast.

  • Data: Hourly energy usage, 2022-01-01 to 2024-12-31.
  • Target metric: MAPE on the next 7 days.
Hints
  • Use expanding or sliding windows.
  • Validation windows should simulate the 7-day horizon.
Exercises completion checklist
  • You specified split ratios and method
  • You named the evaluation metric and where to optimize it
  • You listed leakage risks and controls
  • You explained how you will finalize and test once

Common mistakes and self-check

  • Peeking at test set. Self-check: Did you change anything after viewing test metrics? If yes, you invalidated the test.
  • Random split for time series. Self-check: Do any training rows occur after validation rows in time? If yes, fix with a time-based split.
  • Ignoring groups. Self-check: Can the same user/patient appear in both train and validation? If yes, use group-aware splitting.
  • Leaky preprocessing. Self-check: Are scalers/imputers fitted on full data? Refit them on train only.
  • Using accuracy on imbalanced data. Self-check: Does a dummy model score high? Choose metrics like ROC-AUC, PR-AUC, recall/precision at target FPR.
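
A quick self-check for that last point, assuming scikit-learn and an imbalanced split X_train, y_train, X_val, y_val like the ones above: a majority-class dummy looks excellent on accuracy and collapses on PR-AUC.

  from sklearn.dummy import DummyClassifier
  from sklearn.metrics import accuracy_score, average_precision_score

  # A model that always predicts the majority class: high accuracy, no real skill.
  dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
  print("dummy accuracy:", accuracy_score(y_val, dummy.predict(X_val)))  # ~0.99 at 1% positives
  print("dummy PR-AUC:", average_precision_score(y_val, dummy.predict_proba(X_val)[:, 1]))  # ~ base rate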

Practical projects

  • Customer churn prediction: Group-aware stratified split by customer ID; compare threshold policies at fixed churn budget.
  • Retail sales forecasting: Rolling window validation across seasons; choose horizon-specific features.
  • Click-through prediction: Session-level grouping; evaluate PR-AUC and calibration drift between validation and test.

Mini challenge

You have 80,000 emails with 1% spam. Each sender may appear multiple times. Build a split plan and list three leakage checks you will perform. Keep the test set absolutely untouched. Write your plan in 5 bullet points.

Who this is for

  • Beginners who know basic supervised learning and want trustworthy evaluation.
  • Practitioners switching to time series or grouped datasets.
  • Anyone preparing for ML interviews or productionizing models.

Prerequisites

  • Basic ML concepts: train vs. test, overfitting, common metrics.
  • Comfort with data preprocessing (scaling, encoding, imputation).
  • Awareness of your business metric (what matters in production).

Learning path

  1. Review evaluation metrics suited to your problem.
  2. Learn split strategies: holdout, stratified, group, time-based.
  3. Practice with k-fold and nested CV for tuning.
  4. Apply to a real dataset; document decisions and leakage checks.
  5. Finalize: retrain on train+val; evaluate once on test and report.

Next steps

  • Implement the split strategy in your current project and log every decision.
  • Run at least one alternative split (e.g., different validation windows) to test robustness.
  • Proceed to the quick test below to confirm understanding. Everyone can take it; log in to save progress.

Quick Test

Take the short quiz now. Everyone can access it for free; sign in to save your progress.

Practice Exercises

2 exercises to complete

Instructions

Data: 500,000 customers, 2% default rate, multiple loans per customer (customer_id). Target: default within 90 days. Metric: Recall at 10% FPR.

  • Propose split ratios and method.
  • Describe tuning process (including threshold selection).
  • List at least 3 leakage risks and how you will prevent them.
  • Explain final training and single test evaluation.
Expected Output
A concise plan covering stratified + group-based splitting, threshold tuning on validation, leakage controls, and final one-time test evaluation.

Train Validation Test Splits — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

