
Geometric Transformations Basics

Learn Geometric Transformations Basics for free with explanations, exercises, and a quick test (for Computer Vision Engineers).

Published: January 5, 2026 | Updated: January 5, 2026

Who this is for

Beginner to intermediate Computer Vision learners who want to understand how images are translated, rotated, scaled, sheared, and warped, and how to implement these safely and accurately.

Prerequisites

  • Basic linear algebra: vectors, matrices, matrix multiplication
  • Comfort with image coordinates (pixels, width/height)
  • Optional: Python/NumPy familiarity for quick experiments

Why this matters in real work

  • Data augmentation: rotate, scale, and shift images to improve model robustness.
  • Image alignment and stitching: register images to a common frame before blending panoramas.
  • Object tracking: predict where detections move frame-to-frame using transforms.
  • Document scanning: rectify perspective to a clean top-down view.

Concept explained simply

An image is a grid of pixels with coordinates (x, y). A geometric transformation is a rule that maps each input coordinate (x, y) to a new output coordinate (x', y'). Common transforms:

  • Translation: move the image by (tx, ty)
  • Rotation: spin around a chosen center
  • Scaling: resize uniformly or non-uniformly
  • Shear: slant the image
  • Affine: combines the above (straight lines remain straight, parallel lines stay parallel)
  • Projective (perspective): handles foreshortening; straight lines stay straight, but parallel lines may meet

Mental model

Imagine you have a transparent sheet with your image drawn on it. You can slide it (translate), spin it (rotate), stretch/shrink it (scale), or tilt it (shear). For perspective, imagine viewing the sheet at an angle, like taking a photo from a corner. Matrices are just precise instructions for these moves.

Coordinate systems and matrices

  • Image origin is usually top-left: x increases to the right, y increases downward.
  • Affine transform: a 2x3 matrix (as used for images) maps [x, y, 1]^T directly to [x', y']^T; for composition, extend it to a 3x3 matrix whose last row is [0, 0, 1].
  • Projective transform (homography, 3x3) maps homogeneous coordinates [x, y, 1]^T to [x', y', w']^T; then normalize: (x'/w', y'/w').
  • Composition order matters. With column vectors, “apply A then B” means multiplying B · A (the right-most matrix is applied first).

Common 2D matrices
Translation T(tx, ty) = [[1, 0, tx],
                         [0, 1, ty]]

Rotation R(θ) about origin = [[ cosθ, -sinθ, 0],
                              [ sinθ,  cosθ, 0]]

Scaling S(sx, sy) = [[sx, 0, 0],
                     [0, sy, 0]]

About a center (cx, cy):  T(cx, cy) · R · T(-cx, -cy)
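
A minimal NumPy sketch of these matrices in their 3x3 homogeneous form, including rotation about a center. The helper names are illustrative, not from any particular library:

import numpy as np

def translation(tx, ty):
    # 3x3 homogeneous translation
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta_deg):
    # 3x3 homogeneous rotation about the origin
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    # 3x3 homogeneous scaling
    return np.array([[sx, 0.0, 0.0],
                     [0.0, sy, 0.0],
                     [0.0, 0.0, 1.0]])

# Rotation about a center (cx, cy): T(cx, cy) · R · T(-cx, -cy)
cx, cy = 100.0, 50.0
M = translation(cx, cy) @ rotation(30) @ translation(-cx, -cy)

# Sanity check: the center itself should not move
print(np.round(M @ np.array([cx, cy, 1.0]), 6))   # -> [100. 50. 1.]

The first two rows of M form the 2x3 affine matrix you would hand to a warp function.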

Resampling: interpolation and borders

When mapping coordinates, many output pixels end up at fractional input locations. You must interpolate:

  • Nearest neighbor: fast, blocky
  • Bilinear: smooth, uses 4 neighbors
  • Bicubic: smoother, heavier

Border handling when sampling outside the image:

  • Constant: fill with a chosen value (e.g., 0)
  • Replicate: clamp to edge pixels
  • Reflect: mirror at the border

Why inverse mapping is preferred

To fill each output pixel, compute where it came from in the input (inverse mapping). This avoids gaps/overlaps that happen if you push pixels forward (forward mapping).
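
Here is a sketch of inverse mapping with bilinear interpolation and a constant border, written for clarity rather than speed; it assumes a single-channel NumPy image and a 2x3 affine matrix that maps input coordinates to output coordinates:

import numpy as np

def warp_affine_inverse(img, M2x3, out_shape, border_value=0.0):
    # For each output pixel, find where it came from in the input
    H_in, W_in = img.shape
    H_out, W_out = out_shape
    M = np.vstack([M2x3, [0.0, 0.0, 1.0]])   # lift to 3x3 so it can be inverted
    Minv = np.linalg.inv(M)

    out = np.full((H_out, W_out), border_value, dtype=np.float64)
    for yo in range(H_out):
        for xo in range(W_out):
            xs, ys, _ = Minv @ np.array([xo, yo, 1.0])
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            ax, ay = xs - x0, ys - y0
            # Constant border: keep border_value where the 4 neighbors fall outside
            if x0 < 0 or y0 < 0 or x0 + 1 >= W_in or y0 + 1 >= H_in:
                continue
            out[yo, xo] = ((1 - ax) * (1 - ay) * img[y0, x0] +
                           ax * (1 - ay) * img[y0, x0 + 1] +
                           (1 - ax) * ay * img[y0 + 1, x0] +
                           ax * ay * img[y0 + 1, x0 + 1])
    return out

Because every output pixel is filled exactly once, there are no gaps or overlaps; production libraries do the same thing, just vectorized and optimized.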

Worked examples

Example 1: Compose translation and rotation

Rotate 45° CCW around the origin, then translate by (tx, ty) = (20, -10).

R = [[ 0.7071, -0.7071, 0],
     [ 0.7071,  0.7071, 0]]
T = [[1, 0, 20],
     [0, 1,-10]]
Total = T · R = [[ 0.7071, -0.7071, 20],
                 [ 0.7071,  0.7071,-10]]

Apply to point (x, y) = (10, 0):

x' = 0.7071*10 + (-0.7071)*0 + 20 = 27.071
y' = 0.7071*10 +  0.7071*0 - 10 = -2.929
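
A quick NumPy check of this composition (values rounded for display):

import numpy as np

theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([[1.0, 0.0, 20.0],
              [0.0, 1.0, -10.0],
              [0.0, 0.0, 1.0]])

total = T @ R                           # rotate first, then translate
p = total @ np.array([10.0, 0.0, 1.0])
print(np.round(p[:2], 3))               # -> [27.071 -2.929]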

Example 2: Bilinear interpolation value

At fractional coordinate (x, y) = (10.3, 5.6). Neighbors:

Top-left (10,5)=100, Top-right (11,5)=150,
Bottom-left (10,6)=80, Bottom-right (11,6)=200

Fractions: ax = 0.3, ay = 0.6. Weights:

w00 = (1-ax)(1-ay)=0.7*0.4=0.28
w10 = ax(1-ay)=0.3*0.4=0.12
w01 = (1-ax)ay=0.7*0.6=0.42
w11 = ax*ay=0.3*0.6=0.18

Value = 0.28*100 + 0.12*150 + 0.42*80 + 0.18*200 = 115.6
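
The same computation in NumPy; a small dictionary keyed by (x, y) stands in for the image here:

import numpy as np

pixels = {(10, 5): 100.0, (11, 5): 150.0, (10, 6): 80.0, (11, 6): 200.0}

x, y = 10.3, 5.6
x0, y0 = int(np.floor(x)), int(np.floor(y))
ax, ay = x - x0, y - y0                 # 0.3, 0.6 (up to float rounding)

value = ((1 - ax) * (1 - ay) * pixels[(x0, y0)] +
         ax * (1 - ay) * pixels[(x0 + 1, y0)] +
         (1 - ax) * ay * pixels[(x0, y0 + 1)] +
         ax * ay * pixels[(x0 + 1, y0 + 1)])
print(round(value, 3))                  # -> 115.6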

Example 3: Simple perspective warp (homography)

H = [[1, 0, 0], [0, 1, 0], [0.001, 0.001, 1]]. Map point (200, 100).

[x', y', w']^T = H · [200, 100, 1]^T
x'=200, y'=100, w' = 1 + 0.001*200 + 0.001*100 = 1.3
Normalize: (x'/w', y'/w') = (153.846, 76.923)
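
Verified with NumPy, including the normalization step:

import numpy as np

H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.001, 1.0]])

xw, yw, w = H @ np.array([200.0, 100.0, 1.0])   # homogeneous result [x', y', w']
print(np.round([xw / w, yw / w], 3))            # -> [153.846  76.923]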

Exercises

These mirror the practice exercises at the end of the page. Work them out, then open the solutions to self-check.

  1. Exercise 1: Compose an affine matrix. Image size: width=200, height=100. Rotate 30° CCW about the image center (100, 50), then translate by (tx, ty)=(10, -5). Compute the final 2x3 matrix.
  2. Exercise 2: Bilinear weights at (10.3, 5.6). Compute w00, w10, w01, w11.

Checklist before you check solutions
  • [ ] I used inverse mapping logic for warping (conceptually) and forward mapping only for point demos
  • [ ] I verified composition order (right-most applied first to column vectors)
  • [ ] I normalized homogeneous coordinates by w before reading (x, y)
  • [ ] My bilinear weights sum to 1

Common mistakes and self-check

  • Wrong origin assumption: Using bottom-left instead of top-left. Self-check: Verify whether y increases downward in your pipeline.
  • Composition order errors: R · T vs T · R. Self-check: Test on a simple point and see if translation happens before or after rotation.
  • Forgetting to translate to/from the rotation center. Self-check: Rotate a point at the center; it should not move.
  • Not normalizing homogeneous coordinates. Self-check: Divide by w' to get (x, y).
  • Interpolation mismatch: Using nearest when bilinear was expected. Self-check: Look for blockiness vs smoothness.
  • Border artifacts: Unexpected black edges. Self-check: Pick a border mode intentionally (constant/replicate/reflect).

Practical projects

  • Document rectifier: Detect the four corners of a sheet and apply a homography to get a flat, top-down scan (a starter sketch follows this list).
  • Augmented overlay: Place a small image onto a planar surface (poster/desk) using a homography to match perspective.
  • Augmentation pack: Implement rotate, translate, scale, and random perspective jitter for a training dataset with consistent interpolation and border modes.
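
A minimal sketch of the document rectifier, assuming OpenCV is installed and that the four sheet corners have already been found; the file names and corner coordinates below are placeholders, and corner detection itself is left out:

import cv2
import numpy as np

img = cv2.imread("page_photo.jpg")               # placeholder input path

# Hypothetical corners of the sheet in the photo,
# ordered top-left, top-right, bottom-right, bottom-left
src = np.float32([[120, 80], [540, 95], [560, 720], [100, 700]])

# Target: a clean 600x800 top-down scan
w, h = 600, 800
dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])

H = cv2.getPerspectiveTransform(src, dst)        # 3x3 homography
scan = cv2.warpPerspective(img, H, (w, h), flags=cv2.INTER_LINEAR,
                           borderMode=cv2.BORDER_REPLICATE)
cv2.imwrite("scan.png", scan)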

Learning path

  • Start: Affine transforms (translation, rotation about center, scaling)
  • Next: Homogeneous coordinates and homographies
  • Then: Interpolation strategies and border handling
  • Finally: Image registration and robust estimation (RANSAC for homographies)

Mini challenge

Given a 256x256 image, build a transform that scales by 0.8 around the image center, rotates 20° CCW, and translates by (+15, -8). Compose the 2x3 matrix and test on points (0,0), (128,128), (255,255). Choose bilinear interpolation and reflect borders. Can you explain each visual artifact you see?

Next steps

  • Automate a small function that takes (rotation, scale, center, translation) and returns a 2x3 affine matrix (a sketch follows this list).
  • Compare visual results with nearest vs bilinear vs bicubic on the same transform.
  • Move on to robust alignment: estimate transforms from point matches and validate with residual errors.
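
One possible shape for that helper, as a sketch (the name make_affine and its argument order are just a suggestion): it scales and rotates about a center, then applies a translation, and returns the top two rows of the 3x3 composition.

import numpy as np

def make_affine(rotation_deg, scale, center, translation):
    # Scale and rotate about `center`, then translate; column-vector convention
    cx, cy = center
    tx, ty = translation
    t = np.deg2rad(rotation_deg)
    c, s = scale * np.cos(t), scale * np.sin(t)
    # Equivalent to T(translation) · T(center) · (R·S) · T(-center), written out
    return np.array([[c, -s, cx - c * cx + s * cy + tx],
                     [s,  c, cy - s * cx - c * cy + ty]])

# Mini-challenge transform on a 256x256 image (center taken as (128, 128))
M = make_affine(20, 0.8, (128, 128), (15, -8))
for p in [(0, 0), (128, 128), (255, 255)]:
    print(p, "->", np.round(M @ np.array([p[0], p[1], 1.0]), 3))

With scale=1, make_affine(30, 1, (100, 50), (10, -5)) also reproduces the expected output of Exercise 1, which is a handy self-check.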


Practice Exercises

2 exercises to complete

Instructions (Exercise 1)

Image size: width=200, height=100. Rotate 30° CCW about the image center (100, 50), then translate by (tx, ty)=(10, -5). Compute the final 2x3 affine matrix (for warpAffine-style use). Use cos30° ≈ 0.866025, sin30° = 0.5; keep full precision in intermediate steps and round the final entries to 6 decimals.

Expected Output
[[0.866025, -0.500000, 48.397460], [0.500000, 0.866025, -48.301270]]

Geometric Transformations Basics — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

