How to Learn Geometric Transformations Basics for Free (Computer Vision Foundations, Computer Vision Engineer Track)

Who this is for

Beginner to intermediate Computer Vision learners who want to understand how images are translated, rotated, scaled, sheared, and warped, and how to implement these safely and accurately.

Prerequisites

Basic linear algebra: vectors, matrices, matrix multiplication
Comfort with image coordinates (pixels, width/height)
Optional: Python/NumPy familiarity for quick experiments

Why this matters in real work

Data augmentation: rotate, scale, and shift images to improve model robustness.
Image alignment and stitching: register images to a common frame before blending panoramas.
Object tracking: predict where detections move frame-to-frame using transforms.
Document scanning: rectify perspective to a clean top-down view.

Concept explained simply

An image is a grid of pixels with coordinates (x, y). A geometric transformation is a rule that maps each input coordinate (x, y) to a new output coordinate (x', y'). Common transforms:

Translation: move the image by (tx, ty)
Rotation: spin around a chosen center
Scaling: resize uniformly or non-uniformly
Shear: slant the image
Affine: combines the above (straight lines remain straight, parallel lines stay parallel)
Projective (perspective): handles foreshortening; straight lines stay straight, but parallel lines may meet

Mental model

Imagine you have a transparent sheet with your image drawn on it. You can slide it (translate), spin it (rotate), stretch/shrink it (scale), or tilt it (shear). For perspective, imagine viewing the sheet at an angle, like taking a photo from a corner. Matrices are just precise instructions for these moves.

Coordinate systems and matrices

Image origin is usually top-left: x increases to the right, y increases downward.
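Affine transform: a 2x3 matrix maps [x, y, 1]^T to [x', y']^T. It is the top two rows of a 3x3 matrix whose last row is [0 0 1]. Switch to that 3x3 form whenever you compose or invert transforms.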
Projective transform (homography, 3x3) maps homogeneous coordinates [x, y, 1]^T to [x', y', w']^T; then normalize: (x'/w', y'/w').
Composition order matters. If using column vectors, “apply A then B” means multiply B · A.
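The two rules above can be checked numerically. This is a minimal NumPy sketch (NumPy is assumed to be installed; `to_3x3` and `apply_affine` are illustrative helper names, not library functions). It promotes 2x3 matrices to 3x3 so they can be multiplied, and it shows that swapping the order changes where a point lands.

```python
import numpy as np

def to_3x3(M):
    """Promote a 2x3 affine matrix to 3x3 by appending the row [0, 0, 1]."""
    return np.vstack([M, [0.0, 0.0, 1.0]])

def apply_affine(M, x, y):
    """Map one point with a 2x3 affine matrix: [x', y'] = M @ [x, y, 1]."""
    return M @ np.array([x, y, 1.0])

# A = R(90°) about the origin, B = translation by (5, 0).
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0]])
B = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 0.0]])

# With column vectors, "apply A then B" is B · A (the right-most factor acts first).
A_then_B = (to_3x3(B) @ to_3x3(A))[:2]
B_then_A = (to_3x3(A) @ to_3x3(B))[:2]

print(apply_affine(A_then_B, 1, 0))  # (1,0) -> rotate -> (0,1) -> shift -> (5, 1)
print(apply_affine(B_then_A, 1, 0))  # (1,0) -> shift -> (6,0) -> rotate -> (0, 6)
```

Slicing `[:2]` drops the constant last row again, which gives the 2x3 form that most image APIs expect.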

Common 2D matrices

Translation T(tx, ty) = [[1, 0, tx],
                         [0, 1, ty]]

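Rotation R(θ) about origin = [[ cosθ, -sinθ, 0],
                              [ sinθ,  cosθ, 0]]
(Positive θ is counter-clockwise in a y-up frame. Because image y points down, the same matrix turns the image clockwise on screen.)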

Scaling S(sx, sy) = [[sx, 0, 0],
                     [0, sy, 0]]

Rotation or scaling about a center (cx, cy): T(cx, cy) · R · T(-cx, -cy), with each factor in 3x3 form (append the row [0 0 1]).
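The matrices above translate directly into small builder functions. This sketch assumes NumPy. The names `translation`, `rotation`, `scaling`, and `about_center` are hypothetical helpers that follow the sign convention shown above. For comparison, OpenCV's `cv2.getRotationMatrix2D(center, angle, scale)` returns a similar 2x3 matrix. However, OpenCV treats a positive angle as counter-clockwise on screen (y down), so its rotation block equals `rotation(-angle)` here.

```python
import numpy as np

def translation(tx, ty):
    """3x3 translation T(tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta_deg):
    """3x3 rotation R(θ) about the origin, same sign convention as the text."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    """3x3 scaling S(sx, sy)."""
    return np.diag([sx, sy, 1.0])

def about_center(M, cx, cy):
    """Make M act around (cx, cy) instead of the origin: T(cx, cy) · M · T(-cx, -cy)."""
    return translation(cx, cy) @ M @ translation(-cx, -cy)

# Example: rotate 45° and scale by 2 about (64, 32).
M = about_center(rotation(45) @ scaling(2, 2), 64, 32)
affine_2x3 = M[:2]                                   # 2x3 form for APIs that expect it
center_out = affine_2x3 @ np.array([64.0, 32.0, 1.0])
print(center_out)                                    # the center does not move
```

The center check used here is the same self-check listed under common mistakes: a point at the center of rotation must map to itself.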

Resampling: interpolation and borders

When mapping coordinates, many output pixels end up at fractional input locations. You must interpolate:

Nearest neighbor: fast, blocky
Bilinear: smooth, uses 4 neighbors
Bicubic: smoother, uses a 4x4 neighborhood (16 pixels), slower
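To see how nearest neighbor and bilinear sampling differ, here is a minimal NumPy sketch of both. The helper names are illustrative. They assume the query point lies strictly inside the image, so border handling is left out, and they index arrays as `img[row=y, col=x]`. In OpenCV the three options correspond to the `cv2.INTER_NEAREST`, `cv2.INTER_LINEAR`, and `cv2.INTER_CUBIC` flags.

```python
import numpy as np

def sample_nearest(img, x, y):
    """Take the value of the closest pixel (round half up)."""
    return img[int(np.floor(y + 0.5)), int(np.floor(x + 0.5))]

def sample_bilinear(img, x, y):
    """Blend the 4 surrounding pixels, weighted by distance (interior points only)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    ax, ay = x - x0, y - y0
    p00, p10 = img[y0, x0], img[y0, x0 + 1]
    p01, p11 = img[y0 + 1, x0], img[y0 + 1, x0 + 1]
    return ((1 - ax) * (1 - ay) * p00 + ax * (1 - ay) * p10
            + (1 - ax) * ay * p01 + ax * ay * p11)

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
print(sample_nearest(img, 0.5, 0.5), sample_bilinear(img, 0.5, 0.5))
```

At (0.5, 0.5), nearest neighbor snaps to a single pixel, while bilinear averages all four. That difference is the source of the blocky versus smooth look.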

Border handling when sampling outside the image:

Constant: fill with a chosen value (e.g., 0)
Replicate: clamp to edge pixels
Reflect: mirror at the border
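The border modes are easiest to compare on a single row of pixels. This sketch uses NumPy's real `np.pad` function to show which values each mode invents outside the image. Note that "reflect" is ambiguous. NumPy's `symmetric` repeats the edge pixel, which matches OpenCV's `BORDER_REFLECT`. NumPy's `reflect` skips the edge pixel, which matches OpenCV's `BORDER_REFLECT_101`.

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Pad 2 pixels on each side to see what each border mode fills in.
constant = np.pad(row, 2, mode="constant", constant_values=0)
replicate = np.pad(row, 2, mode="edge")       # clamp to edge pixels
symmetric = np.pad(row, 2, mode="symmetric")  # mirror, edge pixel repeated
reflect = np.pad(row, 2, mode="reflect")      # mirror, edge pixel not repeated

print(constant)   # [0 0 1 2 3 4 0 0]
print(replicate)  # [1 1 1 2 3 4 4 4]
print(symmetric)  # [2 1 1 2 3 4 4 3]
print(reflect)    # [3 2 1 2 3 4 3 2]
```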

Why inverse mapping is preferred

To fill each output pixel, compute where it came from in the input (inverse mapping). This avoids gaps/overlaps that happen if you push pixels forward (forward mapping).
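Inverse mapping can be written in a few lines of NumPy. This is a minimal sketch with a hypothetical function `warp_affine_inverse`. It uses nearest-neighbor sampling and a constant border only. For every output pixel, it applies the inverse of the forward 2x3 matrix to find the source location. OpenCV's `cv2.warpAffine` works on the same principle: it inverts the matrix internally unless the `WARP_INVERSE_MAP` flag is passed.

```python
import numpy as np

def warp_affine_inverse(img, M, out_shape, fill=0.0):
    """Warp img by the forward 2x3 matrix M (input -> output) via inverse mapping."""
    out_h, out_w = out_shape
    inv = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))   # output -> input
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # 3 x N
    src = inv @ coords                                     # where each output pixel came from
    sx = np.floor(src[0] + 0.5).astype(int)                # nearest-neighbor rounding
    sy = np.floor(src[1] + 0.5).astype(int)
    inside = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.full(out_h * out_w, fill, dtype=float)        # constant border
    out[inside] = img[sy[inside], sx[inside]]
    return out.reshape(out_h, out_w)

img = np.arange(12, dtype=float).reshape(3, 4)
shift_right = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])
out = warp_affine_inverse(img, shift_right, img.shape, fill=-1.0)
print(out)
```

Every output pixel receives exactly one value, so there are no holes. Pixels whose source falls outside the input get the fill value.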

Worked examples

Example 1: Compose translation and rotation

Rotate 45° CCW around the origin, then translate by (tx, ty) = (20, -10).

R = [[ 0.7071, -0.7071, 0],
     [ 0.7071,  0.7071, 0]]
T = [[1, 0, 20],
     [0, 1,-10]]
Total = T · R = [[ 0.7071, -0.7071, 20],
                 [ 0.7071,  0.7071,-10]]

Apply to point (x, y) = (10, 0):

x' = 0.7071*10 + (-0.7071)*0 + 20 = 27.071
y' = 0.7071*10 +  0.7071*0 - 10 = -2.929
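The hand calculation can be reproduced with NumPy as a quick check. This sketch assumes NumPy and builds R and T in 3x3 form so they can be multiplied.

```python
import numpy as np

theta = np.deg2rad(45)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([[1.0, 0.0, 20.0],
              [0.0, 1.0, -10.0],
              [0.0, 0.0, 1.0]])

total = T @ R                          # rotate first, then translate
p = total @ np.array([10.0, 0.0, 1.0])
print(np.round(total[:2], 4))          # the 2x3 "Total" matrix
print(p[:2])                           # ≈ (27.071, -2.929), matching the hand calculation
```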

Example 2: Bilinear interpolation value

At fractional coordinate (x, y) = (10.3, 5.6). Neighbors:

Top-left (10,5)=100, Top-right (11,5)=150,
Bottom-left (10,6)=80, Bottom-right (11,6)=200

Fractions: ax = 0.3, ay = 0.6. Weights:

w00 = (1-ax)(1-ay)=0.7*0.4=0.28
w10 = ax(1-ay)=0.3*0.4=0.12
w01 = (1-ax)ay=0.7*0.6=0.42
w11 = ax*ay=0.3*0.6=0.18

Value = 0.28*100 + 0.12*150 + 0.42*80 + 0.18*200 = 115.6
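The four weights can also be written as an outer product of the y and x weights. This is a compact way to compute them in NumPy and to confirm that they sum to 1. The sketch assumes the 2x2 neighborhood is stored with rows for y (5, 6) and columns for x (10, 11).

```python
import numpy as np

# Neighborhood around (10.3, 5.6): rows = y (5, 6), cols = x (10, 11).
patch = np.array([[100.0, 150.0],
                  [ 80.0, 200.0]])
ax, ay = 0.3, 0.6

weights = np.outer([1 - ay, ay], [1 - ax, ax])  # [[w00, w10], [w01, w11]]
value = float((weights * patch).sum())
print(weights)
print(value)  # ≈ 115.6
```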

Example 3: Simple perspective warp (homography)

H = [[1, 0, 0], [0, 1, 0], [0.001, 0.001, 1]]. Map point (200, 100).

[x', y', w']^T = H · [200, 100, 1]^T
x'=200, y'=100, w' = 1 + 0.001*200 + 0.001*100 = 1.3
Normalize: (x'/w', y'/w') = (153.846, 76.923)
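The same mapping takes a few lines of NumPy. In this sketch, `apply_homography` is an illustrative helper that maps an (N, 2) array of points and performs the divide by w'. OpenCV offers `cv2.perspectiveTransform` for the same job. It expects float arrays of shape (N, 1, 2).

```python
import numpy as np

H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.001, 1.0]])

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous [x, y, 1]
    mapped = pts_h @ H.T                               # rows are [x', y', w']
    return mapped[:, :2] / mapped[:, 2:3]              # divide by w'

out = apply_homography(H, np.array([[200.0, 100.0]]))
print(out)  # ≈ [[153.846, 76.923]]
```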

Exercises

Work these out by hand, then run through the checklist below before checking your answers.

Exercise 1: Compose an affine matrix. Image size: width=200, height=100. Rotate 30° CCW about the image center (100, 50), then translate by (tx, ty)=(10, -5). Compute the final 2x3 matrix.
Exercise 2: Bilinear weights at (10.3, 5.6). Compute w00, w10, w01, w11.

Checklist before you check solutions

[ ] I used inverse mapping logic for warping (conceptually) and forward mapping only for point demos
[ ] I verified composition order (right-most applied first to column vectors)
[ ] I normalized homogeneous coordinates by w before reading (x, y)
[ ] My bilinear weights sum to 1
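Some checklist items can be automated without revealing the answers. This sketch assumes NumPy and uses hypothetical helper names. It relies on three properties. First, rotating about a center leaves that center fixed, so the final Exercise 1 matrix should send the center to center + (tx, ty). Second, the left 2x2 block should be exactly R(θ) in this page's sign convention. Third, bilinear weights are non-negative and sum to 1. The demo uses different numbers from the exercises.

```python
import numpy as np

def check_rotate_then_translate(M, center, t):
    """Rotation about `center` then translation by t must send center to center + t."""
    cx, cy = center
    mapped = np.asarray(M, dtype=float) @ np.array([cx, cy, 1.0])
    return bool(np.allclose(mapped, [cx + t[0], cy + t[1]]))

def check_rotation_block(M, theta_deg):
    """The left 2x2 block must equal R(θ) = [[cos, -sin], [sin, cos]] (rounded to ~3 decimals)."""
    r = np.deg2rad(theta_deg)
    R = np.array([[np.cos(r), -np.sin(r)], [np.sin(r), np.cos(r)]])
    return bool(np.allclose(np.asarray(M, dtype=float)[:, :2], R, atol=1e-3))

def check_bilinear_weights(w00, w10, w01, w11):
    """Weights must be non-negative and sum to 1."""
    w = np.array([w00, w10, w01, w11], dtype=float)
    return bool(np.all(w >= 0) and np.isclose(w.sum(), 1.0))

# Demo: rotate 90° about (10, 10), then translate by (1, 2).
M_demo = [[0.0, -1.0, 21.0],
          [1.0,  0.0,  2.0]]
print(check_rotate_then_translate(M_demo, (10, 10), (1, 2)),
      check_rotation_block(M_demo, 90),
      check_bilinear_weights(0.25, 0.25, 0.25, 0.25))
```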

Common mistakes and self-check

Wrong origin assumption: Using bottom-left instead of top-left. Self-check: Verify whether y increases downward in your pipeline.
Composition order errors: R · T vs T · R. Self-check: Test on a simple point and see if translation happens before or after rotation.
Forgetting to translate to/from the rotation center. Self-check: Rotate a point at the center; it should not move.
Not normalizing homogeneous coordinates. Self-check: Divide by w' to get (x, y).
Interpolation mismatch: Using nearest when bilinear was expected. Self-check: Look for blockiness vs smoothness.
Border artifacts: Unexpected black edges. Self-check: Pick a border mode intentionally (constant/replicate/reflect).

Practical projects

Document rectifier: Detect the four corners of a sheet and apply a homography to get a flat, top-down scan.
Augmented overlay: Place a small image onto a planar surface (poster/desk) using a homography to match perspective.
Augmentation pack: Implement rotate, translate, scale, and random perspective jitter for a training dataset with consistent interpolation and border modes.
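For the document rectifier, the core step is solving for a homography from four corner correspondences. This is a minimal NumPy sketch of that linear solve, with `homography_from_4_points` as a hypothetical helper. It fixes h33 = 1, which fails only in degenerate setups, and it requires that no three corners be collinear. The corner coordinates are made-up illustration values. In practice, OpenCV provides `cv2.getPerspectiveTransform(src, dst)` for this (float32 arrays of shape (4, 2)) and `cv2.warpPerspective(img, H, (width, height))` for the warp itself.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for H (with h33 fixed to 1) from exactly 4 point pairs.

    Each pair (x, y) -> (u, v) gives two linear equations:
      u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1)
      v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1)
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical corner detections (TL, TR, BR, BL) of a tilted page -> 200x150 output.
src = np.array([[10, 20], [200, 5], [220, 150], [5, 170]], dtype=float)
dst = np.array([[0, 0], [199, 0], [199, 149], [0, 149]], dtype=float)
H = homography_from_4_points(src, dst)

mapped = np.hstack([src, np.ones((4, 1))]) @ H.T
mapped = mapped[:, :2] / mapped[:, 2:3]   # should reproduce dst
print(np.round(mapped, 3))
```

With more than four matches, which is typical in stitching, you would use a least-squares or RANSAC estimate instead of an exact solve.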

Learning path

Start: Affine transforms (translation, rotation about center, scaling)
Next: Homogeneous coordinates and homographies
Then: Interpolation strategies and border handling
Finally: Image registration and robust estimation (RANSAC for homographies)

Mini challenge

Given a 256x256 image, build a transform that scales by 0.8 around the image center, rotates 20° CCW, and translates by (+15, -8). Compose the 2x3 matrix and test on points (0,0), (128,128), (255,255). Choose bilinear interpolation and reflect borders. Can you explain each visual artifact you see?

Next steps

Write a small function that takes (rotation, scale, center, translation) and returns a 2x3 affine matrix.
Compare visual results with nearest vs bilinear vs bicubic on the same transform.
Move on to robust alignment: estimate transforms from point matches and validate with residual errors.
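Here is one possible shape for the function in the first step, as a sketch assuming NumPy. `affine_matrix` is a hypothetical name. It scales and rotates about `center` using this page's R(θ) convention, then translates. The closed-form offset follows from T(translation) · T(center) · (s·R) · T(-center).

```python
import numpy as np

def affine_matrix(angle_deg, scale, center, translation):
    """Scale and rotate about `center`, then translate; returns a 2x3 matrix.

    Uses R(θ) = [[cos, -sin], [sin, cos]]; with y pointing down, positive
    angles turn the image clockwise on screen.
    """
    t = np.deg2rad(angle_deg)
    A = scale * np.array([[np.cos(t), -np.sin(t)],
                          [np.sin(t),  np.cos(t)]])
    c = np.asarray(center, dtype=float)
    # p -> A (p - c) + c + translation, so the constant part is c - A c + translation
    offset = c - A @ c + np.asarray(translation, dtype=float)
    return np.hstack([A, offset[:, None]])

M = affine_matrix(30, 0.5, (64, 64), (3, -2))
print(M @ np.array([64.0, 64.0, 1.0]))  # center ends up at center + translation
```

For the mini challenge, decide explicitly whether the center of a 256x256 image is (128, 128) or (127.5, 127.5). Also note that a rotation that looks counter-clockwise on screen needs a negative angle in this convention.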
