Who this is for
Beginner to intermediate Computer Vision learners who want to understand how images are translated, rotated, scaled, sheared, and warped, and how to implement these safely and accurately.
Prerequisites
- Basic linear algebra: vectors, matrices, matrix multiplication
- Comfort with image coordinates (pixels, width/height)
- Optional: Python/NumPy familiarity for quick experiments
Why this matters in real work
- Data augmentation: rotate, scale, and shift images to improve model robustness.
- Image alignment and stitching: register images to a common frame before blending panoramas.
- Object tracking: predict where detections move frame-to-frame using transforms.
- Document scanning: rectify perspective to a clean top-down view.
Concept explained simply
An image is a grid of pixels with coordinates (x, y). A geometric transformation is a rule that maps each input coordinate (x, y) to a new output coordinate (x', y'). Common transforms:
- Translation: move the image by (tx, ty)
- Rotation: spin around a chosen center
- Scaling: resize uniformly or non-uniformly
- Shear: slant the image
- Affine: combines the above (straight lines remain straight, parallel lines stay parallel)
- Projective (perspective): handles foreshortening; straight lines stay straight, but parallel lines may meet
Mental model
Imagine you have a transparent sheet with your image drawn on it. You can slide it (translate), spin it (rotate), stretch/shrink it (scale), or tilt it (shear). For perspective, imagine viewing the sheet at an angle, like taking a photo from a corner. Matrices are just precise instructions for these moves.
Coordinate systems and matrices
- Image origin is usually top-left: x increases to the right, y increases downward.
- Affine transform: usually stored as a 2x3 matrix for images. It maps [x, y, 1]^T to [x', y']^T and is equivalent to a 3x3 matrix whose last row is [0 0 1].
- Projective transform (homography, 3x3) maps homogeneous coordinates [x, y, 1]^T to [x', y', w']^T; then normalize: (x'/w', y'/w').
- Composition order matters. If using column vectors, “apply A then B” means multiply B · A.
Common 2D matrices
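Translation T(tx, ty) = [[1, 0, tx],
                         [0, 1, ty]]
Rotation R(θ) about origin = [[cosθ, -sinθ, 0],
                              [sinθ,  cosθ, 0]]
Note: R(θ) is counterclockwise in the mathematical (y-up) sense. With the image y-axis pointing down, the same matrix looks clockwise on screen.
Scaling S(sx, sy) = [[sx, 0, 0],
                     [0, sy, 0]]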
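For quick experiments, the three matrices above can be written as 3x3 NumPy arrays (the 2x3 form plus a last row [0, 0, 1]) so that they can be multiplied together. This is a minimal sketch; the helper names translation, rotation, scaling, and to_2x3 are illustrative and not taken from any library.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous translation matrix."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta_deg):
    """3x3 rotation about the origin; angle given in degrees."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    """3x3 scaling about the origin."""
    return np.array([[sx, 0.0, 0.0],
                     [0.0, sy, 0.0],
                     [0.0, 0.0, 1.0]])

def to_2x3(m):
    """Drop the last row to get the 2x3 form commonly used for images."""
    return m[:2, :]

# Apply each matrix to the point (3, 4) in homogeneous form.
p = np.array([3.0, 4.0, 1.0])
moved = translation(5, -2) @ p    # (8, 2)
scaled = scaling(2, 0.5) @ p      # (6, 2)
turned = rotation(90) @ p         # approximately (-4, 3)
print(moved[:2], scaled[:2], np.round(turned[:2], 6))
```

Keeping everything 3x3 until the very end makes composition a plain matrix product; to_2x3 is only needed when handing the result to a routine that expects the 2x3 form.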
About a center (cx, cy): T(cx, cy) · R(θ) · T(-cx, -cy), multiplied using the 3x3 forms (last row [0 0 1])
Resampling: interpolation and borders
When mapping coordinates, many output pixels end up at fractional input locations. You must interpolate:
- Nearest neighbor: fast, blocky
- Bilinear: smooth, uses 4 neighbors
- Bicubic: smoother, heavier
Border handling when sampling outside the image:
- Constant: fill with a chosen value (e.g., 0)
- Replicate: clamp to edge pixels
- Reflect: mirror at the border
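The three border modes can be sketched as a rule that turns an out-of-range index into either a valid index or "use the fill value". The function names border_index and sample_row are hypothetical, and the reflect variant shown here repeats the edge pixel; some libraries also offer a variant that does not, so check which one your tool uses.

```python
def border_index(i, n, mode):
    """Map index i to a valid index in [0, n-1] for an axis of length n.

    Returns None in 'constant' mode when i is outside the image, meaning
    "use the fill value instead of reading a pixel".
    """
    if 0 <= i < n:
        return i
    if mode == "constant":
        return None
    if mode == "replicate":
        # Clamp to the nearest edge pixel.
        return min(max(i, 0), n - 1)
    if mode == "reflect":
        # Mirror at the border, repeating the edge pixel (assumed convention).
        period = 2 * n
        j = i % period  # Python's % is non-negative for a positive period
        return j if j < n else period - 1 - j
    raise ValueError(f"unknown border mode: {mode}")

def sample_row(row, i, mode, fill=0):
    """Read row[i], applying the chosen border mode when i is out of range."""
    j = border_index(i, len(row), mode)
    return fill if j is None else row[j]

row = [10, 20, 30, 40]
for mode in ("constant", "replicate", "reflect"):
    print(mode, [sample_row(row, i, mode) for i in range(-2, 6)])
# constant  [0, 0, 10, 20, 30, 40, 0, 0]
# replicate [10, 10, 10, 20, 30, 40, 40, 40]
# reflect   [20, 10, 10, 20, 30, 40, 40, 30]
```

The same rule is applied independently to the x and y indices of a 2D image.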
Why inverse mapping is preferred
To fill each output pixel, compute where it came from in the input (inverse mapping). This avoids gaps/overlaps that happen if you push pixels forward (forward mapping).
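A minimal sketch of inverse mapping for an affine warp, assuming a 2D grayscale NumPy array, nearest-neighbor sampling, and a constant border. The function name warp_affine_nearest and its parameters are illustrative, not a library API.

```python
import numpy as np

def warp_affine_nearest(src, m_2x3, out_shape, fill=0):
    """Warp a 2D grayscale array using inverse mapping.

    m_2x3 maps input (x, y) to output (x', y'). For every output pixel we
    apply the inverse matrix to find its source location, round to the
    nearest pixel, and use `fill` where the source is outside the image.
    """
    out_h, out_w = out_shape
    src_h, src_w = src.shape

    # Promote to 3x3 so the matrix can be inverted.
    m = np.vstack([m_2x3, [0.0, 0.0, 1.0]])
    m_inv = np.linalg.inv(m)

    # All output pixel coordinates in homogeneous form, shape (3, N).
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Where did each output pixel come from?
    src_pts = m_inv @ out_pts
    sx = np.rint(src_pts[0]).astype(int)
    sy = np.rint(src_pts[1]).astype(int)

    # Copy only pixels whose source lies inside the input image.
    inside = (sx >= 0) & (sx < src_w) & (sy >= 0) & (sy < src_h)
    out = np.full(out_h * out_w, fill, dtype=src.dtype)
    out[inside] = src[sy[inside], sx[inside]]
    return out.reshape(out_h, out_w)

src = np.arange(12).reshape(3, 4)
shift_right = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])
warped = warp_affine_nearest(src, shift_right, src.shape)
print(warped)
# [[ 0  0  1  2]
#  [ 0  4  5  6]
#  [ 0  8  9 10]]
```

Because the loop is over output pixels, every output pixel receives exactly one value, which is why no gaps or overlaps appear.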
Worked examples
Example 1: Compose translation and rotation
Rotate 45° CCW around the origin, then translate by (tx, ty) = (20, -10).
R = [[0.7071, -0.7071, 0],
     [0.7071,  0.7071, 0]]
T = [[1, 0,  20],
     [0, 1, -10]]
Total = T · R = [[0.7071, -0.7071,  20],
                 [0.7071,  0.7071, -10]]
Apply to point (x, y) = (10, 0):
x' = 0.7071*10 + (-0.7071)*0 + 20 = 27.071
y' = 0.7071*10 + 0.7071*0 - 10 = -2.929
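The arithmetic in Example 1 can be checked numerically. This sketch uses 3x3 NumPy arrays and also computes the opposite order, R · T, to show that the two compositions give different results; the variable names are illustrative.

```python
import numpy as np

theta = np.deg2rad(45)
c, s = np.cos(theta), np.sin(theta)

R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([[1.0, 0.0, 20.0],
              [0.0, 1.0, -10.0],
              [0.0, 0.0, 1.0]])

p = np.array([10.0, 0.0, 1.0])

# Column vectors: the right-most matrix is applied first.
rotate_then_translate = (T @ R) @ p
translate_then_rotate = (R @ T) @ p

print(np.round(rotate_then_translate[:2], 3))  # approximately (27.071, -2.929)
print(np.round(translate_then_rotate[:2], 3))  # approximately (28.284, 14.142)
```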
Example 2: Bilinear interpolation value
At fractional coordinate (x, y) = (10.3, 5.6). Neighbors:
Top-left (10,5)=100, Top-right (11,5)=150, Bottom-left (10,6)=80, Bottom-right (11,6)=200
Fractions: ax = 0.3, ay = 0.6. Weights:
w00 = (1-ax)(1-ay) = 0.7*0.4 = 0.28
w10 = ax(1-ay) = 0.3*0.4 = 0.12
w01 = (1-ax)ay = 0.7*0.6 = 0.42
w11 = ax*ay = 0.3*0.6 = 0.18
Value = 0.28*100 + 0.12*150 + 0.42*80 + 0.18*200 = 115.6
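The same calculation as a small function, written as two horizontal blends followed by one vertical blend, which is algebraically identical to the four-weight sum. This is a sketch that assumes the image is a list of rows indexed img[y][x] and that all four neighbors lie inside the image; the name bilinear is illustrative.

```python
import math

def bilinear(img, x, y):
    """Bilinear sample of img at fractional (x, y); no border handling."""
    x0, y0 = math.floor(x), math.floor(y)
    ax, ay = x - x0, y - y0
    # Blend along x on the top and bottom rows, then blend along y.
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom

# Build a small image that holds the four neighbors from Example 2.
img = [[0] * 12 for _ in range(7)]
img[5][10], img[5][11] = 100, 150   # top-left, top-right
img[6][10], img[6][11] = 80, 200    # bottom-left, bottom-right

value = bilinear(img, 10.3, 5.6)
print(round(value, 3))  # 115.6
```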
Example 3: Simple perspective warp (homography)
H = [[1, 0, 0], [0, 1, 0], [0.001, 0.001, 1]]. Map point (200, 100).
[x', y', w']^T = H · [200, 100, 1]^T
x' = 200, y' = 100, w' = 1 + 0.001*200 + 0.001*100 = 1.3
Normalize: (x'/w', y'/w') = (153.846, 76.923)
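Example 3 as a short NumPy sketch. The function name apply_homography is illustrative; the guard against w' being zero is an added assumption about how to treat points that map to infinity.

```python
import numpy as np

def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography and normalize by w'."""
    xp, yp, wp = H @ np.array([x, y, 1.0])
    if np.isclose(wp, 0.0):
        raise ValueError("point maps to infinity (w' is zero)")
    return float(xp / wp), float(yp / wp)

H = np.array([[1.0,   0.0,   0.0],
              [0.0,   1.0,   0.0],
              [0.001, 0.001, 1.0]])

x_out, y_out = apply_homography(H, 200, 100)
print(round(x_out, 3), round(y_out, 3))  # 153.846 76.923
```

Skipping the division would return (200, 100) here, which is the unnormalized result.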
Exercises
Work these out by hand first, then self-check your answers.
- Exercise 1: Compose an affine matrix. Image size: width=200, height=100. Rotate 30° CCW about the image center (100, 50), then translate by (tx, ty)=(10, -5). Compute the final 2x3 matrix.
- Exercise 2: Bilinear weights at (10.3, 5.6). Compute w00, w10, w01, w11.
Checklist before you check solutions
- [ ] I used inverse mapping logic for warping (conceptually) and forward mapping only for point demos
- [ ] I verified composition order (right-most applied first to column vectors)
- [ ] I normalized homogeneous coordinates by w before reading (x, y)
- [ ] My bilinear weights sum to 1
Common mistakes and self-check
- Wrong origin assumption: Using bottom-left instead of top-left. Self-check: Verify whether y increases downward in your pipeline.
- Composition order errors: R · T vs T · R. Self-check: Test on a simple point and see if translation happens before or after rotation.
- Forgetting to translate to/from the rotation center. Self-check: Rotate a point at the center; it should not move.
- Not normalizing homogeneous coordinates. Self-check: Divide by w' to get (x, y).
- Interpolation mismatch: Using nearest when bilinear was expected. Self-check: Look for blockiness vs smoothness.
- Border artifacts: Unexpected black edges. Self-check: Pick a border mode intentionally (constant/replicate/reflect).
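Two of these self-checks are easy to automate. The sketch below assumes 3x3 NumPy matrices and a hypothetical helper rotation_about; it verifies that the rotation center stays fixed and shows what happens when the translate-to-center steps are forgotten.

```python
import numpy as np

def rotation_about(theta_deg, cx, cy):
    """3x3 matrix T(cx, cy) · R(θ) · T(-cx, -cy)."""
    t = np.deg2rad(theta_deg)
    c, s = np.cos(t), np.sin(t)
    rotate = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    to_origin = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
    back = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
    return back @ rotate @ to_origin

center = np.array([4.0, 3.0, 1.0])
neighbor = np.array([5.0, 3.0, 1.0])

M = rotation_about(90, 4, 3)
center_after = M @ center        # should equal the center itself
neighbor_after = M @ neighbor    # (5, 3) moves to approximately (4, 4)

# The mistake: rotating about the origin instead of the chosen center.
M_wrong = rotation_about(90, 0, 0)
center_wrong = M_wrong @ center  # approximately (-3, 4), so the center moved

print(np.allclose(center_after, center))   # True
print(np.round(neighbor_after[:2], 6))
print(np.round(center_wrong[:2], 6))
```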
Practical projects
- Document rectifier: Detect the four corners of a sheet and apply a homography to get a flat, top-down scan.
- Augmented overlay: Place a small image onto a planar surface (poster/desk) using a homography to match perspective.
- Augmentation pack: Implement rotate, translate, scale, and random perspective jitter for a training dataset with consistent interpolation and border modes.
Learning path
- Start: Affine transforms (translation, rotation about center, scaling)
- Next: Homogeneous coordinates and homographies
- Then: Interpolation strategies and border handling
- Finally: Image registration and robust estimation (RANSAC for homographies)
Mini challenge
Given a 256x256 image, build a transform that scales by 0.8 around the image center, rotates 20° CCW, and translates by (+15, -8). Compose the 2x3 matrix and test on points (0,0), (128,128), (255,255). Choose bilinear interpolation and reflect borders. Can you explain each visual artifact you see?
Next steps
- Automate a small function that takes (rotation, scale, center, translation) and returns a 2x3 affine matrix.
- Compare visual results with nearest vs bilinear vs bicubic on the same transform.
- Move on to robust alignment: estimate transforms from point matches and validate with residual errors.