Why this matters
As a Data Scientist, you frequently work with millions of rows. Inefficient pandas or NumPy code can turn a 10-second task into a 10-minute blocker or, worse, crash your notebook by exhausting memory. Mastering performance and memory optimization lets you:
- Clean and aggregate large datasets without timeouts.
- Engineer features quickly for iterative modeling.
- Run experiments on a laptop instead of needing a cluster.
- Ship robust data pipelines that don’t silently duplicate data in memory.
Concept explained simply
Core idea: move work from slow Python loops to fast vectorized operations in pandas/NumPy, and store data in the smallest safe dtypes.
Mental model
Think of Python loops as you moving bricks one by one. Vectorized operations are a forklift: the same job but hundreds of bricks at once. Memory optimization is choosing the right-sized boxes for those bricks. Too big wastes space; too small breaks. Aim for the smallest box that fits (correct dtype), and use the forklift (vectorized ops).
Key techniques you will use
- Vectorize instead of loops: prefer column-wise arithmetic, .str, .dt, and NumPy ufuncs.
- Choose compact dtypes: use int8/16/32, float32 where precision allows; convert low-cardinality object columns to category.
- Avoid unnecessary copies: use .loc for assignment; avoid chained indexing; use .to_numpy(copy=False) when safe.
- Measure memory: df.info(memory_usage='deep'), df.memory_usage(deep=True).sum().
- Lean groupby: prefer built-in aggregations (sum, mean, size, nunique) and .agg with native funcs; avoid row-wise .apply.
- Chunk large files: pd.read_csv(..., chunksize=...), process and reduce before concatenating.
- Efficient joins: ensure join keys are the same dtype; consider categoricals for repeated keys; merge only required columns.
- String ops: use vectorized .str methods; avoid Python loops over rows.
- Sparse/zero-heavy data: consider pandas' Sparse dtypes or SciPy sparse matrices when values are mostly zeros.
- Sorting and indexing: sort_index/sort_values only when needed; set an index if repeatedly selecting by that key. (Several of these techniques are sketched below.)
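A minimal sketch touching several of the items above (vectorized .str/.dt, .loc assignment, category and Sparse dtypes, memory measurement). The events frame is made up purely for illustration.
import pandas as pd
events = pd.DataFrame({
    'email': ['a@x.com', 'b@y.org', 'c@x.com', 'd@z.net'],
    'ts': pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01', '2024-02-03']),
    'clicks': [0, 3, 0, 0],
})
# Vectorized .str and .dt instead of looping over rows
events['domain'] = events['email'].str.split('@').str[1]
events['month'] = events['ts'].dt.month
# One explicit .loc write (no chained indexing)
events.loc[events['clicks'] > 2, 'clicks'] = 2  # cap outliers
# Low-cardinality strings -> category; mostly-zero numbers -> sparse
events['domain'] = events['domain'].astype('category')
events['clicks'] = events['clicks'].astype(pd.SparseDtype('int64', 0))
# Measure memory per column
print(events.memory_usage(deep=True))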
In-place vs copy in pandas
inplace=True does not guarantee faster or lower memory operations in modern pandas; it often creates a new object under the hood. Prefer assignment to a new variable and measure.
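A small sketch contrasting the two styles; measure on your own data before assuming inplace saves anything.
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
# Explicit assignment: result bound to a clear name, easy to chain and test
trimmed = df.drop(columns=['b'])
# In-place style: mutates df and returns None, often with no memory benefit
# df.drop(columns=['b'], inplace=True)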
A reliable optimization workflow
- Measure first: time your current code (small dry-run) and check memory with df.info(memory_usage='deep'); a timing sketch follows this list.
- Eliminate Python loops: replace with vectorized NumPy/pandas operations.
- Fix dtypes: downcast numerics; convert object to category where appropriate.
- Optimize I/O: read only needed columns (usecols), parse dates selectively, use chunksize for large files.
- Re-measure and validate correctness on a sample, then on full data.
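A minimal timing-and-memory sketch using time.perf_counter on a made-up million-row frame; in a notebook, %timeit works just as well.
import time
import numpy as np
import pandas as pd
df = pd.DataFrame({'y': np.random.default_rng(0).random(1_000_000)})
df.info(memory_usage='deep')              # baseline memory
start = time.perf_counter()
slow = [v * 2 for v in df['y']]           # Python loop over a Series
print('loop      :', round(time.perf_counter() - start, 4), 's')
start = time.perf_counter()
fast = df['y'] * 2                        # vectorized
print('vectorized:', round(time.perf_counter() - start, 4), 's')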
Worked examples
Example 1 — Replace loops with vectorization
Goal: standardize a column and compute a ratio without loops.
import pandas as pd
import numpy as np
df = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,40,50]})
# Slow pattern (avoid)
# z = []
# for v in df['y']:
#     z.append((v - df['y'].mean()) / df['y'].std(ddof=0))
# df['z'] = z
# df['ratio'] = [a/b for a,b in zip(df['y'], df['x'])]
# Vectorized
y = df['y'].to_numpy()
df['z'] = (y - y.mean()) / y.std(ddof=0)
df['ratio'] = df['y'] / df['x']
print(df)
Example 2 — Reduce memory via dtypes
df = pd.DataFrame({
    'user_id': [101, 101, 102, 103, 103, 103],
    'city': ['NY', 'NY', 'LA', 'LA', 'LA', 'SF'],
    'amount': [12.5, 9.0, 3.0, 1.0, 0.0, 5.5],
    'flag': [0, 1, 0, 0, 1, 0]
})
print('Before bytes:', df.memory_usage(deep=True).sum())
# Optimize
opt = df.copy()
opt['user_id'] = opt['user_id'].astype('int16')
opt['city'] = opt['city'].astype('category')
opt['amount'] = opt['amount'].astype('float32')
opt['flag'] = opt['flag'].astype('int8') # or bool
print('After bytes:', opt.memory_usage(deep=True).sum())
print(opt.dtypes)
Expected: noticeable drop in bytes; dtypes are int16, category, float32, int8/bool.
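As an alternative to hand-picking widths, pd.to_numeric can downcast automatically based on the values it sees (which may be too small for future data, so check your value ranges). A short sketch reusing df from this example:
auto = df.copy()
auto['user_id'] = pd.to_numeric(auto['user_id'], downcast='integer')  # picks int8 here
auto['amount'] = pd.to_numeric(auto['amount'], downcast='float')      # picks float32
auto['flag'] = pd.to_numeric(auto['flag'], downcast='integer')
auto['city'] = auto['city'].astype('category')
print(auto.dtypes)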
Example 3 — Faster groupby aggregations
sales = pd.DataFrame({
    'city': ['NY', 'NY', 'LA', 'LA', 'SF', 'SF', 'SF'],
    'amount': [12.5, 9.0, 3.0, 1.0, 8.0, 2.0, 5.0]
})
# Slow-ish: apply with custom row-wise lambda (avoid)
# res = sales.groupby('city').apply(lambda g: pd.Series({
#     'sum': g['amount'].sum(),
#     'avg': g['amount'].mean(),
#     'n': g['amount'].size
# }))
# Fast: built-ins
res = sales.groupby('city')['amount'].agg(sum='sum', avg='mean', n='size')
print(res)
Hands-on exercises
Try these in your environment. They mirror the worked examples above, so you can check your solutions against them.
Exercise 1 — Vectorize and standardize
Create a DataFrame df = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,40,50]}). Add:
- z: standardized y using population std (ddof=0)
- ratio: y divided by x
- Use no Python loops or row-wise apply.
Hint
Use to_numpy() or operate directly on the Series. For z: (y - y.mean()) / y.std(ddof=0).
Exercise 2 — Reduce memory safely
With df = pd.DataFrame({'user_id':[101,101,102,103,103,103],'city':['NY','NY','LA','LA','LA','SF'],'amount':[12.5,9.0,3.0,1.0,0.0,5.5],'flag':[0,1,0,0,1,0]}):
- Downcast user_id to smallest safe integer.
- Convert city to category.
- Downcast amount to float32; flag to int8 or bool.
- Verify dtypes and compare memory before/after with memory_usage(deep=True).
Hint
Use astype for explicit casts; print dtypes before and after. Compare byte counts.
Checklist before you move on
- I replaced loops with vectorized operations.
- I measured memory usage with deep=True.
- I used the smallest safe dtypes.
- I avoided chained indexing and used .loc for assignments.
- I used built-in aggregations for groupby.
Common mistakes and self-check
- Using row-wise apply for arithmetic. Self-check: can this be column arithmetic or a NumPy ufunc?
- Forgetting ddof when standardizing. Self-check: choose ddof=0 (population) or ddof=1 (sample) explicitly.
- Blindly using inplace=True. Self-check: measure; prefer clear assignments.
- Leaving object dtype for repeated strings. Self-check: is cardinality low? Convert to category.
- Mismatched join dtypes causing slow merges. Self-check: compare df1[key].dtype vs df2[key].dtype.
- Chained indexing leading to copies and warnings. Self-check: use df.loc[row_selector, col] = value. (This and the join-key check are sketched below.)
- Reading entire files when you need a few columns. Self-check: use usecols and chunksize.
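Two of these self-checks as a minimal sketch on made-up frames:
import pandas as pd
df = pd.DataFrame({'x': [1, -2, 3], 'y': [0, 0, 0]})
# Chained indexing may write to a temporary copy and trigger a warning:
# df[df['x'] > 0]['y'] = 1            # avoid
df.loc[df['x'] > 0, 'y'] = 1          # one explicit .loc write
facts = pd.DataFrame({'key': [1, 2, 3], 'val': [10, 20, 30]})
dims = pd.DataFrame({'key': ['1', '2'], 'name': ['a', 'b']})
print(facts['key'].dtype, dims['key'].dtype)   # int64 vs object: align before merging
dims['key'] = dims['key'].astype('int64')
print(facts.merge(dims, on='key', how='left'))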
Practical projects
- Retail rollups: ingest a multi-million-row CSV of transactions in chunks, downcast dtypes, compute daily city-level totals, and write a compact parquet file.
- Feature store preview: from a wide user events table, convert suitable strings to category, build 5 vectorized features (rates, rolling means), and validate memory impact.
- Join optimization: join a 5M-row fact table to two small dimensions. Align dtypes, select only needed columns, and benchmark merge times before/after.
Learning path
- Now: Performance Optimization and Memory (this page).
- Next: Efficient joins, indexes, and time-series windows.
- Then: Vectorized feature engineering patterns and scaling to out-of-core workflows.
Who this is for
Data Scientists and Analysts using pandas/NumPy who need faster pipelines and want to fit larger datasets in memory without sacrificing correctness.
Prerequisites
- Python basics: functions, loops, and comprehensions (you’ll replace loops, but knowing them helps).
- Comfort with pandas DataFrame/Series basics and NumPy arrays.
- Basic statistics (mean, std) and aggregations.
Mini challenge
You receive a 12M-row clickstream CSV with columns: user_id (int), ts (ISO string), url (string), device (5 categories), dwell_sec (float), and referrer (mostly empty). Outline a plan to load, reduce memory, and compute per-user average dwell time by device.
Suggested approach
- Read in chunks with usecols to skip referrer if unneeded; parse dates for ts only if required for grouping.
- Downcast numerics (user_id to int32/int16 if safe; dwell_sec to float32).
- Convert device to category; convert url to category only if its cardinality is very low (unlikely here), or skip loading it entirely for this task.
- Aggregate per chunk: compute the sum and count of dwell_sec per (user_id, device) rather than per-chunk means, so partial results combine exactly; then concatenate the chunk results, group again, sum, and divide for the final mean (see the sketch after this list).
- Write results to parquet.
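A minimal sketch of this plan. The file name clicks.csv, the chunk size, and the exact dtypes are assumptions for illustration.
import pandas as pd
dtypes = {'user_id': 'int32', 'device': 'category', 'dwell_sec': 'float32'}
parts = []
for chunk in pd.read_csv('clicks.csv',
                         usecols=['user_id', 'device', 'dwell_sec'],
                         dtype=dtypes, chunksize=1_000_000):
    # Keep sums and counts so per-chunk results can be combined exactly
    parts.append(chunk.groupby(['user_id', 'device'], observed=True)['dwell_sec']
                      .agg(['sum', 'count']))
combined = pd.concat(parts).groupby(['user_id', 'device'], observed=True).sum()
combined['avg_dwell_sec'] = combined['sum'] / combined['count']
# Writing to parquet requires an engine such as pyarrow
combined[['avg_dwell_sec']].to_parquet('avg_dwell_by_user_device.parquet')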
Next steps
Take the quick test to lock in the concepts. Note: the test is available to everyone; only logged-in users will have their progress saved.