How to learn Creating and Transforming Columns Assign Where for Python pandas in Data Analyst for free

Who this is for

Beginner to intermediate data analysts who use pandas and want clean, readable ways to create new columns and apply conditional logic without loops.

Prerequisites

Basic Python syntax
pandas DataFrame basics (creating DataFrames, selecting columns)
Comfort with boolean logic (and: &, or: |, not: ~) and parentheses

Learning path

Review selecting columns and boolean filtering.
Learn DataFrame.assign to create multiple columns in a clean pipeline.
Learn where (Series/DataFrame.where) and numpy.where for conditional values.
Combine assign + where to build safe, readable transformations.
Practice on real-like datasets (sales, customers, sensors).

Why this matters

As a Data Analyst, you constantly create derived metrics and flags:

Sales metrics: revenue, discount, net revenue, margin, profit per order.
Customer insights: segment users by spend, recency, or behavior.
Data cleaning: replace out-of-range values with NaN or defaults.

Using assign and where lets you do this clearly, safely, and fast—no slow loops, fewer bugs, and code that teammates can understand.

Concept explained simply

Two building blocks:

assign: Adds new columns (or overwrites) and returns a new DataFrame. You can chain multiple new columns in one readable step.
where: Keeps values where a condition is True; replaces values where False (with NaN by default or a provided alternative).

numpy.where is a vectorized if-else: np.where(condition, value_if_true, value_if_false).

Mental model

assign: Think of a conveyor belt. Each assign step adds a labeled box (column) based on what’s already on the belt. With lambda, you reference the current DataFrame state inside the chain.
where: Think of a stencil over your data. Where the stencil is cut out (True), original values pass through. Where it’s covered (False), values get replaced.

Syntax snapshots

import pandas as pd
import numpy as np

# assign with existing columns
out = df.assign(new_col = df['a'] * df['b'])

# assign with lambda (recommended for chaining)
out = (
    df
    .assign(revenue=lambda d: d['qty'] * d['price'])
    .assign(high_value=lambda d: np.where(d['revenue'] >= 100, True, False))
)

# Series.where: keep values where condition is True, else replace
safe_ratio = (df['num'] / df['den']).where(df['den'] != 0, 0.0)

# DataFrame.where: apply across many columns (same rule per cell)
clean = df.where(df >= 0)  # negative values become NaN

Worked examples

Example 1: Sales revenue, discount, and net

import pandas as pd
import numpy as np

sales = pd.DataFrame({
    'item': ['A','B','C','D'],
    'qty': [3, 10, 0, 7],
    'price': [5.0, 8.0, 12.0, 3.0],
    'discount_rate': [0.00, 0.10, 0.00, 0.20]
})

result = (
    sales
    .assign(gross=lambda d: d['qty'] * d['price'])
    .assign(discount_value=lambda d: d['gross'] * d['discount_rate'])
    .assign(net=lambda d: d['gross'] - d['discount_value'])
)
print(result)

What you should see

  item  qty  price  discount_rate  gross  discount_value   net
0    A    3    5.0           0.00   15.0             0.0  15.0
1    B   10    8.0           0.10   80.0             8.0  72.0
2    C    0   12.0           0.00    0.0             0.0   0.0
3    D    7    3.0           0.20   21.0             4.2  16.8

Example 2: Flag high-value orders with np.where

flagged = (
    sales
    .assign(gross=lambda d: d['qty'] * d['price'])
    .assign(high_value=lambda d: np.where(d['gross'] >= 50, 'High', 'Regular'))
)
print(flagged[['item','gross','high_value']])

Expected output

  item  gross high_value
0    A   15.0    Regular
1    B   80.0       High
2    C    0.0    Regular
3    D   21.0    Regular

Example 3: Safe margin rate with where

profit_df = pd.DataFrame({
    'qty': [3, 10, 0, 7],
    'price': [5.0, 8.0, 12.0, 3.0],
    'profit': [5.0, 12.0, -2.0, 4.0]
})

with_margin = (
    profit_df
    .assign(revenue=lambda d: d['qty'] * d['price'])
    .assign(margin_rate=lambda d: (d['profit'] / d['revenue']).where(d['revenue'] > 0, 0.0))
)
print(with_margin)

Expected output

   qty  price  profit  revenue  margin_rate
0    3    5.0     5.0     15.0     0.333333
1   10    8.0    12.0     80.0     0.150000
2    0   12.0    -2.0      0.0     0.000000
3    7    3.0     4.0     21.0     0.190476

Exercises you can try now

These mirror the exercises below so you can check your answers. Everyone can attempt them. Progress is saved only for logged-in learners.

EX1 — Order totals with conditional discount
Build columns: gross, discount_value (apply coupon_pct if coupon is True, else 0), net, and is_big (net >= 100).
EX2 — Clean sensor temps and bucketize
Replace out-of-range temps with NaN; create temp_bucket: Low (<18), OK (18–25), High (>25).
EX3 — Safe margin
Create profit and margin; margin is profit/sales where sales > 0 else NaN.

Self-check checklist

Used assign with lambda for readable chaining.
Used where to avoid division by zero or to set defaults.
Used parentheses around boolean expressions with & and |.
Outputs match expected columns and shapes.

Common mistakes and how to self-check

Forgetting parentheses in boolean logic. Use (cond1) & (cond2) not cond1 & cond2 without brackets.
Mixing where meaning: remember where keeps values when condition is True; replaces when False.
Not importing numpy for np.where.
Division by zero: always guard with where before dividing, or compute then where.
Chained assignment warning: prefer df = df.assign(...) rather than df['x'] ... inside many steps.
Type surprises: np.where with mixed types can upcast. Be explicit if you need ints/floats.

Practical projects

Sales analysis mini-report: compute gross, net, discount flags, high-value segments, and safe margins for a month of orders.
Customer segmentation: assign spend tiers (e.g., Bronze/Silver/Gold) using np.where or np.select, plus recency flags.
Data quality pass: use where to null-out invalid ranges (negative ages, impossible readings) and produce a clean dataset.

Next steps

Combine your new columns with groupby to aggregate by segment or date.
Use vectorized string/date methods to enrich columns (parsing dates, extracting domains).
Package transformations in functions and apply with pipe for reusable pipelines.

Mini challenge

Create columns for a small product dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'sku': ['A','B','C','D','E'],
    'price': [20, 35, 12, 50, 40],
    'cost': [12, 20, 14, 45, 38],
    'units': [5, 3, 10, 0, 1]
})

Build revenue, profit, margin_rate (profit/revenue where revenue>0 else 0).
Classify health: 'Healthy' if margin_rate >= 0.25 and revenue >= 100, 'Low margin' if margin_rate < 0.10 and revenue > 0, else 'OK'.

Show one possible solution

sol = (
    df
    .assign(revenue=lambda d: d['price'] * d['units'])
    .assign(profit=lambda d: (d['price'] - d['cost']) * d['units'])
    .assign(margin_rate=lambda d: (d['profit'] / d['revenue']).where(d['revenue'] > 0, 0.0))
    .assign(health=lambda d: np.where((d['margin_rate'] >= 0.25) & (d['revenue'] >= 100),
                                      'Healthy',
                                      np.where((d['margin_rate'] < 0.10) & (d['revenue'] > 0),
                                               'Low margin',
                                               'OK')))
)
print(sol)

Menu

Creating and Transforming Columns Assign Where

Table of Contents

Who this is for

Prerequisites

Learning path

Why this matters

Concept explained simply

Mental model

Syntax snapshots

Worked examples

Example 1: Sales revenue, discount, and net

Example 2: Flag high-value orders with np.where

Example 3: Safe margin rate with where

Exercises you can try now

Common mistakes and how to self-check

Practical projects

Next steps

Mini challenge

Practice Exercises

Order totals with conditional discount

Instructions

Expected Output

Clean sensor temps and bucketize

Safe margin with where

Creating and Transforming Columns Assign Where — Quick Test

Have questions about Creating and Transforming Columns Assign Where?

AI Assistant