luvv to helpDiscover the Best Free Online Tools
Topic 13 of 30

Creating and Transforming Columns Assign Where

Learn Creating and Transforming Columns Assign Where for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Who this is for

Beginner to intermediate data analysts who use pandas and want clean, readable ways to create new columns and apply conditional logic without loops.

Prerequisites

  • Basic Python syntax
  • pandas DataFrame basics (creating DataFrames, selecting columns)
  • Comfort with boolean logic (and: &, or: |, not: ~) and parentheses

Learning path

  1. Review selecting columns and boolean filtering.
  2. Learn DataFrame.assign to create multiple columns in a clean pipeline.
  3. Learn where (Series/DataFrame.where) and numpy.where for conditional values.
  4. Combine assign + where to build safe, readable transformations.
  5. Practice on real-like datasets (sales, customers, sensors).

Why this matters

As a Data Analyst, you constantly create derived metrics and flags:

  • Sales metrics: revenue, discount, net revenue, margin, profit per order.
  • Customer insights: segment users by spend, recency, or behavior.
  • Data cleaning: replace out-of-range values with NaN or defaults.

Using assign and where lets you do this clearly, safely, and fast—no slow loops, fewer bugs, and code that teammates can understand.

Concept explained simply

Two building blocks:

  • assign: Adds new columns (or overwrites) and returns a new DataFrame. You can chain multiple new columns in one readable step.
  • where: Keeps values where a condition is True; replaces values where False (with NaN by default or a provided alternative).

numpy.where is a vectorized if-else: np.where(condition, value_if_true, value_if_false).

Mental model

  • assign: Think of a conveyor belt. Each assign step adds a labeled box (column) based on what’s already on the belt. With lambda, you reference the current DataFrame state inside the chain.
  • where: Think of a stencil over your data. Where the stencil is cut out (True), original values pass through. Where it’s covered (False), values get replaced.

Syntax snapshots

import pandas as pd
import numpy as np

# assign with existing columns
out = df.assign(new_col = df['a'] * df['b'])

# assign with lambda (recommended for chaining)
out = (
    df
    .assign(revenue=lambda d: d['qty'] * d['price'])
    .assign(high_value=lambda d: np.where(d['revenue'] >= 100, True, False))
)

# Series.where: keep values where condition is True, else replace
safe_ratio = (df['num'] / df['den']).where(df['den'] != 0, 0.0)

# DataFrame.where: apply across many columns (same rule per cell)
clean = df.where(df >= 0)  # negative values become NaN

Worked examples

Example 1: Sales revenue, discount, and net

import pandas as pd
import numpy as np

sales = pd.DataFrame({
    'item': ['A','B','C','D'],
    'qty': [3, 10, 0, 7],
    'price': [5.0, 8.0, 12.0, 3.0],
    'discount_rate': [0.00, 0.10, 0.00, 0.20]
})

result = (
    sales
    .assign(gross=lambda d: d['qty'] * d['price'])
    .assign(discount_value=lambda d: d['gross'] * d['discount_rate'])
    .assign(net=lambda d: d['gross'] - d['discount_value'])
)
print(result)
What you should see
  item  qty  price  discount_rate  gross  discount_value   net
0    A    3    5.0           0.00   15.0             0.0  15.0
1    B   10    8.0           0.10   80.0             8.0  72.0
2    C    0   12.0           0.00    0.0             0.0   0.0
3    D    7    3.0           0.20   21.0             4.2  16.8

Example 2: Flag high-value orders with np.where

flagged = (
    sales
    .assign(gross=lambda d: d['qty'] * d['price'])
    .assign(high_value=lambda d: np.where(d['gross'] >= 50, 'High', 'Regular'))
)
print(flagged[['item','gross','high_value']])
Expected output
  item  gross high_value
0    A   15.0    Regular
1    B   80.0       High
2    C    0.0    Regular
3    D   21.0    Regular

Example 3: Safe margin rate with where

profit_df = pd.DataFrame({
    'qty': [3, 10, 0, 7],
    'price': [5.0, 8.0, 12.0, 3.0],
    'profit': [5.0, 12.0, -2.0, 4.0]
})

with_margin = (
    profit_df
    .assign(revenue=lambda d: d['qty'] * d['price'])
    .assign(margin_rate=lambda d: (d['profit'] / d['revenue']).where(d['revenue'] > 0, 0.0))
)
print(with_margin)
Expected output
   qty  price  profit  revenue  margin_rate
0    3    5.0     5.0     15.0     0.333333
1   10    8.0    12.0     80.0     0.150000
2    0   12.0    -2.0      0.0     0.000000
3    7    3.0     4.0     21.0     0.190476

Exercises you can try now

These mirror the exercises below so you can check your answers. Everyone can attempt them. Progress is saved only for logged-in learners.

  1. EX1 — Order totals with conditional discount
    Build columns: gross, discount_value (apply coupon_pct if coupon is True, else 0), net, and is_big (net >= 100).
  2. EX2 — Clean sensor temps and bucketize
    Replace out-of-range temps with NaN; create temp_bucket: Low (<18), OK (18–25), High (>25).
  3. EX3 — Safe margin
    Create profit and margin; margin is profit/sales where sales > 0 else NaN.
Self-check checklist
  • Used assign with lambda for readable chaining.
  • Used where to avoid division by zero or to set defaults.
  • Used parentheses around boolean expressions with & and |.
  • Outputs match expected columns and shapes.

Common mistakes and how to self-check

  • Forgetting parentheses in boolean logic. Use (cond1) & (cond2) not cond1 & cond2 without brackets.
  • Mixing where meaning: remember where keeps values when condition is True; replaces when False.
  • Not importing numpy for np.where.
  • Division by zero: always guard with where before dividing, or compute then where.
  • Chained assignment warning: prefer df = df.assign(...) rather than df['x'] ... inside many steps.
  • Type surprises: np.where with mixed types can upcast. Be explicit if you need ints/floats.

Practical projects

  • Sales analysis mini-report: compute gross, net, discount flags, high-value segments, and safe margins for a month of orders.
  • Customer segmentation: assign spend tiers (e.g., Bronze/Silver/Gold) using np.where or np.select, plus recency flags.
  • Data quality pass: use where to null-out invalid ranges (negative ages, impossible readings) and produce a clean dataset.

Next steps

  • Combine your new columns with groupby to aggregate by segment or date.
  • Use vectorized string/date methods to enrich columns (parsing dates, extracting domains).
  • Package transformations in functions and apply with pipe for reusable pipelines.

Mini challenge

Create columns for a small product dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'sku': ['A','B','C','D','E'],
    'price': [20, 35, 12, 50, 40],
    'cost': [12, 20, 14, 45, 38],
    'units': [5, 3, 10, 0, 1]
})
  • Build revenue, profit, margin_rate (profit/revenue where revenue>0 else 0).
  • Classify health: 'Healthy' if margin_rate >= 0.25 and revenue >= 100, 'Low margin' if margin_rate < 0.10 and revenue > 0, else 'OK'.
Show one possible solution
sol = (
    df
    .assign(revenue=lambda d: d['price'] * d['units'])
    .assign(profit=lambda d: (d['price'] - d['cost']) * d['units'])
    .assign(margin_rate=lambda d: (d['profit'] / d['revenue']).where(d['revenue'] > 0, 0.0))
    .assign(health=lambda d: np.where((d['margin_rate'] >= 0.25) & (d['revenue'] >= 100),
                                      'Healthy',
                                      np.where((d['margin_rate'] < 0.10) & (d['revenue'] > 0),
                                               'Low margin',
                                               'OK')))
)
print(sol)

Practice Exercises

3 exercises to complete

Instructions

Create a DataFrame and compute order totals using assign and where.

import pandas as pd
import numpy as np

orders = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'items': [4, 12, 1, 9],
    'unit_price': [12.5, 9.0, 100.0, 7.5],
    'coupon': [True, False, True, False],
    'coupon_pct': [0.15, 0.0, 0.20, 0.0]
})

# TODO:
# 1) gross = items * unit_price
# 2) discount_value = gross * coupon_pct when coupon is True, else 0
# 3) net = gross - discount_value
# 4) is_big = True if net >= 100 else False
Expected Output
order_id items unit_price coupon coupon_pct gross discount_value net is_big 0 101 4 12.5 True 0.15 50.0 7.5 42.5 False 1 102 12 9.0 False 0.00 108.0 0.0 108.0 True 2 103 1 100.0 True 0.20 100.0 20.0 80.0 False 3 104 9 7.5 False 0.00 67.5 0.0 67.5 False

Creating and Transforming Columns Assign Where — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

6 questions70% to pass

Have questions about Creating and Transforming Columns Assign Where?

AI Assistant

Ask questions about this tool