Why this matters
As a Data Analyst, you constantly slice DataFrames to answer questions fast. Boolean filtering and the .query method let you turn business questions into precise row selections, such as:
- Which customers purchased more than 3 items and are from the West region?
- Which marketing channels performed above median revenue last quarter?
- Which SKUs to exclude due to test flags or missing attributes?
These are daily tasks when exploring data, cleaning it, and building dashboards or analyses.
Concept explained simply
In pandas, filtering is done by building a boolean mask: a Series of True/False values aligned with your DataFrame rows. You pass this mask to df[mask] to keep only rows where mask is True.
- Comparison operators create masks:
df['revenue'] > 10000 - Combine masks with:
&(and),|(or),~(not). Always wrap each condition in parentheses. - Helpful helpers:
.isin(list),.between(a, b),.isna(),.notna(),.str.contains() - .query lets you write filters as a string:
df.query("revenue > 10000 and region == 'West'"). Use@varto reference Python variables inside the query string.
Mental model
Think of a transparent sheet over your data with green checks (True) and red crosses (False) on each row. When you apply the mask, pandas keeps only the green-checked rows. Combining masks is like layering rules: keep it if all green (and), keep it if any green (or), invert colors (not).
Worked examples
Example 1 — Multiple conditions with & and |
import pandas as pd
df = pd.DataFrame({
'region': ['West','East','West','South','West','North','West','East'],
'channel': ['Online','Retail','Retail','Online','Online','Retail','Online','Online'],
'revenue': [12000,8000,15000,4000,9500,22000,11000,13000]
})
mask = (df['region'] == 'West') & (df['revenue'] > 10000)
df_filtered = df[mask]
print(df_filtered)
This returns only West rows with revenue > 10000.
Example 2 — .isin, .between, and ~ (not)
countries = pd.Series(['US','CA','MX','US','MX','US','US','CA'], name='country')
price = pd.Series([199,89,249,29,79,299,59,149], name='price')
mask_country = countries.isin(['US','CA'])
mask_price = price.between(50, 200)
result_mask = mask_country & mask_price
print(result_mask)
Use ~mask to invert. For example, to exclude certain countries: ~countries.isin(['MX']).
Example 3 — Text filters with .str.contains
products = pd.Series(['Pro Chair','Basic Desk','Pro Desk','Mini Lamp','Basic Chair','Pro Monitor','Pro Lamp','Basic Monitor'])
mask = products.str.contains('pro', case=False, na=False)
print(products[mask])
na=False avoids errors on missing values by treating them as False.
Example 4 — Using .query with variables
import pandas as pd
df = pd.DataFrame({
'region': ['West','East','West','South','West','North','West','East'],
'channel': ['Online','Retail','Retail','Online','Online','Retail','Online','Online'],
'revenue': [12000,8000,15000,4000,9500,22000,11000,13000],
'country': ['US','US','CA','US','MX','US','US','CA']
})
hi = 10000
allowed = ['US','CA']
res = df.query("revenue > @hi and country in @allowed")
print(res)
Inside .query, reference Python variables with @name. Use simple comparisons, logical operators, and in for membership.
Exercises to practice
Use this starter DataFrame for both exercises:
import pandas as pd
data = [
{"order_id":1,"region":"West","channel":"Online","revenue":12000,"units":3,"product":"Pro Chair","country":"US","price":199.0},
{"order_id":2,"region":"East","channel":"Retail","revenue":8000,"units":5,"product":"Basic Desk","country":"US","price":89.0},
{"order_id":3,"region":"West","channel":"Retail","revenue":15000,"units":2,"product":"Pro Desk","country":"CA","price":249.0},
{"order_id":4,"region":"South","channel":"Online","revenue":4000,"units":1,"product":"Mini Lamp","country":"US","price":29.0},
{"order_id":5,"region":"West","channel":"Online","revenue":9500,"units":7,"product":"Basic Chair","country":"MX","price":79.0},
{"order_id":6,"region":"North","channel":"Retail","revenue":22000,"units":4,"product":"Pro Monitor","country":"US","price":299.0},
{"order_id":7,"region":"West","channel":"Online","revenue":11000,"units":6,"product":"Pro Lamp","country":"US","price":59.0},
{"order_id":8,"region":"East","channel":"Online","revenue":13000,"units":2,"product":"Basic Monitor","country":"CA","price":149.0}
]
df = pd.DataFrame(data)
Exercise 1 — Filter with multiple conditions (boolean masks)
Return rows where region == 'West' and revenue > 10000. Show only columns order_id and revenue.
Hints
- Build two masks:
(df['region'] == 'West')and(df['revenue'] > 10000) - Combine with
&and apply:df[combined][['order_id','revenue']]
- Checklist: Did you wrap each condition in parentheses?
- Checklist: Did you avoid using
and/orinstead of&/|?
Exercise 2 — Use .query with variables and exclude by text
Create a Python list countries = ['US','CA']. Using .query, return order_id values where country in countries and (channel == 'Online' or revenue >= 20000). Exclude any rows whose product contains Basic (case-insensitive). You may use an external mask for the text filter and reference it inside .query with @.
Hints
- Reference variables inside
.querywith@, e.g.,country in @countries - For text exclusion, build
mask_basic = df['product'].str.contains('Basic', case=False, na=False)and useand ~@mask_basicin.query
- Checklist: Did you correctly combine
orandandlogic with parentheses inside.query? - Checklist: Did you exclude Basic products using
~?
When you are ready, compare with the solutions in the Exercises section below on this page. The quick test at the end is available to everyone; only logged-in learners will have their progress saved.
Common mistakes and how to self-check
1) Using and/or instead of & and |
pandas requires bitwise operators for elementwise comparisons. Fix by replacing and/or with &/| and wrap each condition in parentheses.
2) Missing parentheses around conditions
Operator precedence can lead to incorrect results. Always write (A) & (B) or (A) | (B).
3) .str.contains on columns with NaN
Use na=False to avoid errors or unexpected NaN results: df['col'].str.contains('x', na=False).
4) Confusion with inclusive ranges
Use .between(low, high, inclusive='both') for inclusive ranges. Default includes both ends in recent pandas versions.
5) .query not seeing Python variables
Remember to reference them with @. Example: df.query("country in @allowed").
Self-check tip: After filtering, quickly validate counts with mask.sum() and sanity-check with a small head/tail sample.
Mini challenge
Using the starter DataFrame, produce the order_ids that meet all conditions:
- Region is West or North
- Revenue between 10,000 and 22,000 inclusive
- Product name contains "Pro" (case-insensitive)
- Exclude any rows with units < 3
Hint
Combine .between, .str.contains(..., case=False, na=False), and logical operators with parentheses. Remember ~ for negation.
Who this is for
- Aspiring and junior Data Analysts
- Anyone who can load data into pandas and wants faster, cleaner filters
Prerequisites
- Basic Python (variables, lists, functions)
- Intro pandas (DataFrame, Series, selecting columns)
Learning path
- Start: Column selection and basic comparisons
- Then: Boolean masks,
&,|,~,.isin,.between - Next: Text filters with
.straccessors - Then:
.querywith variables (@) - Finally: Combine filters for complex business rules
Practical projects
- Sales slice tool: Build a script that filters a sales DataFrame by region, price band, and product family, exporting results to CSV.
- Quality checks: Create a filter report that flags rows with invalid ranges (e.g., negative prices, missing country) and saves counts and examples.
- Marketing shortlist: From campaign results, filter high-CTR and low-CPC ads per channel and produce a top-10 list per region.
Next steps
- Finish the exercises below and take the quick test to confirm mastery.
- Apply filters to a dataset you use at work or in a portfolio project.
- Practice writing both boolean-mask style and
.querystyle for the same logic.