How to learn Boolean Filtering and Query for Python pandas in Data Analyst for free

Why this matters

As a Data Analyst, you constantly slice DataFrames to answer questions fast. Boolean filtering and the .query method let you turn business questions into precise row selections, such as:

Which customers purchased more than 3 items and are from the West region?
Which marketing channels performed above median revenue last quarter?
Which SKUs to exclude due to test flags or missing attributes?

These are daily tasks when exploring data, cleaning it, and building dashboards or analyses.

Concept explained simply

In pandas, filtering is done by building a boolean mask: a Series of True/False values aligned with your DataFrame rows. You pass this mask to df[mask] to keep only rows where mask is True.

Comparison operators create masks: df['revenue'] > 10000
Combine masks with: & (and), | (or), ~ (not). Always wrap each condition in parentheses.
Helpful helpers: .isin(list), .between(a, b), .isna(), .notna(), .str.contains()
.query lets you write filters as a string: df.query("revenue > 10000 and region == 'West'"). Use @var to reference Python variables inside the query string.

Mental model

Think of a transparent sheet over your data with green checks (True) and red crosses (False) on each row. When you apply the mask, pandas keeps only the green-checked rows. Combining masks is like layering rules: keep it if all green (and), keep it if any green (or), invert colors (not).

Worked examples

Example 1 — Multiple conditions with & and |

import pandas as pd

df = pd.DataFrame({
    'region': ['West','East','West','South','West','North','West','East'],
    'channel': ['Online','Retail','Retail','Online','Online','Retail','Online','Online'],
    'revenue': [12000,8000,15000,4000,9500,22000,11000,13000]
})

mask = (df['region'] == 'West') & (df['revenue'] > 10000)
df_filtered = df[mask]
print(df_filtered)

This returns only West rows with revenue > 10000.

Example 2 — .isin, .between, and ~ (not)

countries = pd.Series(['US','CA','MX','US','MX','US','US','CA'], name='country')
price = pd.Series([199,89,249,29,79,299,59,149], name='price')

mask_country = countries.isin(['US','CA'])
mask_price = price.between(50, 200)
result_mask = mask_country & mask_price
print(result_mask)

Use ~mask to invert. For example, to exclude certain countries: ~countries.isin(['MX']).

Example 3 — Text filters with .str.contains

products = pd.Series(['Pro Chair','Basic Desk','Pro Desk','Mini Lamp','Basic Chair','Pro Monitor','Pro Lamp','Basic Monitor'])
mask = products.str.contains('pro', case=False, na=False)
print(products[mask])

na=False avoids errors on missing values by treating them as False.

Example 4 — Using .query with variables

import pandas as pd

df = pd.DataFrame({
    'region': ['West','East','West','South','West','North','West','East'],
    'channel': ['Online','Retail','Retail','Online','Online','Retail','Online','Online'],
    'revenue': [12000,8000,15000,4000,9500,22000,11000,13000],
    'country': ['US','US','CA','US','MX','US','US','CA']
})

hi = 10000
allowed = ['US','CA']
res = df.query("revenue > @hi and country in @allowed")
print(res)

Inside .query, reference Python variables with @name. Use simple comparisons, logical operators, and in for membership.

Exercises to practice

Use this starter DataFrame for both exercises:

import pandas as pd

data = [
 {"order_id":1,"region":"West","channel":"Online","revenue":12000,"units":3,"product":"Pro Chair","country":"US","price":199.0},
 {"order_id":2,"region":"East","channel":"Retail","revenue":8000,"units":5,"product":"Basic Desk","country":"US","price":89.0},
 {"order_id":3,"region":"West","channel":"Retail","revenue":15000,"units":2,"product":"Pro Desk","country":"CA","price":249.0},
 {"order_id":4,"region":"South","channel":"Online","revenue":4000,"units":1,"product":"Mini Lamp","country":"US","price":29.0},
 {"order_id":5,"region":"West","channel":"Online","revenue":9500,"units":7,"product":"Basic Chair","country":"MX","price":79.0},
 {"order_id":6,"region":"North","channel":"Retail","revenue":22000,"units":4,"product":"Pro Monitor","country":"US","price":299.0},
 {"order_id":7,"region":"West","channel":"Online","revenue":11000,"units":6,"product":"Pro Lamp","country":"US","price":59.0},
 {"order_id":8,"region":"East","channel":"Online","revenue":13000,"units":2,"product":"Basic Monitor","country":"CA","price":149.0}
]

df = pd.DataFrame(data)

Exercise 1 — Filter with multiple conditions (boolean masks)

Return rows where region == 'West' and revenue > 10000. Show only columns order_id and revenue.

Hints

Build two masks: (df['region'] == 'West') and (df['revenue'] > 10000)
Combine with & and apply: df[combined][['order_id','revenue']]

Checklist: Did you wrap each condition in parentheses?
Checklist: Did you avoid using and/or instead of &/|?

Exercise 2 — Use .query with variables and exclude by text

Create a Python list countries = ['US','CA']. Using .query, return order_id values where country in countries and (channel == 'Online' or revenue >= 20000). Exclude any rows whose product contains Basic (case-insensitive). You may use an external mask for the text filter and reference it inside .query with @.

Hints

Reference variables inside .query with @, e.g., country in @countries
For text exclusion, build mask_basic = df['product'].str.contains('Basic', case=False, na=False) and use and ~@mask_basic in .query

Checklist: Did you correctly combine or and and logic with parentheses inside .query?
Checklist: Did you exclude Basic products using ~?

When you are ready, compare with the solutions in the Exercises section below on this page. The quick test at the end is available to everyone; only logged-in learners will have their progress saved.

Common mistakes and how to self-check

1) Using and/or instead of & and |

pandas requires bitwise operators for elementwise comparisons. Fix by replacing and/or with &/| and wrap each condition in parentheses.

2) Missing parentheses around conditions

Operator precedence can lead to incorrect results. Always write (A) & (B) or (A) | (B).

3) .str.contains on columns with NaN

Use na=False to avoid errors or unexpected NaN results: df['col'].str.contains('x', na=False).

4) Confusion with inclusive ranges

Use .between(low, high, inclusive='both') for inclusive ranges. Default includes both ends in recent pandas versions.

5) .query not seeing Python variables

Remember to reference them with @. Example: df.query("country in @allowed").

Self-check tip: After filtering, quickly validate counts with mask.sum() and sanity-check with a small head/tail sample.

Mini challenge

Using the starter DataFrame, produce the order_ids that meet all conditions:

Region is West or North
Revenue between 10,000 and 22,000 inclusive
Product name contains "Pro" (case-insensitive)
Exclude any rows with units < 3

Hint

Combine .between, .str.contains(..., case=False, na=False), and logical operators with parentheses. Remember ~ for negation.

Who this is for

Aspiring and junior Data Analysts
Anyone who can load data into pandas and wants faster, cleaner filters

Prerequisites

Basic Python (variables, lists, functions)
Intro pandas (DataFrame, Series, selecting columns)

Learning path

Start: Column selection and basic comparisons
Then: Boolean masks, &, |, ~, .isin, .between
Next: Text filters with .str accessors
Then: .query with variables (@)
Finally: Combine filters for complex business rules

Practical projects

Sales slice tool: Build a script that filters a sales DataFrame by region, price band, and product family, exporting results to CSV.
Quality checks: Create a filter report that flags rows with invalid ranges (e.g., negative prices, missing country) and saves counts and examples.
Marketing shortlist: From campaign results, filter high-CTR and low-CPC ads per channel and produce a top-10 list per region.

Next steps

Finish the exercises below and take the quick test to confirm mastery.
Apply filters to a dataset you use at work or in a portfolio project.
Practice writing both boolean-mask style and .query style for the same logic.

Menu

Boolean Filtering and Query

Table of Contents

Why this matters

Concept explained simply

Worked examples

Exercises to practice

Exercise 1 — Filter with multiple conditions (boolean masks)

Exercise 2 — Use .query with variables and exclude by text

Common mistakes and how to self-check

Mini challenge

Who this is for

Prerequisites

Learning path

Practical projects

Next steps

Practice Exercises

Filter with multiple conditions (boolean masks)

Instructions

Expected Output

Use .query with variables and exclude by text

Boolean Filtering and Query — Quick Test

Have questions about Boolean Filtering and Query?

AI Assistant