luvv to helpDiscover the Best Free Online Tools
Topic 7 of 30

Boolean Filtering and Query

Learn Boolean Filtering and Query for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Data Analyst, you constantly slice DataFrames to answer questions fast. Boolean filtering and the .query method let you turn business questions into precise row selections, such as:

  • Which customers purchased more than 3 items and are from the West region?
  • Which marketing channels performed above median revenue last quarter?
  • Which SKUs to exclude due to test flags or missing attributes?

These are daily tasks when exploring data, cleaning it, and building dashboards or analyses.

Concept explained simply

In pandas, filtering is done by building a boolean mask: a Series of True/False values aligned with your DataFrame rows. You pass this mask to df[mask] to keep only rows where mask is True.

  • Comparison operators create masks: df['revenue'] > 10000
  • Combine masks with: & (and), | (or), ~ (not). Always wrap each condition in parentheses.
  • Helpful helpers: .isin(list), .between(a, b), .isna(), .notna(), .str.contains()
  • .query lets you write filters as a string: df.query("revenue > 10000 and region == 'West'"). Use @var to reference Python variables inside the query string.
Mental model

Think of a transparent sheet over your data with green checks (True) and red crosses (False) on each row. When you apply the mask, pandas keeps only the green-checked rows. Combining masks is like layering rules: keep it if all green (and), keep it if any green (or), invert colors (not).

Worked examples

Example 1 — Multiple conditions with & and |
import pandas as pd

df = pd.DataFrame({
    'region': ['West','East','West','South','West','North','West','East'],
    'channel': ['Online','Retail','Retail','Online','Online','Retail','Online','Online'],
    'revenue': [12000,8000,15000,4000,9500,22000,11000,13000]
})

mask = (df['region'] == 'West') & (df['revenue'] > 10000)
df_filtered = df[mask]
print(df_filtered)

This returns only West rows with revenue > 10000.

Example 2 — .isin, .between, and ~ (not)
countries = pd.Series(['US','CA','MX','US','MX','US','US','CA'], name='country')
price = pd.Series([199,89,249,29,79,299,59,149], name='price')

mask_country = countries.isin(['US','CA'])
mask_price = price.between(50, 200)
result_mask = mask_country & mask_price
print(result_mask)

Use ~mask to invert. For example, to exclude certain countries: ~countries.isin(['MX']).

Example 3 — Text filters with .str.contains
products = pd.Series(['Pro Chair','Basic Desk','Pro Desk','Mini Lamp','Basic Chair','Pro Monitor','Pro Lamp','Basic Monitor'])
mask = products.str.contains('pro', case=False, na=False)
print(products[mask])

na=False avoids errors on missing values by treating them as False.

Example 4 — Using .query with variables
import pandas as pd

df = pd.DataFrame({
    'region': ['West','East','West','South','West','North','West','East'],
    'channel': ['Online','Retail','Retail','Online','Online','Retail','Online','Online'],
    'revenue': [12000,8000,15000,4000,9500,22000,11000,13000],
    'country': ['US','US','CA','US','MX','US','US','CA']
})

hi = 10000
allowed = ['US','CA']
res = df.query("revenue > @hi and country in @allowed")
print(res)

Inside .query, reference Python variables with @name. Use simple comparisons, logical operators, and in for membership.

Exercises to practice

Use this starter DataFrame for both exercises:

import pandas as pd

data = [
 {"order_id":1,"region":"West","channel":"Online","revenue":12000,"units":3,"product":"Pro Chair","country":"US","price":199.0},
 {"order_id":2,"region":"East","channel":"Retail","revenue":8000,"units":5,"product":"Basic Desk","country":"US","price":89.0},
 {"order_id":3,"region":"West","channel":"Retail","revenue":15000,"units":2,"product":"Pro Desk","country":"CA","price":249.0},
 {"order_id":4,"region":"South","channel":"Online","revenue":4000,"units":1,"product":"Mini Lamp","country":"US","price":29.0},
 {"order_id":5,"region":"West","channel":"Online","revenue":9500,"units":7,"product":"Basic Chair","country":"MX","price":79.0},
 {"order_id":6,"region":"North","channel":"Retail","revenue":22000,"units":4,"product":"Pro Monitor","country":"US","price":299.0},
 {"order_id":7,"region":"West","channel":"Online","revenue":11000,"units":6,"product":"Pro Lamp","country":"US","price":59.0},
 {"order_id":8,"region":"East","channel":"Online","revenue":13000,"units":2,"product":"Basic Monitor","country":"CA","price":149.0}
]

df = pd.DataFrame(data)

Exercise 1 — Filter with multiple conditions (boolean masks)

Return rows where region == 'West' and revenue > 10000. Show only columns order_id and revenue.

Hints
  • Build two masks: (df['region'] == 'West') and (df['revenue'] > 10000)
  • Combine with & and apply: df[combined][['order_id','revenue']]
  • Checklist: Did you wrap each condition in parentheses?
  • Checklist: Did you avoid using and/or instead of &/|?

Exercise 2 — Use .query with variables and exclude by text

Create a Python list countries = ['US','CA']. Using .query, return order_id values where country in countries and (channel == 'Online' or revenue >= 20000). Exclude any rows whose product contains Basic (case-insensitive). You may use an external mask for the text filter and reference it inside .query with @.

Hints
  • Reference variables inside .query with @, e.g., country in @countries
  • For text exclusion, build mask_basic = df['product'].str.contains('Basic', case=False, na=False) and use and ~@mask_basic in .query
  • Checklist: Did you correctly combine or and and logic with parentheses inside .query?
  • Checklist: Did you exclude Basic products using ~?

When you are ready, compare with the solutions in the Exercises section below on this page. The quick test at the end is available to everyone; only logged-in learners will have their progress saved.

Common mistakes and how to self-check

1) Using and/or instead of & and |

pandas requires bitwise operators for elementwise comparisons. Fix by replacing and/or with &/| and wrap each condition in parentheses.

2) Missing parentheses around conditions

Operator precedence can lead to incorrect results. Always write (A) & (B) or (A) | (B).

3) .str.contains on columns with NaN

Use na=False to avoid errors or unexpected NaN results: df['col'].str.contains('x', na=False).

4) Confusion with inclusive ranges

Use .between(low, high, inclusive='both') for inclusive ranges. Default includes both ends in recent pandas versions.

5) .query not seeing Python variables

Remember to reference them with @. Example: df.query("country in @allowed").

Self-check tip: After filtering, quickly validate counts with mask.sum() and sanity-check with a small head/tail sample.

Mini challenge

Using the starter DataFrame, produce the order_ids that meet all conditions:

  • Region is West or North
  • Revenue between 10,000 and 22,000 inclusive
  • Product name contains "Pro" (case-insensitive)
  • Exclude any rows with units < 3
Hint

Combine .between, .str.contains(..., case=False, na=False), and logical operators with parentheses. Remember ~ for negation.

Who this is for

  • Aspiring and junior Data Analysts
  • Anyone who can load data into pandas and wants faster, cleaner filters

Prerequisites

  • Basic Python (variables, lists, functions)
  • Intro pandas (DataFrame, Series, selecting columns)

Learning path

  • Start: Column selection and basic comparisons
  • Then: Boolean masks, &, |, ~, .isin, .between
  • Next: Text filters with .str accessors
  • Then: .query with variables (@)
  • Finally: Combine filters for complex business rules

Practical projects

  • Sales slice tool: Build a script that filters a sales DataFrame by region, price band, and product family, exporting results to CSV.
  • Quality checks: Create a filter report that flags rows with invalid ranges (e.g., negative prices, missing country) and saves counts and examples.
  • Marketing shortlist: From campaign results, filter high-CTR and low-CPC ads per channel and produce a top-10 list per region.

Next steps

  • Finish the exercises below and take the quick test to confirm mastery.
  • Apply filters to a dataset you use at work or in a portfolio project.
  • Practice writing both boolean-mask style and .query style for the same logic.

Practice Exercises

2 exercises to complete

Instructions

Using the provided DataFrame df (see lesson above), return rows where region == 'West' and revenue > 10000. Show only order_id and revenue.

Expected Output
Rows with order_id [1, 3, 7] and their revenue values.

Boolean Filtering and Query — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Boolean Filtering and Query?

AI Assistant

Ask questions about this tool