Interop With NumPy and Matplotlib

Learn interop with NumPy and Matplotlib for free with explanations, exercises, and a quick test (for Data Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

Pandas, NumPy, and Matplotlib are the core trio for everyday data analysis. As a Data Analyst, you will regularly:

  • Do fast numeric transforms (z-scores, percent changes, thresholds) with NumPy on pandas columns.
  • Convert between pandas and NumPy when libraries expect arrays.
  • Plot clean visuals quickly using pandas with Matplotlib under the hood, and customize with Matplotlib APIs.

Who this is for

  • Beginners who know basic pandas and want to compute faster with NumPy.
  • Analysts who can make basic plots but want to customize them cleanly.
  • Anyone preparing for analyst interviews involving vectorized operations and plotting.

Prerequisites

  • Python basics (variables, functions, importing modules).
  • Pandas basics (Series, DataFrame, indexing, selecting columns).
  • Very light Matplotlib familiarity (axes, labels) is helpful but not required.

Concept explained simply

Pandas stores tabular data and labels it with an index. NumPy powers fast numeric operations. Matplotlib draws the charts. They interoperate like this:

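  • Use NumPy functions (like np.log, np.where, np.mean) directly on pandas Series/DataFrames. Pandas passes the data to NumPy efficiently. Ufuncs such as np.log return a pandas object with its labels intact, while np.where returns a plain NumPy array.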
  • Convert to NumPy when you need raw arrays using to_numpy(). Convert back to pandas with pd.Series(...) or pd.DataFrame(...) to regain labels.
  • Use df.plot(...) for quick charts. For fine control, get an Axes from Matplotlib and pass it to pandas: df.plot(ax=ax).
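The first two points above can be seen in a minimal sketch. The toy Series and its labels 'a' to 'c' are assumptions made up for illustration, not data from this lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the labels 'a'..'c' are only for illustration.
s = pd.Series([1.0, 10.0, 100.0], index=['a', 'b', 'c'], name='x')

logged = np.log10(s)            # ufunc on a Series returns a Series, index kept
flags = np.where(s > 5, 1, 0)   # np.where returns a plain ndarray, labels dropped
raw = s.to_numpy()              # explicit conversion to a raw ndarray
back = pd.Series(raw * 2, index=s.index, name='x2')  # reattach the labels

print(type(logged).__name__, list(logged.index))
print(type(flags).__name__, flags.tolist())
print(back)
```

Because np.where drops the labels, assigning its result to a DataFrame column relies on row order rather than on the index.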

Mental model

  • Pandas = labeled containers + convenient API.
  • NumPy = speed engine for number crunching.
  • Matplotlib = canvas and brushes for drawing.

Move data between them when needed. Keep labels in pandas for alignment and readability; use NumPy for heavy math; draw with Matplotlib using axes.

Quick setup snippet

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample DataFrame
df = pd.DataFrame({
    'day': pd.date_range('2023-01-01', periods=10, freq='D'),
    'sales': [12, 15, 13, 20, 18, 17, 30, 28, 22, 25],
    'visits': [120, 130, 125, 150, 160, 158, 200, 195, 180, 190]
}).set_index('day')
print(df.head())

Worked examples

Example 1 — Vectorized transforms with NumPy

Compute z-scores for the sales column and flag unusually high values.

sales = df['sales']
mu = sales.mean()            # pandas Series method
sigma = sales.std(ddof=0)    # population std for demo
z = (sales - mu) / sigma     # vectorized math works directly

# Use NumPy to label outliers (> 1.0 std)
df['is_high'] = np.where(z > 1.0, 1, 0)
print(z.round(2))
print(df['is_high'].value_counts())

Key point: You didn’t have to convert to NumPy. Pandas interoperates with NumPy ufuncs and operators.
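One detail worth checking when mixing the two libraries is the ddof default used for the standard deviation. The sketch below reuses the sales numbers from the setup snippet and compares the pandas default (ddof=1) with the NumPy default (ddof=0); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

sales = pd.Series([12, 15, 13, 20, 18, 17, 30, 28, 22, 25], name='sales')

std_pandas_default = sales.std()               # pandas default: ddof=1 (sample)
std_numpy_default = np.std(sales.to_numpy())   # NumPy default: ddof=0 (population)
std_pandas_pop = sales.std(ddof=0)             # matches the NumPy default

# z-scores built with the population std have mean 0 and population std 1
z = (sales - sales.mean()) / std_pandas_pop

print(round(std_pandas_default, 3), round(std_numpy_default, 3))
print(np.isclose(std_numpy_default, std_pandas_pop))
```

Whichever convention you pick, state it explicitly so that z-scores computed in pandas and in NumPy agree.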

Example 2 — Converting to/from NumPy

When another library needs arrays, convert with to_numpy().

X = df[['sales', 'visits']].to_numpy(dtype=float)  # shape (n, 2)
print(X.shape, X.dtype)

# Later, convert results back to pandas while keeping index
y = (X[:, 0] / X[:, 1])  # conversion rate as raw ndarray
s_conversion = pd.Series(y, index=df.index, name='conv_rate')
df = pd.concat([df, s_conversion], axis=1)
print(df.head())

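Tip: Prefer to_numpy() over .values. to_numpy() always returns a NumPy ndarray and lets you set the dtype explicitly, whereas .values can return a pandas extension array for some dtypes (for example, categorical).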
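A small sketch of where .values and to_numpy() differ, using a categorical Series and a nullable integer Series. Both Series are made-up examples, not part of the lesson's DataFrame.

```python
import numpy as np
import pandas as pd

# Categorical data: .values hands back a pandas Categorical, not an ndarray
cat = pd.Series(['a', 'b', 'a'], dtype='category')
print(type(cat.values).__name__)
print(type(cat.to_numpy()).__name__)

# Nullable integers: choose the target dtype and the missing-value marker explicitly
nullable = pd.Series([1, None, 3], dtype='Int64')
arr = nullable.to_numpy(dtype=float, na_value=np.nan)
print(arr.dtype, arr)
```

For plain int or float columns the two usually give the same result, so the difference only shows up with extension dtypes like these.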

Example 3 — Plotting with pandas + Matplotlib axes

Create a line chart with two y-axes: pandas handles data; Matplotlib handles layout.

fig, ax1 = plt.subplots(figsize=(7, 4))
ax2 = ax1.twinx()  # second y-axis

# Plot on specific axes for full control
_df1 = df[['sales']]
_df2 = df[['visits']]
_df1.plot(ax=ax1, color='tab:blue', marker='o', legend=False)
_df2.plot(ax=ax2, color='tab:orange', linestyle='--', legend=False)

ax1.set_title('Sales and Visits over Time')
ax1.set_xlabel('Day')
ax1.set_ylabel('Sales', color='tab:blue')
ax2.set_ylabel('Visits', color='tab:orange')
ax1.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Pattern: fig, ax = plt.subplots() → pass ax to df.plot() → customize using Matplotlib.
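The same pattern in its smallest form, as a sketch for a single Axes. It assumes a non-interactive 'Agg' backend and writes the image to an in-memory buffer so it can run without a display; df_demo and the title text are illustrative.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df_demo = pd.DataFrame(
    {'sales': [12, 15, 13, 20, 18]},
    index=pd.date_range('2023-01-01', periods=5, freq='D'),
)

fig, ax = plt.subplots(figsize=(6, 3))
returned_ax = df_demo.plot(ax=ax, marker='o', legend=False)  # pandas draws on our Axes

# From here on, everything is plain Matplotlib on the same Axes
ax.set_title('Daily sales')
ax.set_ylabel('Sales')
ax.grid(True, alpha=0.3)
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format='png')  # in a script you might pass a file path instead
plt.close(fig)
```

Holding on to fig and ax, rather than relying on the implicit current figure, keeps it clear which chart each customization applies to.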

Common interop operations

  • Elementwise math: np.log(df['sales']), np.sqrt(df['visits']).
  • Conditional choice: np.where(cond, a, b) returns a NumPy array with one value per row; assign it to a new column.
  • Row-wise functions: use vectorized NumPy patterns first; avoid apply(..., axis=1) if not necessary.
  • Aggregation ignoring NaN: np.nanmean(df['sales'].to_numpy()) or pandas df['sales'].mean().
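  • Broadcasting: shapes must be compatible, not necessarily identical (for example, an (n, 2) array minus a length-2 array of column means). When assigning a 1D array as a new column, its length must equal the number of rows.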
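The operations in this list can be tried together in one short sketch. The DataFrame df_ops and its values (including the deliberate NaN) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df_ops = pd.DataFrame({
    'sales': [12.0, np.nan, 13.0, 20.0],
    'visits': [120, 130, 125, 150],
})

df_ops['log_visits'] = np.log(df_ops['visits'])                  # elementwise ufunc
df_ops['busy'] = np.where(df_ops['visits'] >= 130, 'yes', 'no')  # conditional choice
df_ops['rate'] = df_ops['sales'] / df_ops['visits']              # row-wise math, no apply

mean_np = np.nanmean(df_ops['sales'].to_numpy())  # skips NaN explicitly
mean_pd = df_ops['sales'].mean()                  # pandas skips NaN by default

# Broadcasting: a (4, 2) array minus a (2,) array of column means
X = df_ops[['sales', 'visits']].to_numpy(dtype=float)
centered = X - np.nanmean(X, axis=0)

print(df_ops)
print(mean_np, mean_pd, centered.shape)
```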

Learning path

  1. Warm-up: Run the setup snippet and print shapes/dtypes.
  2. Vectorize: Replace any loops with vectorized NumPy functions (np.log, np.where, np.clip).
  3. Convert safely: Practice to_numpy() and reconstruct labeled Series/DataFrames.
  4. Plot basics: Use df.plot() then move to Matplotlib axes for fine control.
  5. Polish: Add titles, labels, grids, legends, colors, twin axes.

Common mistakes and self-check

  • Using .values instead of to_numpy(): .values may return a pandas extension array rather than an ndarray for some dtypes. Self-check: confirm type(arr) and arr.dtype after conversion.
  • Forgetting index alignment: Pandas aligns by index; NumPy ignores labels. Self-check: after converting to NumPy, verify shapes and ordering before assigning back.
  • Silent NaN/inf propagation: np.log(0) → -inf (NumPy emits only a RuntimeWarning), and any operation involving NaN yields NaN. Self-check: run np.isfinite() on results before plotting.
  • Plotting on the default axes, then customizing a different Axes: Your settings won’t apply. Self-check: always pass ax=... into df.plot() when customizing.
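Two of these pitfalls, index alignment and non-finite values, are easy to reproduce. The sketch below uses made-up Series with the labels 'x', 'y', 'z' to show the difference between label-based and positional addition, then filters a result with np.isfinite.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['z', 'y', 'x'])  # same labels, reversed order

by_label = a + b                           # pandas aligns on labels: x is 1 + 30
by_position = a.to_numpy() + b.to_numpy()  # NumPy adds by position: first is 1 + 10

print(by_label.to_dict())
print(by_position.tolist())

# Non-finite values: log(0) is -inf and NaN stays NaN
values = pd.Series([0.0, 1.0, np.nan, 100.0])
with np.errstate(divide='ignore'):   # silence the divide-by-zero warning for the demo
    logged = np.log(values)
clean = logged[np.isfinite(logged)]  # keep only finite results before plotting
print(len(values), len(clean))
```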

Self-checklist

  • I can compute a new column using only NumPy vectorized calls.
  • I can convert between pandas and NumPy without losing row order.
  • I can pass an existing Matplotlib Axes to df.plot().
  • I can handle NaN/inf before plotting.

Exercises

These mirror the interactive exercise(s) below. Do them in your Python environment.

Exercise 1 — Vectorize, convert, and plot
  1. Create a DataFrame with two numeric columns and a date index (10–20 rows).
  2. Compute a z-score on one column using NumPy; create a binary flag with np.where.
  3. Convert the selected numeric columns to a NumPy array with to_numpy(dtype=float).
  4. Back in pandas, create a ratio column from the NumPy result; plot a line chart and customize with Matplotlib axes.

Expected: z-scores printed, flag counts, and a customized chart rendered.

Practical projects

  • Marketing KPI dashboard (static): Build a DataFrame with daily spend, clicks, conversions. Use NumPy for CTR/CVR, clip outliers, and plot multi-axis trends with Matplotlib.
  • Quality control thresholds: Given measurements, compute rolling z-scores with NumPy, mark out-of-control points, and highlight them on a plot.
  • Anomaly tags: Use np.where and np.select to assign labels (normal, warn, alert) based on multiple conditions; visualize counts per day.

Mini challenge

Given columns revenue and visits, create:

  • rev_per_visit = revenue / visits (handle division by zero via np.where or np.divide with where).
  • tag = 'high', 'med', 'low' using np.select based on quantiles.
  • A dual-axis plot: rev_per_visit on left, visits on right, styled distinctly.
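As a starting point for the first two items, here is one possible sketch, not a reference solution. The data values, the 0.25/0.75 quantile cut points, and the 'unknown' label for rows with zero visits are all assumptions; the plot is left to you.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the challenge text.
df_mc = pd.DataFrame({
    'revenue': [10.0, 5.0, 40.0, 90.0, 80.0, 0.0, 250.0, 120.0],
    'visits':  [10,   0,   20,   30,   20,   0,   50,    20],
})

rev = df_mc['revenue'].to_numpy(dtype=float)
vis = df_mc['visits'].to_numpy(dtype=float)

# Safe division: positions where visits == 0 keep the NaN prefilled in `out`
rpv = np.divide(rev, vis, out=np.full_like(rev, np.nan), where=vis != 0)
df_mc['rev_per_visit'] = rpv

# NaN-aware quantile cut points (the 0.25/0.75 choice is an assumption)
q_low, q_high = np.nanquantile(rpv, [0.25, 0.75])

# np.select takes the first condition that is True; NaN rows match none of them
conditions = [rpv >= q_high, rpv >= q_low, rpv < q_low]
choices = ['high', 'med', 'low']
df_mc['tag'] = np.select(conditions, choices, default='unknown')

print(df_mc)
```

np.where could replace np.divide here, but it evaluates the division for every row first, so it still triggers a divide-by-zero warning unless you suppress it.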

Next steps

  • Practice replacing loops with NumPy ufuncs and np.where/np.select.
  • Adopt a plotting pattern: fig, ax = plt.subplots() → df.plot(ax=ax) → customize.
  • Try a small project above and keep code snippets for reuse.

Practice Exercises

Instructions

  1. Create a DataFrame with 15 rows, a daily DatetimeIndex, and two columns: sales (integers) and visits (integers).
  2. Compute z-scores for sales using NumPy (mean and std). Add is_high = 1 if z > 1.0 else 0 using np.where.
  3. Convert ['sales','visits'] to a NumPy array via to_numpy(dtype=float). From that array, compute a conversion-like rate rate = sales / visits and bring it back as a labeled Series named rate, aligned to the original index.
  4. Plot sales (left y-axis) and visits (right y-axis) over time. Use fig, ax = plt.subplots(), pass ax to df.plot(), add title, labels, grid, and distinct colors.

Expected Output

Printed z-scores (rounded), value counts for is_high (e.g., 3 ones, 12 zeros), array shape (15, 2), head of df with new 'rate' column, and a chart with two y-axes.
