Learn pandas interop with NumPy and Matplotlib for free: a guide for Data Analysts

Why this matters

Pandas, NumPy, and Matplotlib are the core trio for everyday data analysis. As a Data Analyst, you will regularly:

Do fast numeric transforms (z-scores, percent changes, thresholds) with NumPy on pandas columns.
Convert between pandas and NumPy when libraries expect arrays.
Plot clean visuals quickly using pandas with Matplotlib under the hood, and customize with Matplotlib APIs.

Who this is for

Beginners who know basic pandas and want to compute faster with NumPy.
Analysts who can make basic plots but want to customize them cleanly.
Anyone preparing for analyst interviews involving vectorized operations and plotting.

Prerequisites

Python basics (variables, functions, importing modules).
Pandas basics (Series, DataFrame, indexing, selecting columns).
Very light Matplotlib familiarity (axes, labels) is helpful but not required.

Concept explained simply

Pandas stores tabular data and labels it with an index. NumPy powers fast numeric operations. Matplotlib draws the charts. They interoperate like this:

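Use NumPy functions (like np.log, np.where, np.mean) directly on pandas Series/DataFrames. Ufuncs such as np.log return a pandas object with the index intact; some functions, such as np.where, return a plain ndarray that you assign back to a column.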
Convert to NumPy when you need raw arrays using to_numpy(). Convert back to pandas with pd.Series(...) or pd.DataFrame(...) to regain labels.
Use df.plot(...) for quick charts. For fine control, get an Axes from Matplotlib and pass it to pandas: df.plot(ax=ax).
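A minimal sketch of the first two interop paths above. The Series `s` and its values are made up for illustration and are not part of the setup data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only
s = pd.Series([1.0, 10.0, 100.0], index=['a', 'b', 'c'], name='x')

# 1) A NumPy ufunc applied to a Series returns a Series with the same index
logged = np.log10(s)

# 2) to_numpy() drops the labels and gives a plain ndarray
arr = s.to_numpy()

# 3) Wrapping the array with the original index restores the labels
doubled = pd.Series(arr * 2, index=s.index, name='x_doubled')

print(type(logged).__name__, list(logged.index))
print(type(arr).__name__, arr.shape)
print(doubled)
```

The labels survive the ufunc call but not the conversion, which is why the index is passed back in explicitly in step 3.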

Mental model

Pandas = labeled containers + convenient API.
NumPy = speed engine for number crunching.
Matplotlib = canvas and brushes for drawing.

Move data between them when needed. Keep labels in pandas for alignment and readability; use NumPy for heavy math; draw with Matplotlib using axes.

Quick setup snippet

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample DataFrame
df = pd.DataFrame({
    'day': pd.date_range('2023-01-01', periods=10, freq='D'),
    'sales': [12, 15, 13, 20, 18, 17, 30, 28, 22, 25],
    'visits': [120, 130, 125, 150, 160, 158, 200, 195, 180, 190]
}).set_index('day')
print(df.head())

Worked examples

Example 1 — Vectorized transforms with NumPy

Compute z-scores for the sales column and flag unusually high values.

sales = df['sales']
mu = sales.mean()            # pandas Series method
sigma = sales.std(ddof=0)    # population std for demo
z = (sales - mu) / sigma     # vectorized math works directly

# Use NumPy to label outliers (> 1.0 std)
df['is_high'] = np.where(z > 1.0, 1, 0)
print(z.round(2))
print(df['is_high'].value_counts())

Key point: You didn’t have to convert to NumPy. Pandas interoperates with NumPy ufuncs and operators.

Example 2 — Converting to/from NumPy

When another library needs arrays, convert with to_numpy().

X = df[['sales', 'visits']].to_numpy(dtype=float)  # shape (n, 2)
print(X.shape, X.dtype)

# Later, convert results back to pandas while keeping index
y = (X[:, 0] / X[:, 1])  # conversion rate as raw ndarray
s_conversion = pd.Series(y, index=df.index, name='conv_rate')
df = pd.concat([df, s_conversion], axis=1)
print(df.head())

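Tip: Prefer to_numpy() over .values. to_numpy() always returns a NumPy ndarray and lets you set dtype and na_value, while .values can return a pandas extension array for some dtypes.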
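To see the difference between the two, here is a small sketch using pandas' nullable Int64 dtype. The Series is a made-up example; the setup DataFrame uses plain integer columns, where both calls return an ndarray.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value, stored in pandas' nullable Int64 dtype
s = pd.Series([1, None, 3], dtype='Int64')

raw = s.values  # a pandas extension array here, not a NumPy ndarray
arr = s.to_numpy(dtype='float64', na_value=np.nan)  # always an ndarray

print(type(raw).__name__)
print(type(arr).__name__, arr.dtype)
```

Passing dtype and na_value makes the conversion explicit, so the missing entry becomes NaN in a float array instead of an object-typed surprise.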

Example 3 — Plotting with pandas + Matplotlib axes

Create a line chart with two y-axes: pandas handles data; Matplotlib handles layout.

fig, ax1 = plt.subplots(figsize=(7, 4))
ax2 = ax1.twinx()  # second y-axis

# Plot on specific axes for full control
_df1 = df[['sales']]
_df2 = df[['visits']]
_df1.plot(ax=ax1, color='tab:blue', marker='o', legend=False)
_df2.plot(ax=ax2, color='tab:orange', linestyle='--', legend=False)

ax1.set_title('Sales and Visits over Time')
ax1.set_xlabel('Day')
ax1.set_ylabel('Sales', color='tab:blue')
ax2.set_ylabel('Visits', color='tab:orange')
ax1.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Pattern: fig, ax = plt.subplots() → pass ax to df.plot() → customize using Matplotlib.

Common interop operations

Elementwise math: np.log(df['sales']), np.sqrt(df['visits']).
Conditional choice: np.where(cond, a, b) returns a vector; assign to a new column.
Row-wise functions: use vectorized NumPy patterns first; avoid apply(..., axis=1) if not necessary.
Aggregation ignoring NaN: np.nanmean(df['sales'].to_numpy()) or pandas df['sales'].mean().
Broadcasting: shapes must be compatible, not necessarily equal. A 1D array assigned as a column needs length = number of rows; a scalar broadcasts to every row.
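The operations in this list can be sketched together on a tiny frame. The name `df_demo` and its values are hypothetical and chosen only to make the results easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df_demo = pd.DataFrame({'sales': [10.0, np.nan, 40.0],
                        'visits': [100.0, 400.0, 900.0]})

# Elementwise ufunc: returns a Series with the same index
root = np.sqrt(df_demo['visits'])

# Conditional choice: returns an ndarray, assigned back as a new column
df_demo['busy'] = np.where(df_demo['visits'] > 300, 1, 0)

# Aggregation ignoring NaN: NumPy needs nanmean, pandas skips NaN by default
mean_np = np.nanmean(df_demo['sales'].to_numpy())
mean_pd = df_demo['sales'].mean()

# Broadcasting: a length-2 array is applied to every row of the (3, 2) array
scaled = df_demo[['sales', 'visits']].to_numpy() / np.array([10.0, 100.0])

print(root.tolist(), df_demo['busy'].tolist(), mean_np, mean_pd)
print(scaled.shape)
```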

Learning path

Warm-up: Run the setup snippet and print shapes/dtypes.
Vectorize: Replace any loops with vectorized NumPy functions (np.log, np.where, np.clip).
Convert safely: Practice to_numpy() and reconstruct labeled Series/DataFrames.
Plot basics: Use df.plot() then move to Matplotlib axes for fine control.
Polish: Add titles, labels, grids, legends, colors, twin axes.

Common mistakes and self-check

Using .values instead of to_numpy(): .values may return a pandas extension array or an unexpected dtype. Self-check: confirm type(arr) and arr.dtype after conversion.
Forgetting index alignment: Pandas aligns by index; NumPy ignores labels. Self-check: after converting to NumPy, verify shapes and ordering before assigning back.
Silent NaN/inf propagation: np.log(0) gives -inf with only a RuntimeWarning, and any arithmetic with NaN yields NaN. Self-check: run np.isfinite() on results before plotting.
Plotting on default axes then customizing another axes: Your settings won’t apply. Self-check: always pass ax=... into df.plot() when customizing.
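Two of these mistakes are easy to reproduce. The sketch below uses made-up Series (`a`, `b`, `vals`) to show label alignment versus positional addition, and one way to filter non-finite values before plotting.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with the same labels in a different order
a = pd.Series([1.0, 2.0, 3.0], index=['x', 'y', 'z'])
b = pd.Series([30.0, 10.0, 20.0], index=['z', 'x', 'y'])

by_label = a + b                           # pandas aligns on index labels
by_position = a.to_numpy() + b.to_numpy()  # NumPy adds position by position

# np.log(0) yields -inf; the warning is silenced here only to keep output clean
vals = pd.Series([0.0, 1.0, np.nan])
with np.errstate(divide='ignore'):
    logged = np.log(vals)

# Keep only finite results (drops both -inf and NaN)
clean = logged[np.isfinite(logged)]

print(by_label['x'], by_position[0])
print(clean)
```

The two sums differ because the arrays carry no labels, so checking order before assigning NumPy results back to a DataFrame is worth the extra line.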

Self-checklist

I can compute a new column using only NumPy vectorized calls.
I can convert between pandas and NumPy without losing row order.
I can pass an existing Matplotlib Axes to df.plot().
I can handle NaN/inf before plotting.

Exercises

Do these in your Python environment.

Exercise 1 — Vectorize, convert, and plot

Create a DataFrame with two numeric columns and a date index (10–20 rows).
Compute a z-score on one column using NumPy; create a binary flag with np.where.
Convert the selected numeric columns to a NumPy array with to_numpy(dtype=float).
Back in pandas, create a ratio column from the NumPy result; plot a line chart and customize with Matplotlib axes.

Expected: z-scores printed, flag counts, and a customized chart rendered.

Practical projects

Marketing KPI dashboard (static): Build a DataFrame with daily spend, clicks, conversions. Use NumPy for CTR/CVR, clip outliers, and plot multi-axis trends with Matplotlib.
Quality control thresholds: Given measurements, compute rolling z-scores with NumPy, mark out-of-control points, and highlight them on a plot.
Anomaly tags: Use np.where and np.select to assign labels (normal, warn, alert) based on multiple conditions; visualize counts per day.

Mini challenge

Given columns revenue and visits, create:

rev_per_visit = revenue / visits (handle division by zero via np.where or np.divide with where).
tag = 'high', 'med', 'low' using np.select based on quantiles.
A dual-axis plot: rev_per_visit on left, visits on right, styled distinctly.
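One possible starting point for the first two items, offered as a sketch rather than a reference solution. The data, the name `df_mc`, the 0.33/0.66 cut points, and the 'unknown' fallback tag are all assumptions made for illustration; the plot can follow the twin-axes pattern from Example 3.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names follow the challenge text
df_mc = pd.DataFrame({'revenue': [100.0, 0.0, 50.0, 300.0],
                      'visits': [10.0, 0.0, 25.0, 20.0]})

rev = df_mc['revenue'].to_numpy()
vis = df_mc['visits'].to_numpy()

# Divide only where visits is non-zero; other positions keep the NaN from `out`
rpv = np.divide(rev, vis, out=np.full_like(rev, np.nan), where=vis != 0)
df_mc['rev_per_visit'] = rpv

# Quantile cut points computed on the non-NaN values only (assumed thresholds)
q_low, q_high = np.nanquantile(rpv, [0.33, 0.66])

# np.select takes the first matching condition; NaN matches none and gets the default
conditions = [rpv >= q_high, rpv >= q_low, rpv < q_low]
df_mc['tag'] = np.select(conditions, ['high', 'med', 'low'], default='unknown')

print(df_mc)
```

Using np.divide with out and where avoids evaluating the division at the zero-visit rows at all, whereas np.where computes both branches first and then picks between them.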

Next steps

Practice replacing loops with NumPy ufuncs and np.where/np.select.
Adopt a plotting pattern: fig, ax = plt.subplots() → df.plot(ax=ax) → customize.
Try a small project above and keep code snippets for reuse.
