Understanding DataFrame and Series Basics

Learn the basics of the pandas DataFrame and Series for free, with explanations, exercises, and a quick test (for Data Analysts).

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Data Analyst, you will spend a lot of time loading, cleaning, and exploring data before any modeling or visualization. Pandas provides two essential building blocks for this: Series (one-dimensional labeled data) and DataFrame (two-dimensional labeled tabular data). Mastering their basics lets you quickly inspect data quality, select subsets, compute new columns, and prepare reports.

  • Real tasks you will do: quickly preview incoming CSVs, pick specific rows/columns, compute summary metrics, and fix simple data issues (missing values, types).
  • Outcome: you can confidently read data, understand its shape and labels, and perform first-pass checks in seconds.

Who this is for

  • New or aspiring Data Analysts starting with pandas.
  • Analysts coming from Excel who want to understand pandas tables.
  • Anyone who needs reliable basics for further analysis and visualization.

Prerequisites

  • Python basics: variables, lists/dicts, functions, and importing libraries.
  • Ability to run code in a notebook (e.g., Jupyter) or a Python script.

Concept explained simply

Think of a DataFrame as a spreadsheet with row labels (index) and column labels (columns). Each column is a Series. A Series is like a labeled column of values.

Mental model: DataFrame = a dictionary of Series aligned on the same index. If you select one column from a DataFrame, you get a Series. If you select multiple columns, you get a smaller DataFrame.

Key terms, quickly
  • Series: 1D labeled array (values + index).
  • DataFrame: 2D table (columns + index). Each column is a Series.
  • Index: row labels. Default is 0..N-1; you can set your own.
  • Columns: column labels; typically strings.
  • Shape: (rows, columns).
  • Dtypes: data types of each column.
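The "dictionary of Series aligned on the same index" model can be sketched as follows. The names `price` and `qty` and their values are illustrative assumptions, not data from this lesson.

```python
import pandas as pd

# Illustrative data (assumed for this sketch): two Series with different labels
price = pd.Series([10, 15, 7], index=['A', 'B', 'C'])
qty = pd.Series([3, 5], index=['A', 'C'])   # no entry for 'B'

# Building a DataFrame from a dict of Series aligns them on the index labels
df = pd.DataFrame({'price': price, 'qty': qty})
print(df)            # 'B' has no qty, so pandas fills that cell with NaN

one_col = df['price']             # single column label -> Series
two_cols = df[['price', 'qty']]   # list of column labels -> DataFrame
print(type(one_col).__name__, type(two_cols).__name__)
```

Because the missing value is stored as NaN, the `qty` column ends up as a float column even though the inputs were integers.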

Core objects: Series and DataFrame

import pandas as pd

# Series examples
s1 = pd.Series([10, 20, 30])                 # default index: 0,1,2
s2 = pd.Series([10, 20, 30], index=['a','b','c'])

# DataFrame examples
df1 = pd.DataFrame({'city': ['NY', 'LA', 'SF'], 'sales': [100, 120, 90]})
# Each column is a Series
df1['sales']              # Series

Inspecting structure
df1.shape      # (3, 2)
df1.index      # RangeIndex(start=0, stop=3, step=1)
df1.columns    # Index(['city', 'sales'], dtype='object')
df1.dtypes     # city: object, sales: int64

# Quick look
df1.head(2)
df1.tail(2)
df1.info()     # non-null counts and dtypes

Creating Series and DataFrames

From lists/dicts
pd.Series([1,2,3])
pd.DataFrame({'A':[1,2], 'B':[3,4]})  # lists must be equal length

From list of dicts (rows)
rows = [
    {'city':'NY', 'sales':100},
    {'city':'LA', 'sales':120}
]
pd.DataFrame(rows)

Set index
df = pd.DataFrame({'id':[101,102], 'name':['A','B']})
df = df.set_index('id')   # now rows labeled 101, 102
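One detail worth seeing in action: `set_index` returns a new DataFrame rather than changing the original, which is why the result is assigned back above. The sketch below reuses the same small table; `reset_index` is a related pandas method not covered in this lesson, shown here only as one way to undo the step.

```python
import pandas as pd

df = pd.DataFrame({'id': [101, 102], 'name': ['A', 'B']})
indexed = df.set_index('id')      # returns a new DataFrame; df itself is unchanged

print(df.columns.tolist())        # 'id' is still a regular column here
print(indexed.columns.tolist())   # 'id' has moved into the index
print(indexed.index.name)         # the index keeps the column's name

restored = indexed.reset_index()  # moves 'id' back to a regular column
```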

Selecting data (loc vs iloc)

  • loc: label-based selection (uses index/column names).
  • iloc: position-based selection (uses integer positions).

df = pd.DataFrame({
    'city':['NY','LA','SF','SEA'],
    'sales':[100,120,90,110]
}, index=['n1','l2','s3','s4'])

# loc - by labels
df.loc['s3', 'sales']            # 90
df.loc[['n1','s4'], ['city']]    # rows n1,s4 and the city column

# iloc - by positions
df.iloc[2, 1]                    # 90 (3rd row, 2nd column)
df.iloc[0:2, 0:1]                # first 2 rows, first column
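A point that often surprises newcomers is that the two accessors treat slice endpoints differently: a label slice with loc includes the end label, while a position slice with iloc excludes the end position. A minimal sketch on the same table:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NY', 'LA', 'SF', 'SEA'],
    'sales': [100, 120, 90, 110]
}, index=['n1', 'l2', 's3', 's4'])

by_label = df.loc['n1':'s3', 'sales']   # label slice: the end label 's3' is included
by_position = df.iloc[0:2, 1]           # position slice: the end position 2 is excluded

print(len(by_label))      # 3 rows: n1, l2, s3
print(len(by_position))   # 2 rows: n1, l2
```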

Worked examples

Example 1: Preview new data quickly
import pandas as pd

orders = pd.DataFrame({
    'order_id':[1,2,3,4],
    'country':['US','US','UK','DE'],
    'price':[12.5, 8.0, 15.0, 7.5]
})

print(orders.shape)     # (4, 3)
print(orders.columns)   # Index(['order_id','country','price'], dtype='object')
print(orders.head(2))

Why it helps: in one glance, you know size, fields, and example rows for sanity checks.

Example 2: Select a column and compute a derived one
orders['price_with_tax'] = orders['price'] * 1.2   # multiplies every row at once (the factor 1.2 implies a 20% tax)
avg_price = orders['price'].mean()                 # 10.75

Result: a new column (Series arithmetic is vectorized) and a quick metric.

Example 3: Label vs position selection
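# After set_index, loc looks rows up by their order_id label
orders_idx = orders.set_index('order_id')
price_3 = orders_idx.loc[3, 'price']  # row labeled 3 -> 15.0

# Position-based: order_id is now the index, so the columns are country (0) and price (1)
third_row_second_col = orders_idx.iloc[2, 1]  # third row, 'price' column -> 15.0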

Use loc when labels matter (safer, clearer), iloc for positional slicing.
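To see why the distinction matters, the sketch below asks for "3" both ways on the same orders table. With order_id as the index, the label 3 and the position 3 point at different rows, and a label that does not exist raises a KeyError.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'country': ['US', 'US', 'UK', 'DE'],
    'price': [12.5, 8.0, 15.0, 7.5]
})
orders_idx = orders.set_index('order_id')
price_pos = orders_idx.columns.get_loc('price')   # integer position of the 'price' column

by_label = orders_idx.loc[3, 'price']             # row labeled 3 (third row) -> 15.0
by_position = orders_idx.iloc[3, price_pos]       # position 3 (fourth row) -> 7.5

# No row is labeled 0 any more, so a label lookup for 0 fails
try:
    orders_idx.loc[0, 'price']
except KeyError:
    print('no row is labeled 0')
```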

Practice: Your turn

Complete the exercises below.

  • Exercise 1: Create Series/DataFrame and inspect shape, index, columns, and dtypes.
  • Exercise 2: Practice loc/iloc selection and simple filtering.

Exercise 1: instructions

  1. Create a Series for daily visitors: [120, 135, 90] with index ['Mon','Tue','Wed'].
  2. Create a DataFrame with columns 'city' and 'sales': cities ['NY','LA','SF','SEA'] and sales [100,120,90,110].
  3. Print shape, index, columns, and dtypes for the DataFrame. Print the Series' index and the value for 'Tue'.

Expected: shape (4,2); correct index/columns; 'Tue' value is 135.

Exercise 2: instructions

  1. Using the DataFrame from Exercise 1, set the index to 'city'.
  2. With loc, select the 'sales' value for 'SF'.
  3. With iloc, select the same value by position.
  4. Filter rows where sales >= 110 and show only the 'sales' column.

Expected: 'SF' is 90; filtered rows should include 'LA' (120) and 'SEA' (110).
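Row filtering has not been shown earlier in this lesson, so here is a minimal sketch of the usual boolean-mask pattern. It uses the orders table from the worked examples rather than the exercise data, so the exercise is still yours to solve; the threshold of 10 is an arbitrary choice for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'country': ['US', 'US', 'UK', 'DE'],
    'price': [12.5, 8.0, 15.0, 7.5]
})

mask = orders['price'] >= 10    # boolean Series: one True/False per row
print(mask.tolist())            # [True, False, True, False]

# loc accepts the mask for rows and a list of labels for columns
expensive = orders.loc[mask, ['order_id', 'price']]
print(expensive)
```

The same pattern works after set_index: the mask is aligned to the row labels, so the filtered result keeps those labels.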

Common mistakes and self-check

  • Confusing loc and iloc: loc uses labels; iloc uses positions. If your index is not 0..N-1, a number passed to iloc is still a position, so it can return a different row than the one carrying that label.
  • Expecting unequal list lengths to work: DataFrame from dict-of-lists requires equal lengths; otherwise pandas raises a ValueError.
  • Forgetting the index after set_index: After df.set_index('col'), rows are labeled by that column; refer to labels, not old row numbers.
  • Misreading dtypes: An object column often holds text. Numbers stored as text are not added by sum() (the strings are concatenated instead) until you convert them, for example with pd.to_numeric. Self-check with df.dtypes and df.info().

Self-check tips
  • Print df.shape before/after operations to ensure row/column counts are as expected.
  • Print df.index and df.columns to verify labels align with your selection method.
  • Use df.head() after creating new columns to confirm values look correct.
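Two of the mistakes above are easy to reproduce in a few lines. This is a minimal sketch with made-up values; the object dtype is forced explicitly to mimic a messy import where numbers arrive as text.

```python
import pandas as pd

# Mistake: numbers stored as text (object dtype forced here for the demonstration)
text_numbers = pd.Series(['1', '2', '3'], dtype=object)
print(text_numbers.dtype)                  # object
print(text_numbers.sum())                  # strings are concatenated: '123'
print(pd.to_numeric(text_numbers).sum())   # after conversion: 6

# Mistake: unequal list lengths in a dict of lists
try:
    pd.DataFrame({'A': [1, 2], 'B': [3]})
except ValueError as err:
    print('ValueError:', err)
```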

Practical projects (small)

  • Retail snapshot: Load a small CSV-like dict into a DataFrame, compute total revenue (price * qty), and show top 3 rows.
  • Temperature log: Create a Series with day labels and temperatures; compute mean, min, and day of max temperature (idxmax()).
  • Mini catalog: Build a DataFrame with id, name, category; set index to id; practice loc selections by id.
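As a possible starting point for the temperature log, here is a small sketch. The three readings are invented for illustration; replace them with your own data.

```python
import pandas as pd

# Hypothetical readings, labeled by day
temps = pd.Series([18.5, 21.0, 19.5], index=['Mon', 'Tue', 'Wed'])

print(temps.mean())     # average temperature
print(temps.min())      # 18.5
print(temps.idxmax())   # label of the highest value: 'Tue'
```

Note that idxmax() returns the index label, not the value; use temps.max() or temps[temps.idxmax()] if you need the temperature itself.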

Learning path

  1. This subskill: DataFrame and Series basics (you are here).
  2. Data loading and saving: read_csv, read_excel, to_csv.
  3. Selection and filtering deep dive: boolean masks, query, isin.
  4. Data cleaning: handling missing values, type conversion, renaming.
  5. Aggregation: groupby, pivot tables, descriptive stats.

Mini challenge

Create a DataFrame with columns: product ['A','B','C','D'], price [10,15,7,12], qty [3,1,5,2].

  • Add a new column 'revenue' = price * qty.
  • Set index to 'product'.
  • Using loc, get revenue for 'C'. Using iloc, grab the first two rows and 'price' column.
  • What are the shape and dtypes?

Peek solution
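import pandas as pd

df = pd.DataFrame({
    'product':['A','B','C','D'],
    'price':[10,15,7,12],
    'qty':[3,1,5,2]
})
df['revenue'] = df['price'] * df['qty']   # element-wise: 30, 15, 35, 24
df = df.set_index('product')              # rows are now labeled A, B, C, D
rev_c = df.loc['C','revenue']             # 35
# get_loc converts the 'price' label into the integer position that iloc needs
first_two_prices = df.iloc[0:2, df.columns.get_loc('price')]   # A: 10, B: 15
print(df.shape)    # (4, 3): 'product' is the index, so it no longer counts as a column
print(df.dtypes)   # price, qty, revenue: all int64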

Next steps

  • Repeat the exercises with a different small dataset to build fluency.
  • Move on to selection/filtering patterns with boolean masks and conditions.
  • Start using head(), info(), dtypes automatically whenever you load new data.

Practice Exercises

Instructions

1) Create a Series named visitors with values [120, 135, 90] and index ['Mon','Tue','Wed'].

2) Create a DataFrame named sales with columns 'city'=['NY','LA','SF','SEA'] and 'sales'=[100,120,90,110].

3) Print sales.shape, sales.index, sales.columns, and sales.dtypes.

4) Print the visitors Series' index and the value for 'Tue'.

Expected Output
sales.shape -> (4, 2)
sales.index -> RangeIndex(start=0, stop=4, step=1)
sales.columns -> Index(['city', 'sales'])
sales.dtypes -> 'city' object, 'sales' int64 (or Int64)
visitors.index -> Index(['Mon', 'Tue', 'Wed'])
visitors['Tue'] -> 135
