

Data Import and Export: COPY, LOAD, UNLOAD

Learn data import and export with COPY, LOAD DATA, and UNLOAD for free, with explanations, exercises, and a quick test (for Data Analysts).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

As a Data Analyst, your data rarely starts inside your database in a perfect table. You need to pull in CSVs from marketing, export clean datasets for teammates, and move large files efficiently. Knowing how to import and export with COPY/LOAD/UNLOAD saves hours, reduces errors, and keeps pipelines reproducible.

  • Load weekly CSVs from vendors into staging tables
  • Export filtered results for Finance as a clean CSV
  • Bulk insert millions of rows for faster analytics
  • Unload large query results to cloud storage for sharing

Concept explained simply

Import means bringing rows from a file into a table. Export means writing rows from a table or query to a file. Either way, the database needs to know the file path, the format (delimiter, quote, and escape characters), whether a header row is present, how NULLs are represented, and sometimes credentials (for cloud storage).

Mental model

Think of a conveyor belt feeding rows into your table. You set up the belt with rules: where the belt starts (file path), how boxes are separated (delimiter), how labels are protected (quotes/escape), and how to handle missing labels (NULL). Export is the same belt in reverse.
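These belt rules map one-to-one onto parser options. A minimal sketch using Python's standard csv module (data is inlined; names are illustrative) showing the same knobs: delimiter, quote character, and NULL representation:

```python
import csv
import io

# A CSV with a quoted comma inside a value and an empty field standing in for NULL
raw = 'id,name,city\n1,"Lovelace, Ada",London\n2,Hopper,\n'

rows = []
for rec in csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'):
    # Treat empty strings as NULL (None), mirroring NULL-handling options in SQL loaders
    rows.append([None if field == "" else field for field in rec])

header, data = rows[0], rows[1:]
print(data)  # [['1', 'Lovelace, Ada', 'London'], ['2', 'Hopper', None]]
```

The quoted comma in "Lovelace, Ada" survives only because the quote character is declared, which is exactly why QUOTE/ENCLOSED BY matters in the SQL commands below.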

Quick reference by database

PostgreSQL

Server-side import/export (requires server file access):

COPY schema.table (col1, col2)
FROM '/var/load/data.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ESCAPE '"');

COPY (SELECT * FROM schema.table)
TO '/var/export/out.csv'
WITH (FORMAT csv, HEADER true);

Client-side via psql (works with your local files; note that \copy must be written on a single line):

\copy schema.table (col1, col2) FROM 'data.csv' WITH (FORMAT csv, HEADER true)

\copy (SELECT col1, col2 FROM schema.table) TO 'out.csv' WITH (FORMAT csv, HEADER true)

MySQL / MariaDB

LOAD DATA [LOCAL] INFILE 'data.csv'
INTO TABLE db.table
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, @raw_date)
SET date_col = STR_TO_DATE(@raw_date, '%Y-%m-%d');

SELECT col1, col2
INTO OUTFILE '/var/lib/mysql-files/out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
FROM db.table;

Note: SELECT ... INTO OUTFILE writes on the server; choose a server-writable directory.

SQL Server

BULK INSERT dbo.TableName
FROM 'C:\data\file.tsv'
WITH (
  FIELDTERMINATOR='\t',
  ROWTERMINATOR='\n',
  FIRSTROW = 2,
  TABLOCK
);

-- Export with bcp (command line)
-- bcp "SELECT col1, col2 FROM db.dbo.TableName" queryout "out.csv" -c -t, -T -S localhost

Amazon Redshift

COPY schema.table
FROM 's3://your-bucket/path/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV IGNOREHEADER 1 DELIMITER ',' QUOTE '"' GZIP;

UNLOAD ('SELECT * FROM schema.table')
TO 's3://your-bucket/path/exports/out_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER;

Snowflake

-- Stage a file then load
COPY INTO schema.table
FROM @mystage/path/
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1);

-- Export
COPY INTO @mystage/exports/out_
FROM (SELECT * FROM schema.table)
FILE_FORMAT = (TYPE=CSV FIELD_DELIMITER=',' HEADER=TRUE);

Worked examples

Example 1: PostgreSQL — Import a CSV locally with psql and export a subset

-- Table
CREATE TABLE public.customers (
  id INT,
  name TEXT,
  email TEXT,
  signup_date DATE
);

-- customers.csv (local)
-- id,name,email,signup_date
-- 1,Ada Lovelace,ada@example.com,2024-05-01
-- 2,Grace Hopper,grace@example.com,2024-05-03
-- 3,Alan Turing,alan@example.com,2024-06-10

-- Import (client-side)
\copy public.customers FROM 'customers.csv' WITH (FORMAT csv, HEADER true);

-- Quick check
SELECT COUNT(*) FROM public.customers; -- 3

-- Export only June signups
\copy (SELECT id, name, email FROM public.customers WHERE signup_date >= '2024-06-01') TO 'june_customers.csv' WITH (FORMAT csv, HEADER true)
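After an export like this, a quick row-count check on the file catches truncated writes before you hand it off. A sketch in Python (the file contents are simulated here rather than produced by \copy):

```python
import csv

# Simulate the exported june_customers.csv from the example above
with open("june_customers.csv", "w", newline="", encoding="utf-8") as f:
    f.write("id,name,email\n3,Alan Turing,alan@example.com\n")

# Count data rows and capture the header
with open("june_customers.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)
    n_rows = sum(1 for _ in reader)

# Compare n_rows against SELECT COUNT(*) with the same WHERE clause in the database
print(header, n_rows)
```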

Example 2: MySQL — LOAD DATA with date parsing and export

CREATE TABLE customers (
  id INT,
  name VARCHAR(100),
  email VARCHAR(255),
  signup_date DATE
);

-- customers_semicolon.csv
-- id;name;email;signup
-- 1;"Ada Lovelace";ada@example.com;01/05/2024
-- 2;"Grace Hopper";grace@example.com;03/05/2024
-- 3;"Alan Turing";alan@example.com;10/06/2024

LOAD DATA LOCAL INFILE 'customers_semicolon.csv'
INTO TABLE customers
FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, email, @signup)
SET signup_date = STR_TO_DATE(@signup, '%d/%m/%Y');

SELECT COUNT(*) FROM customers; -- 3

-- Export (server-side path must be writable by MySQL)
SELECT id, name, email, signup_date
INTO OUTFILE '/var/lib/mysql-files/customers_out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
FROM customers;
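If you would rather normalize dates before loading instead of relying on STR_TO_DATE, a small pre-processing step works with any database. A sketch in Python (it recreates a one-row sample of the file above; file names are illustrative):

```python
import csv
from datetime import datetime

# Recreate a one-row sample of the semicolon-delimited file from the example
with open("customers_semicolon.csv", "w", newline="", encoding="utf-8") as f:
    f.write('id;name;email;signup\n1;"Ada Lovelace";ada@example.com;01/05/2024\n')

# Rewrite dd/mm/yyyy dates to ISO yyyy-mm-dd and switch to comma delimiters
with open("customers_semicolon.csv", newline="", encoding="utf-8") as src, \
     open("customers_iso.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst)  # comma-delimited output
    writer.writerow(next(reader))  # copy the header row through
    for row in reader:
        # The signup date is the last column
        row[-1] = datetime.strptime(row[-1], "%d/%m/%Y").date().isoformat()
        writer.writerow(row)
```

The resulting file loads with a plain DATE column and default comma settings, so the LOAD DATA statement needs no SET clause.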

Example 3: SQL Server — BULK INSERT from TSV

CREATE TABLE dbo.Sales (
  order_id INT,
  sku NVARCHAR(50),
  qty INT,
  price DECIMAL(10,2)
);

-- sales.tsv (tab separated, first row header)
-- order_id	sku	qty	price
-- 101	A-1	2	19.99
-- 102	B-2	1	5.00

BULK INSERT dbo.Sales
FROM 'C:\data\sales.tsv'
WITH (
  FIELDTERMINATOR='\t',
  ROWTERMINATOR='\n',
  FIRSTROW=2
);

-- Export via bcp (shell)
-- bcp "SELECT order_id, sku, qty, price FROM YourDb.dbo.Sales" queryout "C:\data\sales_out.csv" -c -t, -T -S YourServer

Example 4: Redshift — COPY from S3 and UNLOAD

CREATE TABLE staging.events (
  user_id VARCHAR(50),
  event_time TIMESTAMP,
  event_type VARCHAR(50)
);

COPY staging.events
FROM 's3://company-raw/events/2024/06/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV IGNOREHEADER 1 TIMEFORMAT 'auto' GZIP;

UNLOAD ('SELECT event_type, COUNT(*) c FROM staging.events GROUP BY 1')
TO 's3://company-raw/exports/event_counts_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER;

Data preparation checklist

  • Confirm delimiter (comma, semicolon, tab) and text qualifier (quotes)
  • Check header row presence
  • Identify encoding (UTF-8 recommended)
  • Define NULL representation (empty, \N, NULL)
  • Validate data types and date formats
  • Ensure file path permissions (server or local, depending on command)
  • For cloud, verify credentials/role and bucket paths
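Several of these checks can be scripted before you ever run COPY. A sketch using Python's standard csv.Sniffer heuristics (the sample file is created inline; real files may need a larger sample and should be spot-checked by eye):

```python
import csv

def inspect_csv(path, sample_bytes=4096):
    """Guess delimiter, quote character, and header presence from a file sample."""
    with open(path, newline="", encoding="utf-8") as f:
        sample = f.read(sample_bytes)
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    has_header = csv.Sniffer().has_header(sample)
    return dialect.delimiter, dialect.quotechar, has_header

# Create a semicolon-delimited sample, then inspect it
with open("sample.csv", "w", newline="", encoding="utf-8") as f:
    f.write('id;name\n1;"Ada Lovelace"\n2;"Grace Hopper"\n')

delim, quote, header = inspect_csv("sample.csv")
print(delim, quote, header)
```

Knowing the delimiter, quote character, and header presence up front lets you write the COPY/LOAD options once, correctly, instead of iterating on failed loads.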

Step-by-step: reliable import

  1. Create a staging table: mirror columns as text if unsure; transform later.
    CREATE TABLE staging.raw_customers (
      id TEXT, name TEXT, email TEXT, signup TEXT
    );
    
  2. Load file: use COPY/LOAD with the right options (delimiter, header, quote).
  3. Validate: count rows, sample a few, check nulls and extremes.
    SELECT COUNT(*), COUNT(DISTINCT email) FROM staging.raw_customers;
    
  4. Transform into typed table: cast and clean in INSERT...SELECT.
    INSERT INTO dim.customers (id, name, email, signup_date)
    SELECT CAST(id AS INT), name, LOWER(email), CAST(signup AS DATE)
    FROM staging.raw_customers;
    
  5. Export if needed: write to CSV from a SELECT with explicit columns and order.
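The staging pattern above is database-agnostic and can be prototyped end to end without a server. A self-contained sketch using SQLite purely as a stand-in (column names follow the example above; rows are inlined instead of loaded from a file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Staging table: everything as TEXT
cur.execute("CREATE TABLE raw_customers (id TEXT, name TEXT, email TEXT, signup TEXT)")

# 2. "Load" rows as they would arrive from a CSV
rows = [("1", "Ada", "ADA@example.com", "2024-05-01"),
        ("2", "Grace", "grace@example.com", "2024-06-10")]
cur.executemany("INSERT INTO raw_customers VALUES (?, ?, ?, ?)", rows)

# 3. Validate: row count and distinct emails
total, distinct = cur.execute(
    "SELECT COUNT(*), COUNT(DISTINCT email) FROM raw_customers").fetchone()
assert total == 2 and distinct == 2

# 4. Transform into a typed table with casts and cleaning
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT, signup_date TEXT)")
cur.execute("""
    INSERT INTO customers
    SELECT CAST(id AS INTEGER), name, LOWER(email), signup
    FROM raw_customers
""")
typed = cur.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
print(typed)  # [(1, 'ada@example.com'), (2, 'grace@example.com')]
```

The same four steps translate directly to COPY/LOAD into a TEXT staging table followed by INSERT...SELECT with casts in PostgreSQL, MySQL, Redshift, or Snowflake.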

Common mistakes and how to self-check

  • Mixing server vs client paths: COPY uses server path; psql \copy uses your machine path. Self-check: Does the file exist on the database server or on your laptop?
  • Wrong delimiter/quotes: Commas inside values break rows. Self-check: Open the file and inspect a few lines; adjust DELIMITER and QUOTE/ENCLOSED BY.
  • Header row imported as data: Forgot HEADER/IGNORE 1. Self-check: Look for a row with column names.
  • Windows vs Unix line endings: Use correct ROWTERMINATOR (\r\n on Windows). Self-check: File created on Windows? Choose \r\n.
  • NULL handling: Empty strings vs NULL get confused. Self-check: Use NULL 'NULL' or treat empty as NULL where supported.
  • Permissions: INTO OUTFILE fails without server dir permissions. Self-check: Use a server-writable directory and required privileges.
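The line-endings check in particular is easy to automate before choosing a ROWTERMINATOR. A small sketch (file names are illustrative):

```python
def check_line_endings(path):
    """Report whether a file uses Windows (\\r\\n) or Unix (\\n) line endings."""
    with open(path, "rb") as f:
        chunk = f.read(65536)  # a sample is enough for typical files
    if b"\r\n" in chunk:
        return "\\r\\n (Windows)"
    return "\\n (Unix)"

# Create a Windows-style file and check it
with open("windows_file.csv", "wb") as f:
    f.write(b"id,name\r\n1,Ada\r\n")

print(check_line_endings("windows_file.csv"))
```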

Security and performance tips

  • Prefer server-side bulk operations for large data; they are faster than client round-trips.
  • Restrict file system paths; avoid arbitrary server paths for security.
  • Validate and sanitize filenames; never concatenate unchecked user input into file paths.
  • Use staging tables and explicit casts to avoid silent truncation.
  • For very large loads, disable/enable indexes appropriately (where safe) and use batch sizes.
  • Compress files (e.g., GZIP) when supported; it reduces network IO.

Exercises

These mirror the tasks in the Practice Exercises section below. You can complete them conceptually or run them if you have the respective database locally.

Exercise 1 (PostgreSQL): Import and export customers

Create table:

CREATE TABLE public.customers (
  id INT,
  name TEXT,
  email TEXT,
  signup_date DATE
);

File customers.csv:

id,name,email,signup_date
1,Ada Lovelace,ada@example.com,2024-05-01
2,Grace Hopper,grace@example.com,2024-05-03
3,Alan Turing,alan@example.com,2024-06-10
4,Barbara Liskov,barbara@example.com,2024-06-15
  1. Import with psql \copy using FORMAT csv and HEADER.
  2. Verify row count (expect 4).
  3. Export only June signups to june_customers.csv with a header.
  • Checklist: Used \copy, not COPY, for the local file? Correct delimiter and header? Filtered by signup_date ≥ '2024-06-01'?

Exercise 2 (MySQL): Clean load with semicolons and date conversion

Create table:

CREATE TABLE customers2 (
  id INT,
  name VARCHAR(100),
  email VARCHAR(255),
  signup_date DATE
);

File customers_semicolon.csv:

id;name;email;signup
1;"Ada Lovelace";ada@example.com;01/05/2024
2;"Grace Hopper";grace@example.com;03/05/2024
3;"Alan Turing";alan@example.com;10/06/2024
  1. LOAD DATA LOCAL INFILE with FIELDS TERMINATED BY ';', ENCLOSED BY '"', IGNORE 1 LINES.
  2. Convert signup dd/mm/yyyy to DATE using STR_TO_DATE.
  3. Export to customers2_out.csv using SELECT ... INTO OUTFILE with a comma delimiter. INTO OUTFILE cannot write a header row, so prepend one via UNION ALL with literal column names, or skip the header if it is not required.
  • Checklist: Correct delimiter? Date parsed? Server-writable OUTFILE directory?

Practical projects

  • Vendor pipeline: Build a repeatable script to load weekly vendor CSVs into a staging schema, transform to typed tables, and export a weekly summary report.
  • Marketing attribution export: Export campaign performance by channel and date to a CSV consumed by a dashboard tool.
  • Cloud unload: In Redshift or Snowflake, UNLOAD a 3-month rolling dataset to cloud storage with a timestamped file name.

Who this is for

  • Data Analysts who need to move data between files and databases
  • Analytics Engineers building lightweight pipelines
  • Anyone supporting CSV/TSV data handoffs across teams

Prerequisites

  • Basic SQL (SELECT, INSERT, data types)
  • Comfort with command-line tools (psql, mysql, bcp) is helpful
  • Access to a local or sandbox database (optional for hands-on)

Learning path

  • Start: Learn delimiters, text qualifiers, and header handling
  • Practice: Do small imports with staging tables and validations
  • Advance: Handle large files, compression, and server-side exports
  • Capstone: Create an end-to-end import-transform-export workflow

Common mistakes recap

  • Using server-only commands with local paths (and vice versa)
  • Forgetting HEADER/IGNORE 1, importing column names as data
  • Mishandling Windows line endings
  • Not validating NULLs and empty strings

Mini challenge

You receive a 2GB CSV with semicolons, quoted text, and a header. Dates are dd-mm-yyyy. Describe the exact command (for your DB) and options you would use to load into a staging table, verify counts, and export only the last 30 days to a new CSV. Include delimiter, quote, escape, header, and date parsing choices.

Next steps

  • Automate imports on a schedule (cron, orchestrators) using your chosen DB tooling
  • Add data quality checks (row counts, null thresholds, duplicates)
  • Document your import/export conventions for your team


Practice Exercises


Instructions

Create the table, load data from a local CSV using psql, then export a filtered subset. Use the customers table and customers.csv file from Exercise 1 above.
  • Task A: Import customers.csv using \copy with FORMAT csv, HEADER true.
  • Task B: Validate with SELECT COUNT(*) (expect 4).
  • Task C: Export only June signups (signup_date ≥ '2024-06-01') to june_customers.csv with a header.
Expected Output
After import: COUNT(*) = 4. Exported file june_customers.csv contains 2 rows (ids 3 and 4) plus header.
