Why this matters
As a Data Analyst, you rarely get data that starts out as a perfect table inside your database. You need to pull in CSVs from marketing, export clean datasets for teammates, and move large files efficiently. Knowing how to import and export with COPY/LOAD/UNLOAD saves hours, reduces errors, and keeps pipelines reproducible.
- Load weekly CSVs from vendors into staging tables
- Export filtered results for Finance as a clean CSV
- Bulk insert millions of rows for faster analytics
- Unload large query results to cloud storage for sharing
Concept explained simply
Import means bringing rows from a file into a table. Export means writing rows from a table or query to a file. The database needs to know file path, format (delimiter, quote, escape), header presence, null representation, and sometimes credentials (cloud storage).
Mental model
Think of a conveyor belt feeding rows into your table. You set up the belt with rules: where the belt starts (file path), how boxes are separated (delimiter), how labels are protected (quotes/escape), and how to handle missing labels (NULL). Export is the same belt in reverse.
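To make the belt concrete, here is a minimal PostgreSQL sketch; the table name and server path are placeholders, and each rule of the mental model maps to one option:
-- Minimal sketch: placeholder table and server-side path, adjust to your setup
COPY staging.orders (order_id, customer_id, order_total)
FROM '/var/load/orders.csv'   -- where the belt starts (file path on the server)
WITH (
  FORMAT csv,                 -- overall file layout
  DELIMITER ',',              -- how boxes are separated
  QUOTE '"',                  -- how labels are protected
  HEADER true,                -- first row holds column names, not data
  NULL ''                     -- how missing labels are represented
);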
Quick reference by database
PostgreSQL
Server-side import/export (requires server file access):
COPY schema.table (col1, col2)
FROM '/var/load/data.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ESCAPE '"');
COPY (SELECT * FROM schema.table)
TO '/var/export/out.csv'
WITH (FORMAT csv, HEADER true);
Client-side via psql (works with your local files):
\copy schema.table (col1, col2) FROM 'data.csv' WITH (FORMAT csv, HEADER true);
\copy (SELECT col1, col2 FROM schema.table) TO 'out.csv' WITH (FORMAT csv, HEADER true);
Note: \copy is a psql meta-command, so the whole command must fit on a single line.
MySQL / MariaDB
LOAD DATA [LOCAL] INFILE 'data.csv'
INTO TABLE db.table
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, @raw_date)
SET date_col = STR_TO_DATE(@raw_date, '%Y-%m-%d');
SELECT col1, col2
FROM db.table
INTO OUTFILE '/var/lib/mysql-files/out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Note: SELECT ... INTO OUTFILE writes on the server; choose a server-writable directory.
SQL Server
BULK INSERT dbo.TableName
FROM 'C:\data\file.tsv'
WITH (
FIELDTERMINATOR='\t',
ROWTERMINATOR='\n',
FIRSTROW = 2,
TABLOCK
);
-- Export with bcp (command line)
-- bcp "SELECT col1, col2 FROM db.dbo.TableName" queryout "out.csv" -c -t, -T -S localhost
Amazon Redshift
COPY schema.table
FROM 's3://your-bucket/path/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV IGNOREHEADER 1 DELIMITER ',' QUOTE '"' GZIP;
UNLOAD ('SELECT * FROM schema.table')
TO 's3://your-bucket/path/exports/out_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER;
Snowflake
-- Stage a file then load
COPY INTO schema.table
FROM @mystage/path/
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1);
-- Export
COPY INTO @mystage/exports/out_
FROM (SELECT * FROM schema.table)
FILE_FORMAT = (TYPE=CSV FIELD_DELIMITER=',') HEADER=TRUE;
Worked examples
Example 1: PostgreSQL — Import a CSV locally with psql and export a subset
-- Table
CREATE TABLE public.customers (
id INT,
name TEXT,
email TEXT,
signup_date DATE
);
-- customers.csv (local)
-- id,name,email,signup_date
-- 1,Ada Lovelace,ada@example.com,2024-05-01
-- 2,Grace Hopper,grace@example.com,2024-05-03
-- 3,Alan Turing,alan@example.com,2024-06-10
-- Import (client-side)
\copy public.customers FROM 'customers.csv' WITH (FORMAT csv, HEADER true);
-- Quick check
SELECT COUNT(*) FROM public.customers; -- 3
-- Export only June signups
\copy (SELECT id, name, email FROM public.customers WHERE signup_date >= '2024-06-01') TO 'june_customers.csv' WITH (FORMAT csv, HEADER true);
Example 2: MySQL — LOAD DATA with date parsing and export
CREATE TABLE customers (
id INT,
name VARCHAR(100),
email VARCHAR(255),
signup_date DATE
);
-- customers_semicolon.csv
-- id;name;email;signup
-- 1;"Ada Lovelace";ada@example.com;01/05/2024
-- 2;"Grace Hopper";grace@example.com;03/05/2024
-- 3;"Alan Turing";alan@example.com;10/06/2024
LOAD DATA LOCAL INFILE 'customers_semicolon.csv'
INTO TABLE customers
FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, email, @signup)
SET signup_date = STR_TO_DATE(@signup, '%d/%m/%Y');
SELECT COUNT(*) FROM customers; -- 3
-- Export (server-side path must be writable by MySQL)
SELECT id, name, email, signup_date
FROM customers
INTO OUTFILE '/var/lib/mysql-files/customers_out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Example 3: SQL Server — BULK INSERT from TSV
CREATE TABLE dbo.Sales (
order_id INT,
sku NVARCHAR(50),
qty INT,
price DECIMAL(10,2)
);
-- sales.tsv (tab separated, first row header)
-- order_id sku qty price
-- 101 A-1 2 19.99
-- 102 B-2 1 5.00
BULK INSERT dbo.Sales
FROM 'C:\data\sales.tsv'
WITH (
FIELDTERMINATOR='\t',
ROWTERMINATOR='\n',
FIRSTROW=2
);
-- Export via bcp (shell)
-- bcp "SELECT order_id, sku, qty, price FROM YourDb.dbo.Sales" queryout "C:\\data\\sales_out.csv" -c -t, -T -S YourServer
Example 4: Redshift — COPY from S3 and UNLOAD
CREATE TABLE staging.events (
user_id VARCHAR(50),
event_time TIMESTAMP,
event_type VARCHAR(50)
);
COPY staging.events
FROM 's3://company-raw/events/2024/06/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV IGNOREHEADER 1 TIMEFORMAT 'auto' GZIP;
UNLOAD ('SELECT event_type, COUNT(*) c FROM staging.events GROUP BY 1')
TO 's3://company-raw/exports/event_counts_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER;
Data preparation checklist
- Confirm delimiter (comma, semicolon, tab) and text qualifier (quotes)
- Check header row presence
- Identify encoding (UTF-8 recommended)
- Define NULL representation (empty, \N, NULL)
- Validate data types and date formats
- Ensure file path permissions (server or local, depending on command)
- For cloud, verify credentials/role and bucket paths
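Most of these checks turn into explicit load options. A MySQL sketch under assumed names (vendor_feed.csv and staging_vendor_feed are placeholders, not from any earlier example):
LOAD DATA LOCAL INFILE 'vendor_feed.csv'
INTO TABLE staging_vendor_feed
CHARACTER SET utf8mb4                        -- declared file encoding
FIELDS TERMINATED BY ';' ENCLOSED BY '"'     -- delimiter and text qualifier
LINES TERMINATED BY '\r\n'                   -- Windows line endings
IGNORE 1 LINES;                              -- header row present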
Step-by-step: reliable import
- Create a staging table: mirror columns as text if unsure; transform later.
CREATE TABLE staging.raw_customers (id TEXT, name TEXT, email TEXT, signup TEXT);
- Load the file: use COPY/LOAD with the right options (delimiter, header, quote).
- Validate: count rows, sample a few, check nulls and extremes.
SELECT COUNT(*), COUNT(DISTINCT email) FROM staging.raw_customers;
- Transform into a typed table: cast and clean in INSERT ... SELECT.
INSERT INTO dim.customers (id, name, email, signup_date)
SELECT CAST(id AS INT), name, LOWER(email), CAST(signup AS DATE)
FROM staging.raw_customers;
- Export if needed: write to CSV from a SELECT with explicit columns and order (a client-side sketch follows).
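For the export step, a client-side psql sketch (the output file name is a placeholder; keep the whole \copy on a single line):
\copy (SELECT id, name, email, signup_date FROM dim.customers ORDER BY id) TO 'customers_export.csv' WITH (FORMAT csv, HEADER true);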
Common mistakes and how to self-check
- Mixing server vs client paths: COPY uses server path; psql \copy uses your machine path. Self-check: Does the file exist on the database server or on your laptop?
- Wrong delimiter/quotes: Commas inside values break rows. Self-check: Open the file and inspect a few lines; adjust DELIMITER and QUOTE/ENCLOSED BY.
- Header row imported as data: Forgot HEADER/IGNORE 1. Self-check: Look for a row with column names.
- Windows vs Unix line endings: Use correct ROWTERMINATOR (\r\n on Windows). Self-check: File created on Windows? Choose \r\n.
- NULL handling: Empty strings and NULL are easy to confuse. Self-check: declare the NULL marker explicitly (e.g., NULL 'NULL') or convert empty strings to NULL during the load where supported (see the sketch after this list).
- Permissions: INTO OUTFILE fails without server dir permissions. Self-check: Use a server-writable directory and required privileges.
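To make the NULL-handling check concrete, a MySQL sketch reusing the file and table from Example 2: read the raw values into user variables, then convert empty strings to NULL during the load.
LOAD DATA LOCAL INFILE 'customers_semicolon.csv'
INTO TABLE customers
FIELDS TERMINATED BY ';' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, @email, @signup)
SET email = NULLIF(@email, ''),                                  -- empty string becomes NULL
    signup_date = STR_TO_DATE(NULLIF(@signup, ''), '%d/%m/%Y');  -- NULL stays NULL after parsing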
Security and performance tips
- Prefer server-side bulk operations for large data; they are faster than client round-trips.
- Restrict file system paths; avoid arbitrary server paths for security.
- Validate and sanitize filenames; never concatenate unchecked user input into file paths.
- Use staging tables and explicit casts to avoid silent truncation.
- For very large loads, disable or rebuild indexes where safe and load in batches.
- Compress files (e.g., GZIP) when supported; it reduces network IO.
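As one example of the compression tip, Redshift can compress while unloading; the bucket and role below reuse the placeholder values from the quick reference:
UNLOAD ('SELECT * FROM schema.table')
TO 's3://your-bucket/path/exports/out_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER
GZIP;
The gzipped parts are smaller to store and faster to transfer, and loaders that accept GZIP (including Redshift COPY) can read them directly.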
Exercises
These mirror the tasks in the Exercises section below. You can complete them conceptually or run them if you have the respective database locally.
Exercise 1 (PostgreSQL): Import and export customers
Create table:
CREATE TABLE public.customers (
id INT,
name TEXT,
email TEXT,
signup_date DATE
);
File customers.csv:
id,name,email,signup_date
1,Ada Lovelace,ada@example.com,2024-05-01
2,Grace Hopper,grace@example.com,2024-05-03
3,Alan Turing,alan@example.com,2024-06-10
4,Barbara Liskov,barbara@example.com,2024-06-15
- Import with psql \copy using FORMAT csv and HEADER.
- Verify row count (expect 4).
- Export only June signups to june_customers.csv with a header.
- Checklist: Used \copy not COPY for local file? Correct delimiter and header? Filtered by signup_date >= '2024-06-01'?
Exercise 2 (MySQL): Clean load with semicolons and date conversion
Create table:
CREATE TABLE customers2 (
id INT,
name VARCHAR(100),
email VARCHAR(255),
signup_date DATE
);
File customers_semicolon.csv:
id;name;email;signup
1;"Ada Lovelace";ada@example.com;01/05/2024
2;"Grace Hopper";grace@example.com;03/05/2024
3;"Alan Turing";alan@example.com;10/06/2024
- LOAD DATA LOCAL INFILE with FIELDS TERMINATED BY ';', ENCLOSED BY '"', IGNORE 1 LINES.
- Convert signup dd/mm/yyyy to DATE using STR_TO_DATE.
- Export to customers2_out.csv using SELECT ... INTO OUTFILE with a comma delimiter. INTO OUTFILE writes no header row, so emulate one with a UNION ALL of literal column names (see the sketch after this exercise) or skip the header if it is not required.
- Checklist: Correct delimiter? Date parsed? Server-writable OUTFILE directory?
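The header emulation mentioned above is usually a UNION ALL of literal column names; a sketch using the customers2 table and a server-writable path as in Example 2 (MySQL generally writes the literal row first, though strict ordering is not guaranteed without an extra sort key):
SELECT 'id', 'name', 'email', 'signup_date'
UNION ALL
SELECT CAST(id AS CHAR), name, email, CAST(signup_date AS CHAR)
FROM customers2
INTO OUTFILE '/var/lib/mysql-files/customers2_out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';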
Practical projects
- Vendor pipeline: Build a repeatable script to load weekly vendor CSVs into a staging schema, transform to typed tables, and export a weekly summary report.
- Marketing attribution export: Export campaign performance by channel and date to a CSV consumed by a dashboard tool.
- Cloud unload: In Redshift or Snowflake, UNLOAD a 3-month rolling dataset to cloud storage with a timestamped file name.
Who this is for
- Data Analysts who need to move data between files and databases
- Analytics Engineers building lightweight pipelines
- Anyone supporting CSV/TSV data handoffs across teams
Prerequisites
- Basic SQL (SELECT, INSERT, data types)
- Comfort with command-line tools (psql, mysql, bcp) is helpful
- Access to a local or sandbox database (optional for hands-on)
Learning path
- Start: Learn delimiters, text qualifiers, and header handling
- Practice: Do small imports with staging tables and validations
- Advance: Handle large files, compression, and server-side exports
- Capstone: Create an end-to-end import-transform-export workflow
Common mistakes recap
- Using server-only commands with local paths (and vice versa)
- Forgetting HEADER/IGNORE 1, importing column names as data
- Mishandling Windows line endings
- Not validating NULLs and empty strings
Mini challenge
You receive a 2GB CSV with semicolons, quoted text, and a header. Dates are dd-mm-yyyy. Describe the exact command (for your DB) and options you would use to load into a staging table, verify counts, and export only the last 30 days to a new CSV. Include delimiter, quote, escape, header, and date parsing choices.
Next steps
- Automate imports on a schedule (cron, orchestrators) using your chosen DB tooling
- Add data quality checks (row counts, null thresholds, duplicates)
- Document your import/export conventions for your team
Quick Test
Take the quick test below to check your understanding.