Why this matters
As a Data Analyst, you rarely get data that starts out as a perfect table inside your database. You need to pull in CSVs from marketing, export clean datasets for teammates, and move large files efficiently. Knowing how to import and export with COPY/LOAD/UNLOAD saves hours, reduces errors, and keeps pipelines reproducible.
- Load weekly CSVs from vendors into staging tables
- Export filtered results for Finance as a clean CSV
- Bulk insert millions of rows for faster analytics
- Unload large query results to cloud storage for sharing
Concept explained simply
Import means bringing rows from a file into a table. Export means writing rows from a table or query to a file. The database needs to know file path, format (delimiter, quote, escape), header presence, null representation, and sometimes credentials (cloud storage).
Mental model
Think of a conveyor belt feeding rows into your table. You set up the belt with rules: where the belt starts (file path), how boxes are separated (delimiter), how labels are protected (quotes/escape), and how to handle missing labels (NULL). Export is the same belt in reverse.
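To make the belt concrete, here is a minimal PostgreSQL sketch; the table name and server path are placeholders, and each rule of the mental model maps to one option:
-- Minimal sketch: placeholder table and server-side path, adjust to your setup
COPY staging.orders (order_id, customer_id, order_total)
FROM '/var/load/orders.csv'   -- where the belt starts (file path on the server)
WITH (
  FORMAT csv,                 -- overall file layout
  DELIMITER ',',              -- how boxes are separated
  QUOTE '"',                  -- how labels are protected
  HEADER true,                -- first row holds column names, not data
  NULL ''                     -- how missing labels are represented
);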
Quick reference by database
PostgreSQL
Server-side import/export (requires server file access):
COPY schema.table (col1, col2)
FROM '/var/load/data.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', ESCAPE '"');
COPY (SELECT * FROM schema.table)
TO '/var/export/out.csv'
WITH (FORMAT csv, HEADER true);
Client-side via psql (works with your local files):
\copy schema.table (col1, col2) FROM 'data.csv' WITH (FORMAT csv, HEADER true);
\copy (SELECT col1, col2 FROM schema.table) TO 'out.csv' WITH (FORMAT csv, HEADER true);
Note: \copy is a psql meta-command, so the whole command must fit on a single line.
MySQL / MariaDB
LOAD DATA [LOCAL] INFILE 'data.csv'
INTO TABLE db.table
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, @raw_date)
SET date_col = STR_TO_DATE(@raw_date, '%Y-%m-%d');
SELECT col1, col2
FROM db.table
INTO OUTFILE '/var/lib/mysql-files/out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Note: SELECT ... INTO OUTFILE writes on the server; choose a server-writable directory.
SQL Server
BULK INSERT dbo.TableName
FROM 'C:\data\file.tsv'
WITH (
FIELDTERMINATOR='\t',
ROWTERMINATOR='\n',
FIRSTROW = 2,
TABLOCK
);
-- Export with bcp (command line)
-- bcp "SELECT col1, col2 FROM db.dbo.TableName" queryout "out.csv" -c -t, -T -S localhost
Amazon Redshift
COPY schema.table
FROM 's3://your-bucket/path/data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV IGNOREHEADER 1 DELIMITER ',' QUOTE '"' GZIP;
UNLOAD ('SELECT * FROM schema.table')
TO 's3://your-bucket/path/exports/out_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER;
Snowflake
-- Stage a file then load
COPY INTO schema.table
FROM @mystage/path/
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1);
-- Export
COPY INTO @mystage/exports/out_
FROM (SELECT * FROM schema.table)
FILE_FORMAT = (TYPE=CSV FIELD_DELIMITER=',') HEADER=TRUE;
Worked examples
Example 1: PostgreSQL — Import a CSV locally with psql and export a subset
-- Table
CREATE TABLE public.customers (
id INT,
name TEXT,
email TEXT,
signup_date DATE
);
-- customers.csv (local)
-- id,name,email,signup_date
-- 1,Ada Lovelace,ada@example.com,2024-05-01
-- 2,Grace Hopper,grace@example.com,2024-05-03
-- 3,Alan Turing,alan@example.com,2024-06-10
-- Import (client-side)
\copy public.customers FROM 'customers.csv' WITH (FORMAT csv, HEADER true);
-- Quick check
SELECT COUNT(*) FROM public.customers; -- 3
-- Export only June signups
\copy (SELECT id, name, email FROM public.customers WHERE signup_date >= '2024-06-01') TO 'june_customers.csv' WITH (FORMAT csv, HEADER true);
Example 2: MySQL — LOAD DATA with date parsing and export
CREATE TABLE customers (
id INT,
name VARCHAR(100),
email VARCHAR(255),
signup_date DATE
);
-- customers_semicolon.csv
-- id;name;email;signup
-- 1;"Ada Lovelace";ada@example.com;01/05/2024
-- 2;"Grace Hopper";grace@example.com;03/05/2024
-- 3;"Alan Turing";alan@example.com;10/06/2024
LOAD DATA LOCAL INFILE 'customers_semicolon.csv'
INTO TABLE customers
FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, email, @signup)
SET signup_date = STR_TO_DATE(@signup, '%d/%m/%Y');
SELECT COUNT(*) FROM customers; -- 3
-- Export (server-side path must be writable by MySQL)
SELECT id, name, email, signup_date
FROM customers
INTO OUTFILE '/var/lib/mysql-files/customers_out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Example 3: SQL Server — BULK INSERT from TSV
CREATE TABLE dbo.Sales (
order_id INT,
sku NVARCHAR(50),
qty INT,
price DECIMAL(10,2)
);
-- sales.tsv (tab separated, first row header)
-- order_id sku qty price
-- 101 A-1 2 19.99
-- 102 B-2 1 5.00
BULK INSERT dbo.Sales
FROM 'C:\data\sales.tsv'
WITH (
FIELDTERMINATOR='\t',
ROWTERMINATOR='\n',
FIRSTROW=2
);
-- Export via bcp (shell)
-- bcp "SELECT order_id, sku, qty, price FROM YourDb.dbo.Sales" queryout "C:\\data\\sales_out.csv" -c -t, -T -S YourServer
Example 4: Redshift — COPY from S3 and UNLOAD
CREATE TABLE staging.events (
user_id VARCHAR(50),
event_time TIMESTAMP,
event_type VARCHAR(50)
);
COPY staging.events
FROM 's3://company-raw/events/2024/06/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV IGNOREHEADER 1 TIMEFORMAT 'auto' GZIP;
UNLOAD ('SELECT event_type, COUNT(*) c FROM staging.events GROUP BY 1')
TO 's3://company-raw/exports/event_counts_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER;
Data preparation checklist
- Confirm delimiter (comma, semicolon, tab) and text qualifier (quotes)
- Check header row presence
- Identify encoding (UTF-8 recommended)
- Define NULL representation (empty, \N, NULL)
- Validate data types and date formats
- Ensure file path permissions (server or local, depending on command)
- For cloud, verify credentials/role and bucket paths
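Most of these checks turn into explicit load options. A MySQL sketch under assumed names (vendor_feed.csv and staging_vendor_feed are placeholders, not from any earlier example):
LOAD DATA LOCAL INFILE 'vendor_feed.csv'
INTO TABLE staging_vendor_feed
CHARACTER SET utf8mb4                        -- declared file encoding
FIELDS TERMINATED BY ';' ENCLOSED BY '"'     -- delimiter and text qualifier
LINES TERMINATED BY '\r\n'                   -- Windows line endings
IGNORE 1 LINES;                              -- header row present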
Step-by-step: reliable import
- Create a staging table: mirror columns as text if unsure; transform later.
CREATE TABLE staging.raw_customers (id TEXT, name TEXT, email TEXT, signup TEXT);
- Load the file: use COPY/LOAD with the right options (delimiter, header, quote).
- Validate: count rows, sample a few, check nulls and extremes.
SELECT COUNT(*), COUNT(DISTINCT email) FROM staging.raw_customers;
- Transform into a typed table: cast and clean in INSERT ... SELECT.
INSERT INTO dim.customers (id, name, email, signup_date)
SELECT CAST(id AS INT), name, LOWER(email), CAST(signup AS DATE)
FROM staging.raw_customers;
- Export if needed: write to CSV from a SELECT with explicit columns and order (a client-side sketch follows).
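For the export step, a client-side psql sketch (the output file name is a placeholder; keep the whole \copy on a single line):
\copy (SELECT id, name, email, signup_date FROM dim.customers ORDER BY id) TO 'customers_export.csv' WITH (FORMAT csv, HEADER true);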
Common mistakes and how to self-check
- Mixing server vs client paths: COPY uses server path; psql \copy uses your machine path. Self-check: Does the file exist on the database server or on your laptop?
- Wrong delimiter/quotes: Commas inside values break rows. Self-check: Open the file and inspect a few lines; adjust DELIMITER and QUOTE/ENCLOSED BY.
- Header row imported as data: Forgot HEADER/IGNORE 1. Self-check: Look for a row with column names.
- Windows vs Unix line endings: Use correct ROWTERMINATOR (\r\n on Windows). Self-check: File created on Windows? Choose \r\n.
- NULL handling: Empty strings and NULL are easy to confuse. Self-check: declare the NULL marker explicitly (e.g., NULL 'NULL') or convert empty strings to NULL during the load where supported (see the sketch after this list).
- Permissions: INTO OUTFILE fails without server dir permissions. Self-check: Use a server-writable directory and required privileges.
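To make the NULL-handling check concrete, a MySQL sketch reusing the file and table from Example 2: read the raw values into user variables, then convert empty strings to NULL during the load.
LOAD DATA LOCAL INFILE 'customers_semicolon.csv'
INTO TABLE customers
FIELDS TERMINATED BY ';' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, name, @email, @signup)
SET email = NULLIF(@email, ''),                                  -- empty string becomes NULL
    signup_date = STR_TO_DATE(NULLIF(@signup, ''), '%d/%m/%Y');  -- NULL stays NULL after parsing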
Security and performance tips
- Prefer server-side bulk operations for large data; they are faster than client round-trips.
- Restrict file system paths; avoid arbitrary server paths for security.
- Validate and sanitize filenames; never concatenate unchecked user input into file paths.
- Use staging tables and explicit casts to avoid silent truncation.
- For very large loads, disable or rebuild indexes where safe and load in batches.
- Compress files (e.g., GZIP) when supported; it reduces network IO.
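As one example of the compression tip, Redshift can compress while unloading; the bucket and role below reuse the placeholder values from the quick reference:
UNLOAD ('SELECT * FROM schema.table')
TO 's3://your-bucket/path/exports/out_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV HEADER
GZIP;
The gzipped parts are smaller to store and faster to transfer, and loaders that accept GZIP (including Redshift COPY) can read them directly.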
Exercises
These mirror the tasks in the Exercises section below. You can complete them conceptually or run them if you have the respective database locally.
Exercise 1 (PostgreSQL): Import and export customers
Create table:
CREATE TABLE public.customers (
id INT,
name TEXT,
email TEXT,
signup_date DATE
);
File customers.csv:
id,name,email,signup_date
1,Ada Lovelace,ada@example.com,2024-05-01
2,Grace Hopper,grace@example.com,2024-05-03
3,Alan Turing,alan@example.com,2024-06-10
4,Barbara Liskov,barbara@example.com,2024-06-15
- Import with psql \copy using FORMAT csv and HEADER.
- Verify row count (expect 4).
- Export only June signups to june_customers.csv with a header.
- Checklist: Used \copy not COPY for local file? Correct delimiter and header? Filtered by signup_date >= '2024-06-01'?
Exercise 2 (MySQL): Clean load with semicolons and date conversion
Create table:
CREATE TABLE customers2 (
id INT,
name VARCHAR(100),
email VARCHAR(255),
signup_date DATE
);
File customers_semicolon.csv:
id;name;email;signup
1;"Ada Lovelace";ada@example.com;01/05/2024
2;"Grace Hopper";grace@example.com;03/05/2024
3;"Alan Turing";alan@example.com;10/06/2024
- LOAD DATA LOCAL INFILE with FIELDS TERMINATED BY ';', ENCLOSED BY '"', IGNORE 1 LINES.
- Convert signup dd/mm/yyyy to DATE using STR_TO_DATE.
- Export to customers2_out.csv using SELECT ... INTO OUTFILE with a comma delimiter. INTO OUTFILE writes no header row, so emulate one with a UNION ALL of literal column names (see the sketch after this exercise) or skip the header if it is not required.
- Checklist: Correct delimiter? Date parsed? Server-writable OUTFILE directory?
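The header emulation mentioned above is usually a UNION ALL of literal column names; a sketch using the customers2 table and a server-writable path as in Example 2 (MySQL generally writes the literal row first, though strict ordering is not guaranteed without an extra sort key):
SELECT 'id', 'name', 'email', 'signup_date'
UNION ALL
SELECT CAST(id AS CHAR), name, email, CAST(signup_date AS CHAR)
FROM customers2
INTO OUTFILE '/var/lib/mysql-files/customers2_out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';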
Practical projects
- Vendor pipeline: Build a repeatable script to load weekly vendor CSVs into a staging schema, transform to typed tables, and export a weekly summary report.
- Marketing attribution export: Export campaign performance by channel and date to a CSV consumed by a dashboard tool.
- Cloud unload: In Redshift or Snowflake, UNLOAD a 3-month rolling dataset to cloud storage with a timestamped file name.
Who this is for
- Data Analysts who need to move data between files and databases
- Analytics Engineers building lightweight pipelines
- Anyone supporting CSV/TSV data handoffs across teams
Prerequisites
- Basic SQL (SELECT, INSERT, data types)
- Comfort with command-line tools (psql, mysql, bcp) is helpful
- Access to a local or sandbox database (optional for hands-on)
Learning path
- Start: Learn delimiters, text qualifiers, and header handling
- Practice: Do small imports with staging tables and validations
- Advance: Handle large files, compression, and server-side exports
- Capstone: Create an end-to-end import-transform-export workflow
Common mistakes recap
- Using server-only commands with local paths (and vice versa)
- Forgetting HEADER/IGNORE 1, importing column names as data
- Mishandling Windows line endings
- Not validating NULLs and empty strings
Mini challenge
You receive a 2GB CSV with semicolons, quoted text, and a header. Dates are dd-mm-yyyy. Describe the exact command (for your DB) and options you would use to load into a staging table, verify counts, and export only the last 30 days to a new CSV. Include delimiter, quote, escape, header, and date parsing choices.
Next steps
- Automate imports on a schedule (cron, orchestrators) using your chosen DB tooling
- Add data quality checks (row counts, null thresholds, duplicates)
- Document your import/export conventions for your team
Quick Test
Take the quick test below to check your understanding.