How to learn Full Text Search Basics for SQL in Data Analyst for free

Why this matters

As a Data Analyst, you often search messy text: feedback, tickets, survey responses, product names, and docs. Full-text search (FTS) gives you fast, relevant results with stemming, stopwords, phrases, and ranking—far beyond simple LIKE '%word%'.

Filter customer feedback by topics: "refund", "billing issue", or exact phrases like "credit card declined".
Build dashboards where users search knowledge base articles and see the most relevant first.
Create cohorts (e.g., users who mentioned "shipping delay") for impact analysis.

Who this is for

Data Analysts who query text columns and need better relevance and speed than LIKE.
Analysts preparing to work with product analytics, support analytics, or search features.

Prerequisites

Comfort with basic SQL (SELECT, WHERE, ORDER BY).
Basic understanding of indexes.

Concept explained simply

Full-text search turns text into a special searchable format and uses an index to find relevant rows quickly. It understands words, ignores common ones (stopwords like "the"), and can match variations (stem/lemmatize) like "run" vs. "running". It can also rank results by relevance.

Mental model

Think of a book index at the back: each word points to pages. FTS builds a smart index of words to rows. Queries look up words, combine rules (AND/OR/NOT, phrases), and score rows by how well they match.

Core terms and operators

Index: Special structure that makes text searches fast (e.g., GIN in PostgreSQL, FULLTEXT in MySQL, Full-Text Index in SQL Server).
Document: The text field(s) you search (e.g., title + body).
Stopwords: Very common words often ignored (e.g., "a", "the").
Stemming/Lemmatization: Match variations of a word ("analyze", "analysis").
Ranking/Score: Numeric relevance (higher = more relevant).

Vendor quick guide

PostgreSQL: to_tsvector(), to_tsquery(), plainto_tsquery(), websearch_to_tsquery(), ts_rank, GIN index.
MySQL: FULLTEXT index; MATCH(...) AGAINST(...) in NATURAL LANGUAGE or BOOLEAN mode.
SQL Server: Full-Text Index; CONTAINS, CONTAINSTABLE, FREETEXT, FREETEXTTABLE.

Worked examples

Example 1: PostgreSQL — AND search with ranking

-- Create a searchable index over title and body
CREATE INDEX IF NOT EXISTS idx_articles_fts
ON articles USING GIN (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,'')));

-- Query: rows containing both "budget" AND "forecast"
SELECT id, title,
       ts_rank(
         to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,'')),
         to_tsquery('english', 'budget & forecast')
       ) AS rank
FROM articles
WHERE to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,'')) @@ to_tsquery('english', 'budget & forecast')
ORDER BY rank DESC
LIMIT 10;

Why it works: & is logical AND; ts_rank gives a score so the best matches appear first.

Example 2: MySQL — Phrase and MUST/MUST NOT

-- Create FULLTEXT index
ALTER TABLE tickets ADD FULLTEXT ft_subject_body (subject, body);

-- Phrase search: the exact phrase "credit card"
SELECT id, subject, MATCH(subject, body) AGAINST ('"credit card"' IN BOOLEAN MODE) AS score
FROM tickets
WHERE MATCH(subject, body) AGAINST ('"credit card"' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 10;

-- Require "refund" but exclude "chargeback"
SELECT id, subject
FROM tickets
WHERE MATCH(subject, body) AGAINST ('+refund -chargeback' IN BOOLEAN MODE);

Notes: Quotes enforce phrase. + requires a term; - excludes a term.

Example 3: SQL Server — FREETEXT vs CONTAINS

-- Assume a Full-Text Index exists on dbo.KB(title, body)
-- Meaning-based search
SELECT TOP 10 id, title
FROM dbo.KB
WHERE FREETEXT((title, body), 'shipping delay');

-- Exact phrase and prefix search
SELECT TOP 10 id, title
FROM dbo.KB
WHERE CONTAINS((title, body), '"delivery time" OR ship*');

FREETEXT matches similar meaning; CONTAINS supports phrases and prefixes (e.g., ship*).

Hands-on exercises

Pick the syntax for your database. If you cannot create indexes, still write the query—focus on correctness.

Exercise 1: Must-have and must-not-have terms

Goal: Find products that mention "wireless" but do not mention "bluetooth".

Setup (example data)

-- PostgreSQL
CREATE TEMP TABLE products(name text, description text);
INSERT INTO products VALUES
('Alpha Headphones', 'Wireless over-ear headphones with bluetooth 5.0'),
('Bravo Keyboard', 'Wireless compact keyboard with 2.4G receiver'),
('Charlie Mouse', 'Ergonomic bluetooth wireless mouse'),
('Delta Speaker', 'Portable wireless speaker with aux input');
CREATE INDEX ON products USING GIN (to_tsvector('english', coalesce(name,'') || ' ' || coalesce(description,'')));

-- MySQL
CREATE TEMPORARY TABLE products(name varchar(200), description text);
INSERT INTO products VALUES
('Alpha Headphones','Wireless over-ear headphones with bluetooth 5.0'),
('Bravo Keyboard','Wireless compact keyboard with 2.4G receiver'),
('Charlie Mouse','Ergonomic bluetooth wireless mouse'),
('Delta Speaker','Portable wireless speaker with aux input');
ALTER TABLE products ADD FULLTEXT ft_ndx (name, description);

-- SQL Server (temporary example table)
CREATE TABLE #products(name nvarchar(200), description nvarchar(max));
INSERT INTO #products VALUES
('Alpha Headphones', 'Wireless over-ear headphones with bluetooth 5.0'),
('Bravo Keyboard', 'Wireless compact keyboard with 2.4G receiver'),
('Charlie Mouse', 'Ergonomic bluetooth wireless mouse'),
('Delta Speaker', 'Portable wireless speaker with aux input');
-- Assume a Full-Text Index exists on #products(name, description)

Your task

Write a full-text query that returns product names that include the concept "wireless" AND exclude "bluetooth".
Order results by relevance if your DB supports scoring.

Exercise 2: Phrase search with ranking

Goal: Return documents most relevant to the phrase "machine learning" with phrase boosted over single words.

Setup (example data)

-- PostgreSQL
CREATE TEMP TABLE docs(title text, body text);
INSERT INTO docs VALUES
('Intro to ML', 'This article introduces machine learning and basic models.'),
('Deep Learning Tricks', 'Advanced neural networks for image tasks.'),
('Data Cleaning', 'How to clean data before modeling and machine-learning steps.'),
('ML in Production', 'Best practices to deploy machine learning to production.');
CREATE INDEX ON docs USING GIN (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,'')));

-- MySQL
CREATE TEMPORARY TABLE docs(title varchar(200), body text);
INSERT INTO docs VALUES
('Intro to ML', 'This article introduces machine learning and basic models.'),
('Deep Learning Tricks', 'Advanced neural networks for image tasks.'),
('Data Cleaning', 'How to clean data before modeling and machine-learning steps.'),
('ML in Production', 'Best practices to deploy machine learning to production.');
ALTER TABLE docs ADD FULLTEXT ft_docs (title, body);

-- SQL Server (assume Full-Text Index exists)
CREATE TABLE #docs(title nvarchar(200), body nvarchar(max));
INSERT INTO #docs VALUES
('Intro to ML', 'This article introduces machine learning and basic models.'),
('Deep Learning Tricks', 'Advanced neural networks for image tasks.'),
('Data Cleaning', 'How to clean data before modeling and machine-learning steps.'),
('ML in Production', 'Best practices to deploy machine learning to production.');

Your task

Write a phrase query for "machine learning".
Return top 3 most relevant rows.

Self-check checklist

I created/used an appropriate full-text index.
My query uses phrase syntax or boolean operators, not just LIKE.
I handled nulls safely when concatenating fields (e.g., coalesce).
I returned ranked results where supported.
I validated that excluded terms do not appear in results.

Common mistakes and how to self-check

Relying only on LIKE: It misses word variations and is slow. Self-check: Measure execution time and compare relevance quality.
No index: Full-text queries without an index can be very slow. Self-check: Confirm the planner uses the FTS index (e.g., EXPLAIN).
Incorrect language/stopwords: Results seem incomplete. Self-check: Try different configurations or remove very common words from the query.
Concatenating nullable fields without COALESCE: Can produce null vectors. Self-check: Ensure concatenation handles nulls.
Over-filtering with NOT: Excluding broad terms may remove relevant rows. Self-check: Inspect a few false negatives and adjust.

Practical projects

Feedback search: Build a query/report that finds top 10 complaints about "shipping" and ranks them.
Knowledge base search: Create a stored view that returns ranked results for a term and a phrase, with timestamps.
Topic trends: Weekly counts of tickets matching "refund" OR "chargeback" minus those containing "fraud".

Learning path

Start: Full-text basics (this page).
Next: Advanced query syntax (proximity, weights, custom dictionaries).
Then: Performance tuning (index types, refresh, maintenance).
Finally: Analytics use-cases (topic tagging, cohort building, trend lines).

Next steps

Complete the exercises above and compare with the solutions.
Take the Quick Test below to confirm understanding.

Note about saving your progress

The quick test is available to everyone. Only logged-in users will have their progress saved.

Mini challenge

Create a single query that returns the top 5 documents for users searching: must include the phrase "account number", must include either "lost" or "stolen", and must exclude the term "temporary". Return a relevance score and the matched fields.

Menu

Full Text Search Basics

Table of Contents

Why this matters

Who this is for

Prerequisites

Concept explained simply

Mental model

Core terms and operators

Worked examples

Hands-on exercises

Exercise 1: Must-have and must-not-have terms

Exercise 2: Phrase search with ranking

Self-check checklist

Common mistakes and how to self-check

Practical projects

Learning path

Next steps

Mini challenge

Practice Exercises

Must-have and must-not-have terms

Instructions

Expected Output

Phrase search with ranking

Full Text Search Basics — Quick Test

Have questions about Full Text Search Basics?

AI Assistant