Full-Text Search

Search any text column in your lakehouse. Built on Tantivy (Rust port of Lucene), queryable from any SQL via the fts_search() table function. Joins cleanly with the rest of your data and works through the wire protocol.

How It Works

Build an index with rq fts create on a text column.
Query with fts_search('index_name', 'your query') from any SQL: interactive editor, saved queries, wire-protocol clients, public charts.
Rebuild when the source table changes via rq fts rebuild.

The index lives as a single archive on your storage (counted toward your storage quota). It's separate from your data. Dropping it doesn't touch your tables, building it doesn't add columns.

Quick Start

bash

# Build an index over a text column
rq fts create hn_body --table hackernews.comments --column body

# Use it in any SQL
rq query "
  SELECT c.id, c.author, h.score
  FROM hackernews.comments c
  JOIN fts_search('hn_body', 'rust async runtime') h ON c.id = h.doc_id
  ORDER BY h.score DESC
  LIMIT 20
"

# Build an index over a text column
rq fts create hn_body --table hackernews.comments --column body

# Use it in any SQL
rq query "
  SELECT c.id, c.author, h.score
  FROM hackernews.comments c
  JOIN fts_search('hn_body', 'rust async runtime') h ON c.id = h.doc_id
  ORDER BY h.score DESC
  LIMIT 20
"

That's it. The query runs in milliseconds even on tens of millions of rows.

The `fts_search` Function

sql

fts_search(index_name, query [, limit [, fuzzy]])

fts_search(index_name, query [, limit [, fuzzy]])

Returns a relation with two columns:

Column	Type	Meaning
`doc_id`	BIGINT	Row position in the source table
`score`	FLOAT	BM25 relevance score (higher = better)

Arguments:

Arg	Default	Notes
`index_name`	required	The name you gave the index when creating it
`query`	required	Lucene-style query string (see below)
`limit`	10	Top-N hits to return (1..1000)
`fuzzy`	0	Levenshtein edit distance for single-term fuzzy match (0, 1, or 2)

You join the result on doc_id to recover the rest of the row's columns.

Query Syntax

The default analyzer (bm25) lowercases tokens and applies Snowball English stemming, so rust matches rust, rusty, rusted. The query string supports:

sql

-- Single term (matches stemmed forms)
fts_search('hn_body', 'rust')

-- Multiple terms, implicit OR, scored by BM25
fts_search('hn_body', 'rust async runtime')

-- AND requirement
fts_search('hn_body', '+rust +async')

-- Phrase
fts_search('hn_body', '"async runtime"')

-- Exclusion
fts_search('hn_body', 'rust -tokio')

-- Top 50 hits
fts_search('hn_body', 'rust async', 50)

-- Single term (matches stemmed forms)
fts_search('hn_body', 'rust')

-- Multiple terms, implicit OR, scored by BM25
fts_search('hn_body', 'rust async runtime')

-- AND requirement
fts_search('hn_body', '+rust +async')

-- Phrase
fts_search('hn_body', '"async runtime"')

-- Exclusion
fts_search('hn_body', 'rust -tokio')

-- Top 50 hits
fts_search('hn_body', 'rust async', 50)

For a typo-tolerant search, set fuzzy to 1 or 2 (Levenshtein edit distance). Only the first whitespace-separated token is used for fuzzy matching:

sql

-- Matches 'rust', 'rusty', 'rusted', and typos within 2 edits
fts_search('hn_body', 'rsut', 50, 2)

-- Matches 'rust', 'rusty', 'rusted', and typos within 2 edits
fts_search('hn_body', 'rsut', 50, 2)

Multiple Columns

You can index several columns in a single index. They're concatenated with a space at index time and treated as one searchable field:

bash

rq fts create products --table catalog.items \
  --column title --column description --column tags

rq fts create products --table catalog.items \
  --column title --column description --column tags

Lifecycle

bash

rq fts                            # list indexes in the workspace
rq fts show hn_body               # status, size, snapshot, freshness
rq fts rebuild hn_body            # rebuild against the current snapshot
rq fts drop hn_body               # delete the index and its archive

rq fts                            # list indexes in the workspace
rq fts show hn_body               # status, size, snapshot, freshness
rq fts rebuild hn_body            # rebuild against the current snapshot
rq fts drop hn_body               # delete the index and its archive

When the source table changes (sync, transform, push), the index is marked STALE. It still works, queries just surface a warning so you know to rebuild. There's no auto-rebuild yet, you control when the index refreshes.

Saved query + REST API

fts_search() works inside a saved query. Bind the search text as a :param and call /run.

Save:

bash

rq queries create job-search \
  --sql "SELECT t.title, t.author, t.url, h.score
         FROM hackernews.jobs t
         JOIN fts_search('jobs_fts', :q, :limit) h ON t.id = h.doc_id
         ORDER BY h.score DESC"

rq queries create job-search \
  --sql "SELECT t.title, t.author, t.url, h.score
         FROM hackernews.jobs t
         JOIN fts_search('jobs_fts', :q, :limit) h ON t.id = h.doc_id
         ORDER BY h.score DESC"

Run:

bash

curl -X POST https://api.rawquery.dev/api/v1/workspaces/<workspace-id>/saved-queries/job-search/run \
  -H "Authorization: Bearer rq_xxx" \
  -H "Content-Type: application/json" \
  -d '{"parameters": {"q": "rust async runtime", "limit": "20"}, "format": "objects"}'

curl -X POST https://api.rawquery.dev/api/v1/workspaces/<workspace-id>/saved-queries/job-search/run \
  -H "Authorization: Bearer rq_xxx" \
  -H "Content-Type: application/json" \
  -d '{"parameters": {"q": "rust async runtime", "limit": "20"}, "format": "objects"}'

Response:

json

{
  "data": [
    {"title": "...", "author": "...", "score": 11.31},
    {"title": "...", "author": "...", "score":  8.14}
  ],
  "row_count": 2,
  "execution_time_ms": 187.9
}

{
  "data": [
    {"title": "...", "author": "...", "score": 11.31},
    {"title": "...", "author": "...", "score":  8.14}
  ],
  "row_count": 2,
  "execution_time_ms": 187.9
}

The :q parameter accepts the full query syntax (single term, +AND, "phrase", -exclusion). For fuzzy matching expose a fourth parameter:

sql

JOIN fts_search('jobs_fts', :q, :limit, :fuzzy) h ON t.id = h.doc_id

JOIN fts_search('jobs_fts', :q, :limit, :fuzzy) h ON t.id = h.doc_id

Hybrid Search (BM25 + Vector)

You can combine FTS and vector search in pure SQL. Run both, fuse with Reciprocal Rank Fusion or convex combination:

sql

-- Top BM25 hits
WITH lex AS (
  SELECT doc_id, score AS bm25
  FROM fts_search('hn_body', 'rust async', 100)
),
-- Top vector hits (assumes a hackernews.comments_embedded table from `rq embed`)
vec AS (
  SELECT id AS doc_id,
         array_distance(embedding, :query_vec::FLOAT[384]) AS dist
  FROM hackernews.comments_embedded
  ORDER BY dist
  LIMIT 100
)
SELECT
  COALESCE(lex.doc_id, vec.doc_id) AS doc_id,
  -- Convex combination: tune alpha (0..1) for your dataset
  COALESCE(0.6 * (lex.bm25 / (SELECT MAX(bm25) FROM lex)), 0)
  + COALESCE(0.4 * (1 - vec.dist / (SELECT MAX(dist) FROM vec)), 0) AS hybrid_score
FROM lex
FULL OUTER JOIN vec USING (doc_id)
ORDER BY hybrid_score DESC
LIMIT 20;

-- Top BM25 hits
WITH lex AS (
  SELECT doc_id, score AS bm25
  FROM fts_search('hn_body', 'rust async', 100)
),
-- Top vector hits (assumes a hackernews.comments_embedded table from `rq embed`)
vec AS (
  SELECT id AS doc_id,
         array_distance(embedding, :query_vec::FLOAT[384]) AS dist
  FROM hackernews.comments_embedded
  ORDER BY dist
  LIMIT 100
)
SELECT
  COALESCE(lex.doc_id, vec.doc_id) AS doc_id,
  -- Convex combination: tune alpha (0..1) for your dataset
  COALESCE(0.6 * (lex.bm25 / (SELECT MAX(bm25) FROM lex)), 0)
  + COALESCE(0.4 * (1 - vec.dist / (SELECT MAX(dist) FROM vec)), 0) AS hybrid_score
FROM lex
FULL OUTER JOIN vec USING (doc_id)
ORDER BY hybrid_score DESC
LIMIT 20;

Quotas

Each plan includes a monthly allotment of rows indexed. Rebuilds count toward this quota.

Plan	Rows / month	Overage
Free	100k	hard cap
Team	2M	€0.50 / 1M
Business	20M	€0.30 / 1M

Storage of the index archive itself counts toward your normal storage quota.

The index archive size is typically 20–35% of the indexed text column's compressed size, with ZSTD applied on top.

Tips

Project early. When joining with the source table, select only the columns you need: SELECT c.id, c.author, h.score, not SELECT *.
Tighten limit. Fewer hits = less work. The default of 10 is for "show me the most relevant"; raise it only when you need ranking far down the list.
Index narrow, not wide. A separate fts create per column gives you more control than one giant multi-column index.
Rebuild on a cadence that matches your data. Static corpora (HN dump, product catalogue): build once. Live data (support tickets, logs): rebuild after each sync.

Limits

Index name: 1–100 chars, alphanumeric + _ + -
Up to 5 columns per index
Single build job: hard-capped at 5× your monthly included rows on overage plans (mid-job safety)
Concurrent builds platform-wide: 3 (additional submissions wait in QUEUED)
fts_search arguments must be string literals in the SQL. No column references in the index name or query
fts_search limit argument: 1..1000
fuzzy: 0, 1, or 2

Full-Text Search

How It Works

Quick Start

The fts_search Function

Query Syntax

Multiple Columns

Lifecycle

Saved query + REST API

Hybrid Search (BM25 + Vector)

Quotas

Tips

Limits

The `fts_search` Function