No credit cardStart free

Full-Text Search

Search any text column in your lakehouse. Built on Tantivy (Rust port of Lucene), queryable from any SQL via the fts_search() table function. Joins cleanly with the rest of your data and works through the wire protocol.

How It Works

  1. Build an index with rq fts create on a text column.
  2. Query with fts_search('index_name', 'your query') from any SQL: interactive editor, saved queries, wire-protocol clients, public charts.
  3. Rebuild when the source table changes via rq fts rebuild.

The index lives as a single archive on your storage (counted toward your storage quota). It's separate from your data. Dropping it doesn't touch your tables, building it doesn't add columns.

Quick Start

bash
# Build an index over a text column
rq fts create hn_body --table hackernews.comments --column body
# Use it in any SQL
rq query "
SELECT c.id, c.author, h.score
FROM hackernews.comments c
JOIN fts_search('hn_body', 'rust async runtime') h ON c.id = h.doc_id
ORDER BY h.score DESC
LIMIT 20
"

That's it. The query runs in milliseconds even on tens of millions of rows.

The fts_search Function

sql
fts_search(index_name, query [, limit [, fuzzy]])

Returns a relation with two columns:

ColumnTypeMeaning
doc_idBIGINTRow position in the source table
scoreFLOATBM25 relevance score (higher = better)

Arguments:

ArgDefaultNotes
index_namerequiredThe name you gave the index when creating it
queryrequiredLucene-style query string (see below)
limit10Top-N hits to return (1..1000)
fuzzy0Levenshtein edit distance for single-term fuzzy match (0, 1, or 2)

You join the result on doc_id to recover the rest of the row's columns.

Query Syntax

The default analyzer (bm25) lowercases tokens and applies Snowball English stemming, so rust matches rust, rusty, rusted. The query string supports:

sql
-- Single term (matches stemmed forms)
fts_search('hn_body', 'rust')
-- Multiple terms, implicit OR, scored by BM25
fts_search('hn_body', 'rust async runtime')
-- AND requirement
fts_search('hn_body', '+rust +async')
-- Phrase
fts_search('hn_body', '"async runtime"')
-- Exclusion
fts_search('hn_body', 'rust -tokio')
-- Top 50 hits
fts_search('hn_body', 'rust async', 50)

For a typo-tolerant search, set fuzzy to 1 or 2 (Levenshtein edit distance). Only the first whitespace-separated token is used for fuzzy matching:

sql
-- Matches 'rust', 'rusty', 'rusted', and typos within 2 edits
fts_search('hn_body', 'rsut', 50, 2)

Multiple Columns

You can index several columns in a single index. They're concatenated with a space at index time and treated as one searchable field:

bash
rq fts create products --table catalog.items \
--column title --column description --column tags

Lifecycle

bash
rq fts # list indexes in the workspace
rq fts show hn_body # status, size, snapshot, freshness
rq fts rebuild hn_body # rebuild against the current snapshot
rq fts drop hn_body # delete the index and its archive

When the source table changes (sync, transform, push), the index is marked STALE. It still works, queries just surface a warning so you know to rebuild. There's no auto-rebuild yet, you control when the index refreshes.

Saved query + REST API

fts_search() works inside a saved query. Bind the search text as a :param and call /run.

Save:

bash
rq queries create job-search \
--sql "SELECT t.title, t.author, t.url, h.score
FROM hackernews.jobs t
JOIN fts_search('jobs_fts', :q, :limit) h ON t.id = h.doc_id
ORDER BY h.score DESC"

Run:

bash
curl -X POST https://api.rawquery.dev/api/v1/workspaces/<workspace-id>/saved-queries/job-search/run \
-H "Authorization: Bearer rq_xxx" \
-H "Content-Type: application/json" \
-d '{"parameters": {"q": "rust async runtime", "limit": "20"}, "format": "objects"}'

Response:

json
{
"data": [
{"title": "...", "author": "...", "score": 11.31},
{"title": "...", "author": "...", "score": 8.14}
],
"row_count": 2,
"execution_time_ms": 187.9
}

The :q parameter accepts the full query syntax (single term, +AND, "phrase", -exclusion). For fuzzy matching expose a fourth parameter:

sql
JOIN fts_search('jobs_fts', :q, :limit, :fuzzy) h ON t.id = h.doc_id

Hybrid Search (BM25 + Vector)

You can combine FTS and vector search in pure SQL. Run both, fuse with Reciprocal Rank Fusion or convex combination:

sql
-- Top BM25 hits
WITH lex AS (
SELECT doc_id, score AS bm25
FROM fts_search('hn_body', 'rust async', 100)
),
-- Top vector hits (assumes a hackernews.comments_embedded table from `rq embed`)
vec AS (
SELECT id AS doc_id,
array_distance(embedding, :query_vec::FLOAT[384]) AS dist
FROM hackernews.comments_embedded
ORDER BY dist
LIMIT 100
)
SELECT
COALESCE(lex.doc_id, vec.doc_id) AS doc_id,
-- Convex combination: tune alpha (0..1) for your dataset
COALESCE(0.6 * (lex.bm25 / (SELECT MAX(bm25) FROM lex)), 0)
+ COALESCE(0.4 * (1 - vec.dist / (SELECT MAX(dist) FROM vec)), 0) AS hybrid_score
FROM lex
FULL OUTER JOIN vec USING (doc_id)
ORDER BY hybrid_score DESC
LIMIT 20;

Quotas

Each plan includes a monthly allotment of rows indexed. Rebuilds count toward this quota.

PlanRows / monthOverage
Free100khard cap
Team2M€0.50 / 1M
Business20M€0.30 / 1M

Storage of the index archive itself counts toward your normal storage quota.

The index archive size is typically 20–35% of the indexed text column's compressed size, with ZSTD applied on top.

Tips

  • Project early. When joining with the source table, select only the columns you need: SELECT c.id, c.author, h.score, not SELECT *.
  • Tighten limit. Fewer hits = less work. The default of 10 is for "show me the most relevant"; raise it only when you need ranking far down the list.
  • Index narrow, not wide. A separate fts create per column gives you more control than one giant multi-column index.
  • Rebuild on a cadence that matches your data. Static corpora (HN dump, product catalogue): build once. Live data (support tickets, logs): rebuild after each sync.

Limits

  • Index name: 1–100 chars, alphanumeric + _ + -
  • Up to 5 columns per index
  • Single build job: hard-capped at your monthly included rows on overage plans (mid-job safety)
  • Concurrent builds platform-wide: 3 (additional submissions wait in QUEUED)
  • fts_search arguments must be string literals in the SQL. No column references in the index name or query
  • fts_search limit argument: 1..1000
  • fuzzy: 0, 1, or 2