Sync Strategies

How rawquery keeps your data in sync - and how to pick the right mode for each table.

Overview

When rawquery syncs data from a source, it needs a strategy: should it replace everything, append new records, or merge changes? The answer depends on what kind of data you have.

rawquery supports 4 sync modes. Most of the time, you don't need to choose - rawquery picks the right default based on what the source supports. But you can override it per stream if needed.

The 4 Modes

Full Refresh

Mode: full_refresh

Drops and replaces all data on every sync. The simplest mode - no state to track, no edge cases.

Use for: Small reference tables, config data, lookup tables, or any source without a reliable cursor field.

Example: A products table with 500 rows. It's small enough to re-sync fully every time, and you always get a clean snapshot.

Trade-off: Slower for large tables since it re-fetches everything. Not suitable for tables with millions of rows.

Append

Mode: incremental

Fetches only new records since the last sync using a cursor field (typically created_at or an auto-incrementing ID). New records are appended to the table.

Use for: Immutable, append-only data - event logs, charges, page views, audit trails.

Example: Stripe charges. Each charge is created once and never updated. Append mode fetches only charges created since the last sync.

Trade-off: Does not capture updates to existing records. If a record changes after it was synced, the table will have stale data.
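The cursor logic behind append mode can be sketched in a few lines of Python. This is a minimal illustration, not rawquery's implementation. Rows are plain dicts, and the names `append_sync`, `cursor_field`, and `last_cursor` are hypothetical. The sketch assumes the cursor value only ever increases and compares with a strict `>`, so a real connector might need extra care with rows that share the same cursor value.

```python
from __future__ import annotations

from typing import Any

Row = dict[str, Any]


def append_sync(
    existing: list[Row],
    source: list[Row],
    cursor_field: str,
    last_cursor: Any | None,
) -> tuple[list[Row], Any | None]:
    """Hypothetical append-mode sync: fetch rows past the saved cursor and append them."""
    # No saved cursor means this is the first sync, so every row is fetched.
    new_rows = [
        row for row in source
        if last_cursor is None or row[cursor_field] > last_cursor
    ]
    # Existing rows are never touched, which is why updates are not captured.
    table = existing + new_rows
    # Save the highest cursor value seen so the next sync resumes from there.
    next_cursor = max((row[cursor_field] for row in new_rows), default=last_cursor)
    return table, next_cursor


charges = [{"id": 1, "created_at": 100}, {"id": 2, "created_at": 200}]
table, cursor = append_sync([], charges, "created_at", None)  # first sync
charges.append({"id": 3, "created_at": 300})
table, cursor = append_sync(table, charges, "created_at", cursor)
print([r["id"] for r in table], cursor)  # [1, 2, 3] 300
```

The second call only fetches the charge created after the saved cursor (200), which is what keeps append syncs cheap on large, append-only tables.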

Merge

Mode: incremental_dedupe

Fetches records that are new or have changed since the last sync, then deduplicates on the primary key. If a record with the same key already exists, the newer version replaces it. In other words, this is an upsert.

Use for: Entities that change over time - customers, subscriptions, products, deals.

Example: HubSpot contacts. A contact's email or lifecycle stage can change. Merge mode fetches recent changes and updates the existing record in place.

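How dedup works: After fetching the new and changed records, rawquery reads the existing table, concatenates old + new, groups by primary key, and keeps the last occurrence of each key. Because the fetched records come after the existing ones, the last occurrence is always the newer version. The table is then overwritten with the deduplicated result.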
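The dedup step (concatenate old + new, group by primary key, keep the last occurrence) can be sketched as follows. This is a hypothetical illustration on plain dicts, not rawquery's code; `merge_sync` and its parameters are made-up names.

```python
from __future__ import annotations

from typing import Any

Row = dict[str, Any]


def merge_sync(existing: list[Row], new_rows: list[Row], primary_key: str) -> list[Row]:
    """Hypothetical merge (upsert): concatenate old + new, keep the last row per key."""
    latest: dict[Any, Row] = {}
    # Old rows come first and new rows last, so when a key repeats,
    # the later (newer) row overwrites the earlier one in the dict.
    for row in existing + new_rows:
        latest[row[primary_key]] = row
    # The result replaces the whole table; keys keep their first-seen order.
    return list(latest.values())


contacts = [{"id": 1, "stage": "lead"}, {"id": 2, "stage": "lead"}]
changes = [{"id": 1, "stage": "customer"}, {"id": 3, "stage": "lead"}]
contacts = merge_sync(contacts, changes, "id")
print(contacts)
# [{'id': 1, 'stage': 'customer'}, {'id': 2, 'stage': 'lead'}, {'id': 3, 'stage': 'lead'}]
```

Contact 1 is updated in place, contact 2 is untouched, and contact 3 is inserted. In pandas terms, the same step is roughly `pd.concat([old, new]).drop_duplicates(subset=[primary_key], keep="last")`.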

Trade-off: More expensive than append, since every sync reads the existing table to deduplicate and then rewrites it. Best for tables where updates are common and you need the latest state.

Window

Mode: window

Re-syncs a sliding window of the last N days. Data outside the window is kept as-is; data inside the window is replaced entirely with fresh data from the source.

Use for: Analytics data that gets updated retroactively - ad spend, attribution data, metrics that are revised after the fact.

Example: Google Ads data where conversions are attributed up to 30 days after a click. A 30-day window ensures you always have the latest attribution data.

Configuring window size: Select "Window" in the sync mode dropdown for a stream, then set the number of days in the input that appears. The default is 30 days. From the CLI, run `rq connections update my-conn --sync-mode prices:window:30` to put the prices stream in window mode with a 30-day window.

Trade-off: Re-fetches more data than append, but less than full refresh. The window size is a trade-off between freshness and sync speed.
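A sketch of the window replacement, assuming each row carries a date column (called `day` here) and that the window starts exactly `window_days` days before `today`. The function name, parameters, and boundary rule are assumptions for illustration; rawquery's exact cut-off may differ.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Any

Row = dict[str, Any]


def window_sync(
    existing: list[Row],
    source: list[Row],
    date_field: str,
    window_days: int,
    today: date,
) -> list[Row]:
    """Hypothetical window sync: replace the last `window_days` days, keep older rows."""
    # Assumption: the window starts exactly `window_days` days before today.
    window_start = today - timedelta(days=window_days)
    # Rows older than the window are kept exactly as they are.
    kept = [row for row in existing if row[date_field] < window_start]
    # Everything inside the window comes fresh from the source, replacing old values.
    fresh = [row for row in source if row[date_field] >= window_start]
    return kept + fresh


ads = [
    {"day": date(2024, 1, 15), "conversions": 4},
    {"day": date(2024, 3, 20), "conversions": 2},
]
# The source has revised March 20 (late-attributed conversions) and added March 31.
refetched = [
    {"day": date(2024, 3, 20), "conversions": 5},
    {"day": date(2024, 3, 31), "conversions": 1},
]
ads = window_sync(ads, refetched, "day", 30, today=date(2024, 3, 31))
```

January 15 falls outside the 30-day window and is left alone, while March 20 picks up its revised value. A larger `window_days` catches later revisions at the cost of re-fetching more rows.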

Smart Defaults

Each connector declares which sync modes each stream supports and which one is the default. rawquery picks the right mode automatically:

Streams with a cursor field (e.g. created_at) default to incremental (append)
Streams without a cursor field default to full refresh
Streams with a primary key can use incremental_dedupe (merge) if you enable it
Window mode is opt-in; when you enable it, you set the window size (30 days by default)

You can override the sync mode per stream in the connection settings.
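The rules above can be expressed as a small resolver. This is a hypothetical sketch: in rawquery the default comes from each connector's own declaration, and `Stream` and `resolve_mode` are illustrative names that only mirror the listed rules plus a per-stream override.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stream:
    """Hypothetical stream descriptor; field names are illustrative."""
    name: str
    cursor_field: str | None = None
    primary_key: str | None = None


def resolve_mode(stream: Stream, override: str | None = None) -> str:
    # A per-stream override from the connection settings takes precedence.
    if override is not None:
        # Merge deduplicates on the primary key, so it cannot work without one.
        if override == "incremental_dedupe" and stream.primary_key is None:
            raise ValueError(f"{stream.name}: incremental_dedupe needs a primary key")
        return override
    # Otherwise apply the defaults: cursor field -> append, no cursor -> full refresh.
    # Merge and window are opt-in, so they are never picked automatically.
    return "incremental" if stream.cursor_field else "full_refresh"


print(resolve_mode(Stream("charges", cursor_field="created_at")))  # incremental
print(resolve_mode(Stream("products")))                            # full_refresh
```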

When to Use Each

| Question | Mode |
| --- | --- |
| Small table, no cursor, need a clean snapshot? | `full_refresh` |
| Records are created but never updated? | `incremental` |
| Records can be created or updated? | `incremental_dedupe` |
| Data is revised retroactively in a time window? | `window` |

First Sync Behavior

On the first sync, all modes behave the same: they fetch all available data and write it to a new table. The sync mode only matters on subsequent syncs, when there is already existing data to work with.

After the first sync, rawquery stores a cursor value (for incremental and incremental_dedupe modes) so the next sync picks up where it left off.