Sync Strategies
How rawquery keeps your data in sync - and how to pick the right mode for each table.
Overview
When rawquery syncs data from a source, it needs a strategy: should it replace everything, append new records, or merge changes? The answer depends on what kind of data you have.
rawquery supports 4 sync modes. Most of the time, you don't need to choose - rawquery picks the right default based on what the source supports. But you can override it per stream if needed.
The 4 Modes
Full Refresh
Mode: full_refresh
Drops and replaces all data on every sync. The simplest mode - no state to track, no edge cases.
Use for: Small reference tables, config data, lookup tables, or any source without a reliable cursor field.
Example: A products table with 500 rows. It's small enough to re-sync fully every time, and you always get a clean snapshot.
Trade-off: Slower for large tables since it re-fetches everything. Not suitable for tables with millions of rows.
Append
Mode: incremental
Fetches only new records since the last sync using a cursor field (typically created_at or an auto-incrementing ID). New records are appended to the table.
Use for: Immutable, append-only data - event logs, charges, page views, audit trails.
Example: Stripe charges. Each charge is created once and never updated. Append mode fetches only charges created since the last sync.
Trade-off: Does not capture updates to existing records. If a record changes after it was synced, the table will have stale data.
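To make the cursor mechanics concrete, here is a minimal sketch of an append sync over in-memory rows. It is an illustration, not rawquery's implementation: the function name `append_sync`, the list-of-dicts table, and the strict "greater than" cursor comparison are all assumptions.

```python
def append_sync(table, source_rows, cursor_field, last_cursor=None):
    """Append rows newer than last_cursor; return (table, new_cursor).

    Hypothetical helper for illustration only.
    """
    if last_cursor is None:
        # First sync: no stored cursor yet, so take everything.
        new_rows = list(source_rows)
    else:
        # Assumption: a strict comparison on the cursor field.
        new_rows = [r for r in source_rows if r[cursor_field] > last_cursor]
    # Advance the cursor to the newest value seen; keep the old one if nothing arrived.
    new_cursor = max((r[cursor_field] for r in new_rows), default=last_cursor)
    return table + new_rows, new_cursor


charges = [
    {"id": "ch_1", "created_at": "2024-05-01", "amount": 1000},
    {"id": "ch_2", "created_at": "2024-05-02", "amount": 2500},
]
table, cursor = append_sync([], charges, "created_at")

# A new charge appears at the source before the second sync.
charges.append({"id": "ch_3", "created_at": "2024-05-03", "amount": 400})
table, cursor = append_sync(table, charges, "created_at", cursor)
print(len(table), cursor)  # 3 2024-05-03
```

Because only rows past the cursor are read, a change to `ch_1` at the source would never reach the table, which is the stale-data trade-off described above.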
Merge
Mode: incremental_dedupe
Fetches new and updated records since the last sync, then deduplicates on the primary key. If a record with the same key already exists, the newer version replaces it. This is an upsert.
Use for: Entities that change over time - customers, subscriptions, products, deals.
Example: HubSpot contacts. A contact's email or lifecycle stage can change. Merge mode fetches recent changes and updates the existing record in place.
How dedup works: After fetching new records, rawquery reads the existing table, concatenates old + new, groups by primary key, and keeps the last occurrence. The table is then overwritten with the deduplicated result.
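The dedup step can be sketched in a few lines of Python. This is a simplified illustration over lists of dicts, not rawquery's actual code; `merge_dedupe` and the sample rows are hypothetical.

```python
def merge_dedupe(existing, fetched, primary_key):
    """Concatenate old + new rows and keep the last occurrence per key."""
    latest = {}
    # Existing rows come first, so a fetched row with the same key overwrites them.
    for row in existing + fetched:
        latest[row[primary_key]] = row
    return list(latest.values())


existing = [
    {"id": 1, "email": "ada@old.example", "stage": "lead"},
    {"id": 2, "email": "bob@example.com", "stage": "customer"},
]
fetched = [
    {"id": 1, "email": "ada@new.example", "stage": "customer"},  # updated contact
    {"id": 3, "email": "cy@example.com", "stage": "lead"},       # new contact
]

table = merge_dedupe(existing, fetched, "id")
print([(r["id"], r["email"]) for r in table])
```

The result holds one row per `id`, with contact 1 carrying the newly fetched email. The whole table is rebuilt from the merged rows, which mirrors the overwrite step described above.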
Trade-off: Slightly more expensive than append since it reads the existing table to deduplicate. Best for tables where updates are common and you need the latest state.
Window
Mode: window
Re-syncs a sliding window of the last N days. Data outside the window is kept as-is; data inside the window is replaced entirely with fresh data from the source.
Use for: Analytics data that gets updated retroactively - ad spend, attribution data, metrics that are revised after the fact.
Example: Google Ads data where conversions are attributed up to 30 days after a click. A 30-day window ensures you always have the latest attribution data.
Configuring window size: Select "Window" in the sync mode dropdown for a stream, then set the number of days in the input that appears. Default is 30 days. From the CLI: rq connections update my-conn --sync-mode prices:window:30 (here, a 30-day window on the prices stream).
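Trade-off: Re-fetches more data than append, but less than full refresh. A larger window catches revisions that arrive later but makes each sync slower; a smaller window syncs faster but can miss revisions made after a record falls outside it.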
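As a rough sketch of the window replacement, assuming rows carry a date field and the source can be queried from a start date: rows older than the window are kept, and everything inside it is replaced with freshly fetched rows. The names `window_sync` and `fetch_since` are hypothetical, and treating the window boundary as inclusive is an assumption.

```python
from datetime import date, timedelta


def window_sync(existing, fetch_since, date_field, window_days, today):
    """Keep rows older than the window; replace the rest with fresh rows."""
    window_start = today - timedelta(days=window_days)
    # Rows before the window are left untouched.
    kept = [r for r in existing if r[date_field] < window_start]
    # Assumption: fetch_since returns every source row dated on or after window_start.
    fresh = fetch_since(window_start)
    return kept + fresh


source = [
    {"day": date(2024, 3, 20), "conversions": 4},  # revised upward since last sync
    {"day": date(2024, 3, 30), "conversions": 2},
]
existing = [
    {"day": date(2024, 1, 15), "conversions": 5},  # outside the window
    {"day": date(2024, 3, 20), "conversions": 1},  # stale value inside the window
]


def fetch_since(start):
    return [r for r in source if r["day"] >= start]


table = window_sync(existing, fetch_since, "day", 30, today=date(2024, 3, 31))
```

After the sync, the January row is unchanged, the March 20 row carries the revised conversion count, and the March 30 row is new.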
Smart Defaults
Each connector declares what sync modes each stream supports and which one is the default. rawquery picks the right mode automatically:
- Streams with a cursor field (e.g. created_at) default to incremental (append)
- Streams without a cursor field default to full refresh
- Streams with a primary key can use incremental_dedupe (merge) if you enable it
- Window mode is opt-in and requires you to set the window size
You can override the sync mode per stream in the connection settings.
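The rules above can be read as a small decision function. The sketch below only illustrates that logic; `resolve_sync_mode` and its parameters are hypothetical and are not part of rawquery's API, and how rawquery actually validates overrides is not specified here.

```python
def resolve_sync_mode(has_cursor, has_primary_key, override=None, window_days=None):
    """Return the sync mode for a stream, following the defaults listed above."""
    if override is None:
        # Default: append when a cursor field exists, otherwise full refresh.
        return "incremental" if has_cursor else "full_refresh"
    if override == "incremental_dedupe" and not has_primary_key:
        raise ValueError("incremental_dedupe needs a primary key")
    if override == "window" and window_days is None:
        raise ValueError("window mode needs a window size in days")
    return override


print(resolve_sync_mode(has_cursor=True, has_primary_key=True))    # incremental
print(resolve_sync_mode(has_cursor=False, has_primary_key=False))  # full_refresh
print(resolve_sync_mode(True, True, override="window", window_days=30))  # window
```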
When to Use Each
| Question | Mode |
|---|---|
| Small table, no cursor, need a clean snapshot? | full_refresh |
| Records are created but never updated? | incremental |
| Records can be created or updated? | incremental_dedupe |
| Data is revised retroactively in a time window? | window |
First Sync Behavior
On the first sync, all modes behave the same: they fetch all available data and write it to a new table. The sync mode only matters on subsequent syncs, when there is already existing data to work with.
After the first sync, rawquery stores a cursor value (for incremental and incremental_dedupe modes) so the next sync picks up where it left off.