
Sync Strategies

How rawquery keeps your data in sync - and how to pick the right mode for each table.

Overview

When rawquery syncs data from a source, it needs a strategy: should it replace everything, append new records, or merge changes? The answer depends on what kind of data you have.

rawquery supports 4 sync modes. Most of the time, you don't need to choose - rawquery picks the right default based on what the source supports. But you can override it per stream if needed.

The 4 Modes

Full Refresh

Mode: full_refresh

Drops and replaces all data on every sync. The simplest mode - no state to track, no edge cases.

Use for: Small reference tables, config data, lookup tables, or any source without a reliable cursor field.

Example: A products table with 500 rows. It's small enough to re-sync fully every time, and you always get a clean snapshot.

Trade-off: Slower for large tables since it re-fetches everything. Not suitable for tables with millions of rows.
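To make the behavior concrete, here is a minimal sketch of full refresh in plain Python. It assumes a table is modeled as a list of dicts and that a `fetch_all` callable stands in for the connector's read call; both are hypothetical illustrations, not rawquery's API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def full_refresh(fetch_all: Callable[[], List[Row]]) -> List[Row]:
    """Replace the whole table with a fresh snapshot from the source.

    No cursor or state is involved: every sync re-fetches everything,
    so the previous contents never need to be read.
    """
    return list(fetch_all())


# Hypothetical 'products' lookup table and its current source data.
table = [{"id": 1, "name": "Old name"}, {"id": 99, "name": "Removed upstream"}]
source = [{"id": 1, "name": "Basic"}, {"id": 2, "name": "Pro"}]

table = full_refresh(lambda: source)
print(table)  # [{'id': 1, 'name': 'Basic'}, {'id': 2, 'name': 'Pro'}]
```

Because the old contents are discarded, rows deleted at the source (like id 99 here) also disappear, which is part of what makes the snapshot clean.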

Append

Mode: incremental

Fetches only new records since the last sync using a cursor field (typically created_at or an auto-incrementing ID). New records are appended to the table.

Use for: Immutable, append-only data - event logs, charges, page views, audit trails.

Example: Stripe charges. Each charge is created once and never updated. Append mode fetches only charges created since the last sync.

Trade-off: Does not capture updates to existing records. If a record changes after it was synced, the table will have stale data.
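The append logic can be sketched as follows. This assumes rows are dicts with an integer `created_at` cursor, and `append_sync` is a hypothetical helper, not rawquery's implementation. The example deliberately changes an already-synced record to show the trade-off described above.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]


def append_sync(
    table: List[Row],
    source: List[Row],
    last_cursor: Optional[int],
    cursor_field: str = "created_at",
) -> Tuple[List[Row], Optional[int]]:
    """Append records whose cursor value is newer than the stored cursor."""
    # With no stored cursor (first sync), every record counts as new.
    new_rows = [
        r for r in source
        if last_cursor is None or r[cursor_field] > last_cursor
    ]
    # Existing rows are never rewritten; new rows go at the end.
    updated = table + new_rows
    # Advance the cursor to the largest value fetched so far.
    if new_rows:
        last_cursor = max(r[cursor_field] for r in new_rows)
    return updated, last_cursor


charges = [
    {"id": 1, "amount": 500, "created_at": 100},
    {"id": 2, "amount": 900, "created_at": 200},
]
table, cursor = append_sync([], charges, None)  # first sync: ids 1 and 2

charges.append({"id": 3, "amount": 300, "created_at": 300})
charges[0] = {"id": 1, "amount": 450, "created_at": 100}  # a change to an old record
table, cursor = append_sync(table, charges, cursor)

print([r["id"] for r in table], cursor)  # [1, 2, 3] 300
print(table[0]["amount"])  # 500: the change to id 1 was not captured
```

The strict `>` comparison keeps the record that set the cursor from being appended twice, but a late-arriving record with exactly the same cursor value would be skipped; production sync tools handle this boundary in various ways.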

Merge

Mode: incremental_dedupe

Fetches new records, then deduplicates on the primary key. If a record with the same key already exists, the newer version wins. This is an upsert.

Use for: Entities that change over time - customers, subscriptions, products, deals.

Example: HubSpot contacts. A contact's email or lifecycle stage can change. Merge mode fetches recent changes and replaces the stored version of each changed contact.

How dedup works: After fetching new records, rawquery reads the existing table, concatenates old + new, groups by primary key, and keeps the last occurrence. Because the new records come after the existing ones, the last occurrence is the most recent version. The table is then overwritten with the deduplicated result.
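The dedup steps above can be sketched in plain Python. This assumes a single-column primary key named `id` and in-memory lists of dicts; `merge_sync` is a hypothetical helper, and rawquery's actual implementation operates on its own tables and may differ.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_sync(table: List[Row], new_rows: List[Row], primary_key: str = "id") -> List[Row]:
    """Upsert new rows into the table by primary key."""
    # 1. Concatenate old + new; new rows come last.
    combined = table + new_rows
    # 2. Group by primary key, keeping the last occurrence: each later row
    #    overwrites the one stored earlier under the same key.
    latest: Dict[object, Row] = {}
    for row in combined:
        latest[row[primary_key]] = row
    # 3. The deduplicated result replaces the whole table.
    return list(latest.values())


contacts = [
    {"id": 1, "email": "ana@example.com", "stage": "lead"},
    {"id": 2, "email": "bo@example.com", "stage": "customer"},
]
changes = [
    {"id": 1, "email": "ana@example.com", "stage": "customer"},  # updated
    {"id": 3, "email": "cy@example.com", "stage": "lead"},       # new
]
contacts = merge_sync(contacts, changes)
print([(r["id"], r["stage"]) for r in contacts])
# [(1, 'customer'), (2, 'customer'), (3, 'lead')]
```

A Python dict keeps each key at the position where it first appeared, so updated rows stay in place and new ones are added at the end. The same "last occurrence wins" rule also resolves duplicates within a single batch of new records.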

Trade-off: Slightly more expensive than append since it reads the existing table to deduplicate. Best for tables where updates are common and you need the latest state.

Window

Mode: window

Re-syncs a sliding window of the last N days. Data outside the window is kept as-is; data inside the window is replaced entirely with fresh data from the source.

Use for: Analytics data that gets updated retroactively - ad spend, attribution data, metrics that are revised after the fact.

Example: Google Ads data where conversions are attributed up to 30 days after a click. A 30-day window ensures you always have the latest attribution data.

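Configuring window size: Select "Window" in the sync mode dropdown for a stream, then set the number of days in the input that appears. The default is 30 days. From the CLI, pass the stream, the mode, and the number of days joined by colons. For example, rq connections update my-conn --sync-mode prices:window:30 sets a 30-day window on the prices stream.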

Trade-off: Re-fetches more data than append, but less than full refresh. The window size is a trade-off between freshness and sync speed.
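Here is a minimal sketch of the window logic, assuming each row carries a `day` date field and that the window covers the dates from `today - N days` through `today`. The exact boundary rawquery uses is not specified here, and `window_sync` and `fetch_range` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def window_sync(
    table: List[Row],
    fetch_range: Callable[[date, date], List[Row]],
    today: date,
    window_days: int = 30,
    date_field: str = "day",
) -> List[Row]:
    """Re-sync the last `window_days` days; leave older rows untouched."""
    start = today - timedelta(days=window_days)
    # Rows older than the window are kept exactly as they are.
    kept = [r for r in table if r[date_field] < start]
    # Rows inside the window are dropped and replaced by a fresh fetch.
    return kept + fetch_range(start, today)


# Hypothetical daily conversions as currently reported by the source.
source = {
    date(2024, 1, 1): 6,   # revised upstream, but outside the window
    date(2024, 2, 20): 7,  # revised upstream, inside the window
    date(2024, 3, 1): 2,   # new day
}


def fetch_range(start: date, end: date) -> List[Row]:
    return [{"day": d, "conversions": n} for d, n in source.items() if start <= d <= end]


table = [
    {"day": date(2024, 1, 1), "conversions": 5},
    {"day": date(2024, 2, 20), "conversions": 4},
]
table = window_sync(table, fetch_range, today=date(2024, 3, 1), window_days=30)
print([(r["day"].isoformat(), r["conversions"]) for r in table])
# [('2024-01-01', 5), ('2024-02-20', 7), ('2024-03-01', 2)]
```

The revision to 2024-01-01 is not picked up because that day falls outside the 30-day window. This is the freshness-versus-speed trade-off that the window size controls.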

Smart Defaults

Each connector declares what sync modes each stream supports and which one is the default. rawquery picks the right mode automatically:

  • Streams with a cursor field (e.g. created_at) default to incremental (append)
  • Streams without a cursor field default to full refresh
  • Streams with a primary key can use incremental_dedupe (merge) if you enable it
  • Window mode is opt-in: you enable it per stream and set the window size (30 days by default)

You can override the sync mode per stream in the connection settings.
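The default rules and a per-stream override can be sketched as below. The `Stream` dataclass and both functions are hypothetical. rawquery reads the supported modes from each connector's declarations, whereas this sketch derives them only from the rules listed above.

```python
from dataclasses import dataclass
from typing import Optional

MODES = {"full_refresh", "incremental", "incremental_dedupe", "window"}


@dataclass
class Stream:
    """Hypothetical stream metadata, roughly what a connector might declare."""
    name: str
    cursor_field: Optional[str] = None
    primary_key: Optional[str] = None


def default_sync_mode(stream: Stream) -> str:
    # A cursor field makes append possible, so it becomes the default.
    if stream.cursor_field is not None:
        return "incremental"
    # Without a cursor there is no way to tell which rows are new.
    return "full_refresh"


def resolve_sync_mode(stream: Stream, override: Optional[str] = None) -> str:
    """Return the per-stream override if it is usable, else the default."""
    if override is None:
        return default_sync_mode(stream)
    if override not in MODES:
        raise ValueError(f"{stream.name}: unknown sync mode {override!r}")
    if override == "incremental" and stream.cursor_field is None:
        raise ValueError(f"{stream.name}: append needs a cursor field")
    if override == "incremental_dedupe" and stream.primary_key is None:
        raise ValueError(f"{stream.name}: merge needs a primary key")
    return override


charges = Stream("charges", cursor_field="created_at", primary_key="id")
products = Stream("products", primary_key="id")

print(resolve_sync_mode(charges))                        # incremental
print(resolve_sync_mode(products))                       # full_refresh
print(resolve_sync_mode(charges, "incremental_dedupe"))  # incremental_dedupe
```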

When to Use Each

Question | Mode
Small table, no cursor, need a clean snapshot? | full_refresh
Records are created but never updated? | incremental
Records can be created or updated? | incremental_dedupe
Data is revised retroactively in a time window? | window
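One way to read the table as code is the hypothetical helper below. The order of the checks is an assumption: retroactive revisions are tested first because they are the most specific case, and a missing cursor rules out both incremental modes.

```python
def recommend_mode(has_cursor: bool, records_updated: bool, revised_in_window: bool) -> str:
    """Map the questions in the table to a sync mode (hypothetical helper)."""
    # Data revised retroactively in a time window: re-sync that window.
    if revised_in_window:
        return "window"
    # No cursor: a clean snapshot is the only reliable option.
    if not has_cursor:
        return "full_refresh"
    # Records change after creation: upsert on the primary key.
    if records_updated:
        return "incremental_dedupe"
    # Created once, never updated: append is enough.
    return "incremental"


print(recommend_mode(has_cursor=True, records_updated=False, revised_in_window=False))
# incremental (e.g. Stripe charges)
print(recommend_mode(has_cursor=True, records_updated=True, revised_in_window=False))
# incremental_dedupe (e.g. HubSpot contacts)
```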

First Sync Behavior

On the first sync, all modes behave the same: they fetch all available data and write it to a new table. The sync mode only matters on subsequent syncs, when there is already existing data to work with.

After the first sync, rawquery stores a cursor value (for incremental and incremental_dedupe modes) so the next sync picks up where it left off.
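A minimal sketch of storing a cursor between syncs is shown below. It assumes a per-stream cursor kept in a local JSON file. Where and how rawquery actually persists its state is not described here, and `load_cursor`, `save_cursor`, and `fetch` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

Row = Dict[str, int]


def load_cursor(state_path: Path, stream: str) -> Optional[int]:
    """Stored cursor for `stream`, or None if it has never been synced."""
    if not state_path.exists():
        return None
    return json.loads(state_path.read_text()).get(stream)


def save_cursor(state_path: Path, stream: str, cursor: int) -> None:
    """Record the latest cursor for `stream`, keeping other streams' state."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state[stream] = cursor
    state_path.write_text(json.dumps(state))


def fetch(source: List[Row], cursor: Optional[int], field: str = "created_at") -> List[Row]:
    # First sync (no cursor): fetch all available data, whatever the mode.
    if cursor is None:
        return list(source)
    # Later syncs: pick up where the previous one left off.
    return [r for r in source if r[field] > cursor]


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    source = [{"id": 1, "created_at": 10}, {"id": 2, "created_at": 20}]

    first = fetch(source, load_cursor(state_path, "charges"))
    save_cursor(state_path, "charges", max(r["created_at"] for r in first))

    source.append({"id": 3, "created_at": 30})
    second = fetch(source, load_cursor(state_path, "charges"))

print([r["id"] for r in first], [r["id"] for r in second])  # [1, 2] [3]
```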