Push Data

Push JSON, JSONL, CSV, or Parquet data into Iceberg tables from a file or, for text formats, from stdin. No connector setup and no scheduling: just pipe data in.

Quick Start

bash

# Push a JSON file
rq push --table crm.contacts --file data.json

# With a primary key (for deduplication on subsequent pushes)
rq push --table crm.contacts --file data.json --pk id

# Replace the table entirely
rq push --table crm.contacts --file data.json --mode overwrite

# Push a Parquet file (up to 2 GB, streaming)
rq push --table raw.migration --file dump.parquet

Supported Formats

Format is auto-detected from the file extension. Override with --input-format.

bash

# JSON array (auto-detected from .json)
rq push --table raw.users --file users.json

# JSON lines - one object per line (auto-detected from .jsonl)
rq push --table raw.events --file events.jsonl

# CSV with headers (auto-detected from .csv)
rq push --table raw.products --file products.csv

# Parquet (auto-detected from .parquet) — typed, streaming, up to 2 GB
rq push --table raw.orders --file orders.parquet
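
The detection rule amounts to a lookup on the file extension, with an explicit --input-format taking precedence. A minimal Python sketch of that logic, for illustration only: the function name `detect_format`, the case-insensitive match, and the error raised for an unknown extension are all assumptions, not rq's actual implementation.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format mapping, taken from the formats listed on this page.
EXTENSION_FORMATS = {
    ".json": "json",
    ".jsonl": "jsonl",
    ".csv": "csv",
    ".parquet": "parquet",
}


def detect_format(path: str, override: Optional[str] = None) -> str:
    """Return the input format for `path`; an explicit override wins."""
    if override is not None:
        return override
    # Assumption: the extension is matched case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        # Assumption: an unrecognised extension needs an explicit format.
        raise ValueError(f"cannot detect format of {path!r}; pass --input-format")
    return EXTENSION_FORMATS[suffix]
```

With this sketch, `detect_format("events.jsonl")` returns `"jsonl"`, while `detect_format("dump.txt", override="csv")` returns `"csv"`.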

Parquet

Parquet takes a dedicated streaming upload path. The file is sent as-is, with no JSON serialisation and no 100k-rows-per-request ceiling, and the server commits it as a single Iceberg snapshot that may contain one or more data files.

Max size: 2 GB per push. Split larger files and push in sequence.
Stdin not supported: Parquet requires --file <path>.
Type fidelity: column types are taken from the Parquet schema, not inferred.
Schema evolution: new columns are added automatically on append; conflicting types are rejected (use --mode overwrite to replace the table instead).
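
The schema evolution rule can be pictured as a merge of two column-to-type mappings. The Python sketch below is a client-side illustration of that rule under stated assumptions: the function name `evolve_schema`, the plain string type names, and the use of `TypeError` are hypothetical, and the real check happens on the server.

```python
from typing import Dict


def evolve_schema(existing: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Merge the column types of an appended file into the table schema."""
    merged = dict(existing)
    for column, col_type in incoming.items():
        if column not in merged:
            # New column: added automatically on append.
            merged[column] = col_type
        elif merged[column] != col_type:
            # Conflicting type: the append is rejected.
            raise TypeError(
                f"column {column!r} is {merged[column]} in the table "
                f"but {col_type} in the file"
            )
    return merged
```

In this sketch, appending a file that adds an `email` column extends the schema, while a file that changes `id` from `long` to `string` raises an error; pushing with --mode overwrite is the documented way to replace the table in that case.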

Typical use cases:

bash

# Migrate from another warehouse (BigQuery / Snowflake unload → Parquet)
rq push --table analytics.events --file bq_export.parquet

# Bulk-load a pandas dataframe
# (df.to_parquet("out.parquet"))
rq push --table raw.scored --file out.parquet

# Replace a staging table
rq push --table staging.orders --file latest.parquet --mode overwrite

Stdin

Pipe data from any command (text formats only). Stdin has no file extension to detect the format from, so specify --input-format when reading from it.

bash

# Pipe from curl
curl -s https://api.example.com/users | rq push --table raw.users --input-format json

# Pipe from a script
python generate_data.py | rq push --table raw.generated --input-format jsonl

# Pipe CSV
cat export.csv | rq push --table imports.q4 --input-format csv
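
For the script case, the producer only needs to print one JSON object per line to stdout. A minimal sketch of what a script like generate_data.py might contain; the helper name `write_jsonl` and the sample rows are hypothetical.

```python
import json
import sys
from typing import Iterable, TextIO


def write_jsonl(records: Iterable[dict], out: TextIO = sys.stdout) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    for record in records:
        out.write(json.dumps(record) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample rows; replace with real data.
    rows = ({"id": i, "name": f"user-{i}"} for i in range(3))
    write_jsonl(rows)
```

Saved as generate_data.py, its output can be piped straight into `rq push --table raw.generated --input-format jsonl`.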

Write Modes

Mode	Behaviour
`append`	Add rows to the table (default)
`overwrite`	Replace the entire table contents

Schema Inference

For JSON / JSONL / CSV, types are inferred from the data:

Integers and floats detected from values
Mixed int + float fields promote to float
Mixed types fall back to string
Nested objects stored as JSON strings
Null values are handled gracefully
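
The rules above can be sketched for already-parsed JSON values as follows. This is an illustration, not rq's implementation: the function names and the type labels are hypothetical, and three behaviours the rules do not spell out are assumptions (nulls are skipped, an all-null column becomes string, and booleans are kept as their own type).

```python
import json
from typing import Any, Iterable


def infer_column_type(values: Iterable[Any]) -> str:
    """Infer one column's type from its values."""
    seen = set()
    for value in values:
        if value is None:
            continue  # assumption: nulls do not influence the inferred type
        if isinstance(value, bool):
            seen.add("boolean")  # assumption: booleans are not covered by the rules
        elif isinstance(value, int):
            seen.add("int")
        elif isinstance(value, float):
            seen.add("float")
        else:
            seen.add("string")  # text, and nested values stored as JSON strings
    if not seen:
        return "string"  # assumption: a column with only nulls is a string
    if len(seen) == 1:
        return next(iter(seen))
    if seen <= {"int", "float"}:
        return "float"  # mixed int + float promotes to float
    return "string"  # any other mix falls back to string


def encode_value(value: Any) -> Any:
    """Serialise nested objects to JSON strings; pass other values through."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value
```

For example, a column holding `1` and `2.5` comes out as `float`, and a column holding `1` and `"a"` comes out as `string`.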

Text formats are auto-chunked into 5,000-record batches for upload. Parquet is streamed row-group by row-group; no batching is needed on the client side.
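
The client does this chunking for you, so the sketch below only illustrates the idea. It assumes records arrive as an iterable of dicts; the function name `batches` is hypothetical, and only the 5,000 figure comes from this page.

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 5000  # batch size stated above


def batches(records: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Because the input is consumed lazily, only one batch is held in memory at a time, which is what lets a large text stream on stdin be uploaded piece by piece.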

Flags

Flag	Description
`--table`	Target table (schema.name format, required)
`--file`	Path to data file (omit for stdin; required for Parquet)
`--pk`	Primary key column for deduplication (text formats only)
`--input-format`	json, jsonl, csv, or parquet (auto-detected from the file extension; specify it explicitly for stdin)
`--mode`	append (default) or overwrite