Push Data
Push JSON, JSONL, CSV, or Parquet into Iceberg tables from files or stdin. No connector setup, no scheduling - just pipe data in.
Quick Start
bash
# Push a JSON file
rq push --table crm.contacts --file data.json
# With a primary key (for deduplication on subsequent pushes)
rq push --table crm.contacts --file data.json --pk id
# Replace the table entirely
rq push --table crm.contacts --file data.json --mode overwrite
# Push a Parquet file (up to 2 GB, streaming)
rq push --table raw.migration --file dump.parquet
Supported Formats
Format is auto-detected from the file extension. Override with --input-format.
bash
# JSON array (auto-detected from .json)
rq push --table raw.users --file users.json
# JSON lines - one object per line (auto-detected from .jsonl)
rq push --table raw.events --file events.jsonl
# CSV with headers (auto-detected from .csv)
rq push --table raw.products --file products.csv
# Parquet (auto-detected from .parquet): typed, streaming, up to 2 GB
rq push --table raw.orders --file orders.parquet
Parquet
Parquet takes a dedicated streaming upload path. The file is sent as-is, with no JSON serialisation and no 100k-rows-per-request ceiling, and the server writes it out as a single Iceberg snapshot containing one or more data files.
- Max size: 2 GB per push. Split larger files and push in sequence.
- Stdin not supported: Parquet requires --file <path>.
- Type fidelity: column types are taken from the Parquet schema, not inferred.
- Schema evolution: new columns are added automatically on append; conflicting types are rejected (use --mode overwrite to replace the table instead).
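For files over the 2 GB limit, the "split and push in sequence" advice could look like the sketch below. It assumes the file has already been split into parts that are each under 2 GB, sitting in one directory with zero-padded names so the shell's sorted glob pushes them in order. The helper name push_parquet_parts and the directory layout are illustrative, not part of rq; only the documented --table and --file flags are used.

```bash
# Push every Parquet part in a directory, one rq push per file.
# Assumption: parts were produced beforehand and each is under 2 GB.
# push_parquet_parts is a hypothetical helper, not an rq command.
push_parquet_parts() {
  local table="$1" dir="$2" part
  for part in "$dir"/*.parquet; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -e "$part" ] || continue
    echo "pushing $part" >&2
    # Default mode is append; stop at the first failure so a partial load is easy to spot.
    rq push --table "$table" --file "$part" || return 1
  done
}

# Usage (hypothetical paths):
# push_parquet_parts raw.migration ./parts
```

Because --pk applies to text formats only, re-running the loop after a failure would presumably append the already-pushed parts a second time, so it is worth noting which part failed before retrying.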
Typical use cases:
bash
# Migrate from another warehouse (BigQuery / Snowflake unload → Parquet)
rq push --table analytics.events --file bq_export.parquet
# Bulk-load a pandas dataframe
# (df.to_parquet("out.parquet"))
rq push --table raw.scored --file out.parquet
# Replace a staging table
rq push --table staging.orders --file latest.parquet --mode overwrite
Stdin
Pipe data from any command (text formats only). Specify --input-format when reading from stdin.
bash
# Pipe from curl
curl -s https://api.example.com/users | rq push --table raw.users --input-format json
# Pipe from a script
python generate_data.py | rq push --table raw.generated --input-format jsonl
# Pipe CSV
cat export.csv | rq push --table imports.q4 --input-format csv
Write Modes
| Mode | Behaviour |
|---|---|
| append | Add rows to the table (default) |
| overwrite | Replace the entire table contents |
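The two modes can be combined to make a multi-file reload re-runnable: push the first file with overwrite, then append the rest. The sketch below assumes that pattern suits the table; reload_table and the file names are hypothetical, and only the documented --table, --file, and --mode flags are used.

```bash
# Reload a table from several files so the whole load can be re-run from scratch:
# the first file replaces the table, the remaining files are appended.
# reload_table is an illustrative helper, not part of rq.
reload_table() {
  local table="$1" mode="overwrite" file
  shift
  for file in "$@"; do
    rq push --table "$table" --file "$file" --mode "$mode" || return 1
    # Every file after the first is added to the freshly replaced table.
    mode="append"
  done
}

# Usage (hypothetical file names):
# reload_table staging.orders part-1.parquet part-2.parquet
```

If a later file fails, the table is left partially loaded. Calling the helper again starts over with overwrite, so the retry does not duplicate rows.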
Schema Inference
For JSON / JSONL / CSV, types are inferred from the data:
- Integers and floats detected from values
- Mixed int + float fields promote to float
- Mixed types fall back to string
- Nested objects stored as JSON strings
- Null values are handled gracefully
Text formats are auto-chunked into 5,000-record batches for upload. Parquet is streamed row-group by row-group; no batching is needed on the client side.
Flags
| Flag | Description |
|---|---|
| --table | Target table (schema.name format, required) |
| --file | Path to data file (omit for stdin; required for Parquet) |
| --pk | Primary key column for deduplication (text formats only) |
| --input-format | json, jsonl, csv, parquet (auto-detected from extension) |
| --mode | append (default) or overwrite |