Leave Anytime & Take Everything
Let's imagine you try leaving [insert big data company here]. You can likely export to CSV. Maybe Parquet? That is, if you know where to look, and if you survive the 10 team meetings about it. But the table metadata, the history, the schema: those stay behind, along with your sanity. What you get is a pile of files and a six-month migration project.
With us, your data is already in an open format
rawquery stores everything as Apache Iceberg tables. Not some obscure proprietary format with Iceberg metadata bolted on. Actual Iceberg tables: Parquet data files, manifest lists, manifests, the whole table-metadata lot. The same format that Spark, Trino, Snowflake, and Databricks *allegedly* all read natively.
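In Iceberg that "lot" is a chain: a table metadata JSON file lists the table's snapshots, each snapshot points at a manifest list, the manifest list points at manifests (both Avro files), and the manifests point at the Parquet data files. Here is a minimal, standard-library-only sketch of following the first link, assuming a hypothetical, heavily trimmed metadata file; the bucket, paths, and IDs are made up, and real metadata files carry many more fields.

```python
import json
from typing import Optional

# Hypothetical, heavily trimmed Iceberg table metadata (made-up paths and IDs).
METADATA_JSON = """
{
  "format-version": 2,
  "location": "s3://example-bucket/warehouse/stripe/customers",
  "current-snapshot-id": 2,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1700000000000,
     "manifest-list": "s3://example-bucket/warehouse/stripe/customers/metadata/snap-1.avro"},
    {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 1700000600000,
     "manifest-list": "s3://example-bucket/warehouse/stripe/customers/metadata/snap-2.avro"}
  ]
}
"""


def current_manifest_list(metadata: dict) -> Optional[str]:
    """Return the manifest-list path of the table's current snapshot, if any."""
    current_id = metadata.get("current-snapshot-id")
    for snapshot in metadata.get("snapshots", []):
        if snapshot["snapshot-id"] == current_id:
            return snapshot["manifest-list"]
    return None  # no current snapshot, e.g. a freshly created empty table


metadata = json.loads(METADATA_JSON)
print(current_manifest_list(metadata))
```

From the manifest list you would continue into the manifests and then the Parquet files; those are Avro, so reading them needs an Avro library, which this sketch deliberately leaves out.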
The data is already in the format everyone supports. When you want to leave, run rq export and you get presigned download URLs for every Parquet data file, the Iceberg metadata, and all manifests. Download them, register the tables in your catalog, done.
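Registering a table usually comes down to handing your catalog the newest table metadata file; for example, Iceberg's Spark `register_table` procedure and PyIceberg's `Catalog.register_table` both take a metadata file location. Below is a minimal stdlib sketch that sanity-checks a downloaded export and picks that file. It assumes, hypothetically, that one table's export lands in one directory and uses Iceberg's usual `NNNNN-<uuid>.metadata.json` or `vN.metadata.json` naming; the actual layout `rq export` produces may differ, and the file names in the demo are made up.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Leading version number in Iceberg metadata file names, e.g.
# "00002-<uuid>.metadata.json" (typical for catalog-managed tables)
# or "v2.metadata.json" (Hadoop-style tables).
_VERSION = re.compile(r"^v?(\d+)[-.]")


def inventory(export_dir: Path) -> Dict[str, int]:
    """Count exported files by kind, judging by file name only."""
    counts = {"parquet": 0, "avro": 0, "metadata.json": 0, "other": 0}
    for path in export_dir.rglob("*"):
        if not path.is_file():
            continue
        if path.name.endswith(".metadata.json"):
            counts["metadata.json"] += 1  # table metadata versions
        elif path.suffix == ".parquet":
            counts["parquet"] += 1  # data files
        elif path.suffix == ".avro":
            counts["avro"] += 1  # manifests and manifest lists
        else:
            counts["other"] += 1  # e.g. version-hint.text
    return counts


def latest_metadata_file(export_dir: Path) -> Optional[Path]:
    """Return the metadata file with the highest version number, if any."""
    best, best_version = None, -1
    for path in export_dir.rglob("*.metadata.json"):
        match = _VERSION.match(path.name)
        if match and int(match.group(1)) > best_version:
            best, best_version = path, int(match.group(1))
    return best


# Demo on a fake export directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["data/part-00000.parquet", "metadata/snap-1.avro",
                "metadata/00001-a.metadata.json", "metadata/00002-b.metadata.json"]:
        (root / rel).parent.mkdir(parents=True, exist_ok=True)
        (root / rel).touch()
    counts = inventory(root)
    latest = latest_metadata_file(root).name
    print(counts)  # {'parquet': 1, 'avro': 1, 'metadata.json': 2, 'other': 0}
    print(latest)  # 00002-b.metadata.json
```

Worth checking before you register: Iceberg metadata and manifests record absolute file paths, so files that end up somewhere other than the recorded location may need those paths rewritten before a catalog can read them.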
We did this to ourselves. For selfish reasons.
Lock-in lets you charge more, at the cost of product quality and credibility. The switching cost and the stickiness become some sacrosanct moat you can justify to stakeholders. Every data company and its VCs know this: the harder it is to leave, the less you have to compete on product quality, and you can always throw in some consulting hours for everything your customers complain your product doesn't do. Our cynical arses would rather compete on product quality, not on our ability to make life miserable when you want to leave. That's called emotional abuse, and we're not up for it.
What this looks like in practice:
# List your Iceberg tables
rq tables

# Export one table: get presigned URLs for all files
rq export stripe.customers

# Or download everything locally
rq export stripe.customers --download ./export/stripe/customers/

# Export your entire workspace
rq export --all --download ./full-export/
Time travel works. Schema history is preserved. Partition metadata is there. Everything that makes Iceberg useful is part of the format, not part of our product. Our product is the ingestion, the compute, the query engine. The data is yours, in an open format that predates us and (we hope not, but quite likely) will outlive us.
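Time travel and schema history live in the table metadata file itself, which is why they survive the export. A minimal stdlib sketch, assuming a hypothetical trimmed metadata file: an "as of" read resolves to whichever snapshot the `snapshot-log` says was current at that moment, and the `schemas` list keeps every schema version. In practice your query engine does this resolution for you; the point is only that none of it depends on rawquery.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical, trimmed Iceberg table metadata: two schema versions plus a
# snapshot log recording when each snapshot became the current one.
METADATA_JSON = """
{
  "current-schema-id": 1,
  "schemas": [
    {"schema-id": 0, "type": "struct", "fields": [
      {"id": 1, "name": "id", "required": true, "type": "string"}]},
    {"schema-id": 1, "type": "struct", "fields": [
      {"id": 1, "name": "id", "required": true, "type": "string"},
      {"id": 2, "name": "email", "required": false, "type": "string"}]}
  ],
  "snapshot-log": [
    {"snapshot-id": 1, "timestamp-ms": 1700000000000},
    {"snapshot-id": 2, "timestamp-ms": 1700000600000}
  ]
}
"""


def snapshot_as_of(metadata: dict, timestamp_ms: int) -> Optional[int]:
    """Snapshot that was current at timestamp_ms, per the snapshot log."""
    earlier = [e for e in metadata.get("snapshot-log", [])
               if e["timestamp-ms"] <= timestamp_ms]
    if not earlier:
        return None  # no snapshot had become current yet at that time
    return max(earlier, key=lambda e: e["timestamp-ms"])["snapshot-id"]


def schema_history(metadata: dict) -> List[Tuple[int, List[str]]]:
    """(schema-id, column names) for every schema version the table has had."""
    return [(s["schema-id"], [f["name"] for f in s["fields"]])
            for s in metadata.get("schemas", [])]


metadata = json.loads(METADATA_JSON)
print(snapshot_as_of(metadata, 1700000300000))  # 1
print(snapshot_as_of(metadata, 1699999999999))  # None
print(schema_history(metadata))  # [(0, ['id']), (1, ['id', 'email'])]
```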
We bet this also settles a few “hey, but what if they disappear next month?” conversations with the team. Oh, and the bonus? That's quite a fancy backup pipeline you've got running there.
Raw Takes
Short reads on things we built and why they matter (to us). No thought leadership.