7 min read · Editorial

Advocating for the Death of the Modern a16z Stack.

A sarcastically written ode to the death of manufactured complexity that nobody wanted but everybody got anyway, thanks to VCs and startups advertising complexity as a product.

A couple of years ago, Jordan Tigani - CEO of MotherDuck - wrote a great article called “Big Data is Dead”, and much more recently, its much milder-toned follow-up “Future Casting the Modern Data Stack”, both great reads with amazing insights. I am nowhere near capable of producing anything remotely as well-articulated; you are better off just reading them.

What I can do, however, after a few years in the field, is advocate for the death of the bloated modern data stack (MDS). Out of spite, out of jealousy, but also because it does deserve to die. Much like the tulip mania of the 1600s: lots of money for very few people, while everyone else stands on the sideline wondering why it matters in the first place.

The Bad Kind of Feedback Loop

The MDS field is a vicious, self-reinforcing feedback loop - engineers leave their companies, then go start their own, convinced they just reinvented hot water. They create a complex and marginally useful tool. Said tool has tangible benefits for 10 large corporations worldwide; they end up implementing it. This becomes a blueprint for everyone else. Us peasants end up with a new complex layer to get one Postgres and four CSVs into a dashboard.

Engineers are meant to and want to do engineering, and when left to their own devices, end up doing just that. Sometimes it's good, sometimes it's not - life be like that sometimes. Now when engineers meet consultants-turned-VC people, after many slides, a sudden and irreversible descent into complexity happens: the market ends up with a new creature in the zoo, and your manager now tells you that you need to manage a new tool, which is going to solve everything that, honestly, only needed trivial solving in the first place.

10 years later - the fabled “Emerging Architectures for Modern Data Infrastructure” pops into existence. When really, we just needed a database that we could slap some SQL and charts on top of. Now somehow we got forward-deployed engineers, architecture review meetings, and infrastructure teams infrastructuring things without knowing what exact purpose they serve.

The cargo cult of modern data infrastructure is convincing everyone they need the same infrastructure depth as Spotify and Netflix, while it is unclear if Netflix and Spotify need it in the first place either.

You're a Hoarder, Harry.

As Tigani noted, data access drops off sharply with age; most queries hit data from the last few days or weeks at most. Older data is, effectively, just hoarded for the benefit of supporting the architecture. But don't listen to me, listen to the teams everyone wants to mimic. For example, two years ago, in a Netflix article about the amount of hoarding they do with their media and production assets, they admit in passing: “at the same time, internal research shows that at least 40% of data never gets used.” Not even analytical data - actual files, proxies, and production content. If they can't keep their own house in order, what hope does your 50-person company have?

And evidently, this is likely your experience as a user as well. Even if you do not work in analytics, last year's financials are now a simple bar chart in the annual presentation. Nobody besides the auditing firm employed to do just that will go into the depth of January 31st's invoices. Your boss does not care either; this has no predictive value for this or next month's KPI. This IS history.

“Think about all the models we could build with that historical data,” says the data scientist at the back of the room. Yes, but you do not do that either. For most companies, LTV and churn prediction models produce a single number that rarely changes anyone's behaviour. Nice to have, very easy to dismiss.

“Think about our ability to predict a product's stock!” - We were pretty good at predicting the nature of supply and demand before machine learning and data warehouses. Fairly sure we are still fairly OK at it, and the 5% marginal gain on stock management for a Fortune 500 is probably not top of mind at MarketingSocialMediaCompany Inc.

So yeah. Data gets stale, fast.

What it tells us is: are we doing well compared to the last period, and is this a trend that carries into the next period? That's about it. The rest is - for a large portion of the world - wankery.
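To show how little machinery those two questions need, here is a minimal sketch in plain Python. The function names, the "later half versus earlier half" trend rule, and the revenue figures are all made up for illustration; this is one crude way to do it, not a recommended method.

```python
def period_change(current: float, previous: float) -> float:
    """Relative change from the previous period to the current one, as a fraction."""
    if previous == 0:
        raise ValueError("previous period is zero; relative change is undefined")
    return (current - previous) / previous


def trend(values: list[float]) -> str:
    """Crude trend label: compare the mean of the later half with the earlier half."""
    if len(values) < 2:
        return "flat"
    mid = len(values) // 2
    earlier = sum(values[:mid]) / mid
    later = sum(values[mid:]) / (len(values) - mid)
    if later > earlier:
        return "up"
    if later < earlier:
        return "down"
    return "flat"


# Hypothetical monthly revenue, oldest first.
monthly_revenue = [100.0, 110.0, 105.0, 120.0]

change = period_change(monthly_revenue[-1], monthly_revenue[-2])
print(f"vs last period: {change:+.1%}, trend: {trend(monthly_revenue)}")
```

That is the whole of the analysis most stakeholders ask for: one comparison and one direction.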

Everyone and Their Mother Can Make Dashboards Now.

Mine sure can, at 60+ years old. She did what every business analyst below the age of 30 already does, and will continue doing for the next decades. She took that xlsx or CSV, threw it into ChatGPT or Claude (she does not use Claude, she is my mom, she is not THAT cool), and asked what the data is saying in terms she can comprehend. She asked her questions in natural language and got fundamentally acceptable answers within an acceptable deviation.

[Image: IQ bell curve meme. Both ends say “just throw it in chatgpt” while the middle insists on building ETL pipelines and dbt models.]

It might be spinning up a gVisor environment with an ephemeral filesystem to parse data in some Python library... or using voodoo magic; it's all the same to her. It's the same to the junior business analyst, and, so long as it doesn't spew nonsense, it's the same to the stakeholder at the top of the management matrix who has to make a decision based on the data provided to them.

This is not to diminish anyone's purpose in, or ability at, producing quality analysis; in fact, I am very purposefully understating both to make a point: it need not be as complicated as it once needed to be. The tools are easier, but also, we have learned we do not need much from our data. It is a simple representation of the current reality of something, be it a business consideration or your mom's finances.

Some organisations are expected to need more, perhaps: the governmental agencies in charge of your taxes, the military, the company that provides video and audio media content to half of the western hemisphere. But most, really, just need a few metrics provided to them in a consistent and simple form.

Failing Upward Into Obsolescence

Now the biggest architects of the very impressive era of technology we are living in are the ones that built that complexity in the first place. There needed to be Hadoop for the data lake to exist. There needed to be the entire distributed analytics era for DuckDB to emerge and tell everyone they never needed it. There needed to be LLMs to realise you don't need that much processing to get insights. It's an ongoing process.

The current issue with that process is that most organisations responsible for providing those technologies are stuck in a quagmire; their investors and their employees are sold on the idea that there is an infinite amount of growth to get from an infinite amount of data that will be generated. Nobody can price themselves down in the modern data stack. But truth be told, a company like Databricks acquiring Neon, a serverless Postgres provider, is an admission of failure to a degree. It says “maybe people did not need that much of the other things we were selling.” They are trying to escape their own stack.

A New Modern Modern Data Stack

Here is the kicker - this story has been told before. Oracle told everyone that a database was a six-figure requirement. MySQL and Postgres entered the chat. Hadoop convinced the world everything was a distributed system; Spark came. The MDS sold us on complex DAGs, connectors, and data models. Then Fivetran merged with dbt (“merged” being the polite word for when the CEO chair goes to one side) in the hope that one of them would survive, and their press release promptly announced they'd be building “open data infrastructure” that “unifies data movement, transformation, metadata, and activation.” Four nouns to say “pipe and query.”

This does not mean nobody will ever need a database, or a query engine, or a lakehouse, or an orchestrating tool, or Looker, but that the path from “data is produced” to “data is consumed” is getting much cheaper, much simpler, and much more accessible to a wider audience.

So here is my modest-ish contribution to the debate: the new modern stack is storage, some query, and an LLM on top. Pick whichever flavour you want; even the LLM will eventually get commodified. You, however, will still need to show what the conversion rate or CTR or stock level is. Businesses are made of real things that people buy and consume.
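For the sceptics, a rough sketch of what “storage, some query, and an LLM on top” can look like, using only Python's standard library. SQLite stands in for both storage and query here purely for convenience; the table name, the CSV contents, and the `ask_llm` function are hypothetical placeholders, and `ask_llm` does not call any real model API.

```python
import csv
import io
import sqlite3

# Hypothetical export: the kind of CSV someone would otherwise email around.
RAW_CSV = """day,visits,orders
2024-01-01,1000,30
2024-01-02,1200,42
"""


def load_csv(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Storage: parse the CSV text and load it into a SQLite table."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute(f"CREATE TABLE {table} (day TEXT, visits INTEGER, orders INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (:day, :visits, :orders)", rows)


def conversion_rate(conn: sqlite3.Connection, table: str) -> float:
    """Query: total orders divided by total visits."""
    orders, visits = conn.execute(
        f"SELECT SUM(orders), SUM(visits) FROM {table}"
    ).fetchone()
    return orders / visits


def ask_llm(question: str, context: str) -> str:
    """LLM on top: a stub. Swap in whichever model client you actually use."""
    return f"Q: {question}\nContext handed to the model: {context}"


conn = sqlite3.connect(":memory:")
load_csv(conn, "traffic", RAW_CSV)
rate = conversion_rate(conn, "traffic")
print(ask_llm("How is conversion doing?", f"conversion rate = {rate:.2%}"))
```

Swap SQLite for DuckDB, Postgres, or whatever you already have; the shape stays the same, which is rather the point.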