Why trading firms are collapsing the divide between Databases and Data Lakes

When a trading firm adopts open formats like Parquet or the PostgreSQL protocol, what does that actually change about their day-to-day workflows, and what are the downsides?

Open formats change one fundamental thing: data stops being trapped in a single system. A research team can pull Parquet straight into pandas, Spark, or their data lake without re-ingestion. A trading dashboard can talk to the database over the Postgres wire protocol with whatever client the team already uses. Integration cost drops significantly, and vendor lock-in stops being a strategic risk. If you ever need to migrate, the data is already portable and the protocol is already standard.

For QuestDB users specifically, you can store hot data in QuestDB's native format, then tier it to Parquet in local storage, and from there to cheaper storage on NFS or in cloud object storage. No separate pipeline, no second system to operate. Hot data stays fast and cold data stays cheap.

The Postgres wire protocol means most BI tools and most Postgres client libraries, in Python or any other language, work on day one. Integration that used to take weeks can take an afternoon.

A lot of quant shops run a time-series database and a data lake side by side, with data duplicated between them. Is that a problem people are actively trying to solve, or does the separation usually exist for good reasons?

The separation is a legacy artifact. Older time-series systems weren't built to handle both real-time ingestion and historical analytical queries on years of data, so teams ran two systems and built pipelines between them. The result was two storage bills, two operational footprints, drift between sources of truth, and ETL code that nobody wanted to own.

That gap is closing. In QuestDB, recent partitions sit in the native columnar format tuned for fast ingest and low-latency queries, and older partitions tier out to Parquet on object storage automatically. Same SQL reads across both, while third-party tools can directly access the same files, bypassing the QuestDB query engine entirely if that's what the workflow needs.

Iceberg is pushing the same convergence from the data lake side, and we integrate with it. The endpoint everyone is heading toward is the same: one logical view of your data, regardless of which tier it physically lives on.

There's a push to move calculations like post-trade analysis closer to where the data lives, rather than exporting and computing externally. Where does that approach work well, and where does it fall apart?

Post-trade analysis is the obvious case. Traders have been calculating markouts for decades, and the traditional workflow is painful: pull trades and order book data out of the database, move it into Python, run the calculation, push results back. It's slow, expensive, and easy to get wrong.

QuestDB does this with horizon joins. Markout curves for millions of trades against billions of order book updates per day, in seconds, in SQL. No data movement, no pipeline. Same story for slippage, OHLC bars, VWAP.
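For readers who have not computed one by hand, a markout compares each trade's price with the market mid at fixed horizons after the trade. The sketch below shows that calculation in plain Python, the way the export-and-compute workflow would do it. It is not QuestDB's horizon join. The function names, the tuple layout, and the sign convention (positive means the market moved in the trade's favor) are assumptions made for illustration.

```python
from bisect import bisect_right


def mid_at(book_times, book_mids, t):
    """As-of lookup: the last known mid price at or before time t, or None."""
    i = bisect_right(book_times, t) - 1
    return book_mids[i] if i >= 0 else None


def markouts(trades, book_times, book_mids, horizons):
    """Compute markouts for each trade at each horizon.

    trades:     list of (timestamp, side, price), side is +1 for buy, -1 for sell
    book_times: sorted list of order book update timestamps
    book_mids:  mid price after each update, same length as book_times
    horizons:   offsets after the trade, in the same time unit as the timestamps
    """
    rows = []
    for ts, side, price in trades:
        row = {}
        for h in horizons:
            mid = mid_at(book_times, book_mids, ts + h)
            # Signed so that a positive value is good for the trade's side.
            row[h] = None if mid is None else side * (mid - price)
        rows.append(row)
    return rows


# Toy data, invented for the example.
book_times = [0, 1, 2, 5]
book_mids = [100.0, 100.5, 101.0, 100.0]
trades = [(1, +1, 100.25), (2, -1, 101.5)]

curve = markouts(trades, book_times, book_mids, horizons=[0, 1, 3])
```

Every trade needs one as-of lookup per horizon, so the work grows with trades times horizons, on top of the cost of exporting the order book in the first place. That is the data movement the in-database approach avoids.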

What actually makes a data tool easy for an LLM or coding agent to work with? Is that mainly a documentation problem, or is there something more fundamental?

Both, and the foundation is standards. SQL, the Postgres wire protocol, REST. An LLM already knows SQL on day one, and it can use any existing Postgres client library to talk to QuestDB. Over REST, an agent can introspect the schema, run a query, read a structured error, and retry. That feedback loop is what makes coding agents actually work.
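The run, read the error, retry loop can be sketched in a few lines. This is a minimal illustration, not an official client: it assumes QuestDB's HTTP `/exec` endpoint on its default port 9000, and a JSON reply that carries `columns` and `dataset` on success or `error` and `position` on failure. The `fix` callback is a hypothetical stand-in for whatever the agent does to repair its SQL, and the transport is injectable so the loop can be exercised without a running server.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def http_fetch(url):
    """Default transport: GET the URL and return (status, parsed JSON body)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumption: error responses also carry a JSON body.
        return err.code, json.loads(err.read().decode("utf-8"))


def run_query(base_url, sql, fetch=http_fetch):
    """Run one SQL statement and normalise the reply into a small dict."""
    url = base_url.rstrip("/") + "/exec?" + urllib.parse.urlencode({"query": sql})
    _status, body = fetch(url)
    if "error" in body:
        return {"ok": False, "error": body["error"], "position": body.get("position")}
    names = [col["name"] for col in body.get("columns", [])]
    rows = [dict(zip(names, row)) for row in body.get("dataset", [])]
    return {"ok": True, "rows": rows}


def query_with_retry(base_url, sql, fix, max_attempts=3, fetch=http_fetch):
    """Run sql; on a structured error, ask fix(sql, error, position) for new SQL."""
    last_error = None
    for _ in range(max_attempts):
        result = run_query(base_url, sql, fetch)
        if result["ok"]:
            return result["rows"]
        last_error = result["error"]
        sql = fix(sql, result["error"], result["position"])
    raise RuntimeError(f"query still failing after {max_attempts} attempts: {last_error}")
```

Schema introspection fits the same function: an agent could first call `run_query` with a statement such as `SHOW TABLES` and feed the result into its prompt before writing the real query.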

Where documentation does the heavy lifting is on top of that foundation. QuestDB is SQL plus extensions for time series and market data, so the docs are what teach the agent the extensions it doesn't already know, and just as importantly, the business-oriented use cases. Take the horizon joins we mentioned earlier: an agent isn't going to invent that pattern from first principles, but with a cookbook recipe showing how to compute markouts end to end, it can apply it to a new dataset cleanly. The QuestDB cookbook has over forty of these for financial data: moving averages, slippage, OHLC bars from ticks. Markdown format, consistent structure across pages, and the model has predictable scaffolding to work from.

What has to be true for a quant team to seriously consider changing a core part of their data stack?

Three things have to line up. The new system has to do something the current one can't, or do it dramatically better, not 2x but closer to 10x. There has to be a migration path that doesn't require a flag day, which usually means running side by side for months. And the trust signals have to be there: open source so the team can experiment before committing, and comparable firms running it in production.

Cost is rarely the deciding factor on its own, but it's often what starts the conversation. A team looking at the next renewal sees the bill and starts asking what else is out there. Once they're looking, the question shifts: does anything actually do the job better? A cheap system that doesn't do the job is worse than an expensive one that does, and teams that switch for cost alone usually regret it. The ones that don't look back are the ones who found a system that unlocked workflows the old one made impossible.

As databases take on more analytical work, where should the line be between what the database handles and what belongs in a separate analytics layer?

The principle is simple: anything that can be commoditized belongs in the database. Anything unique to your firm belongs outside.

If every trading firm is calculating markouts, slippage, OHLC bars, VWAP, candles, those calculations should be embedded in the database, fast and battle-tested. There's no good reason for each firm to reinvent these in Python on top of raw data exports. That's where horizon joins, materialized views, and time-series-native SQL extensions come in. We're a specialist database for time series and market data, and our job is to absorb the generic problems so our users don't have to solve them again.
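To make "generic" concrete, here is what two of those calculations amount to when written by hand: OHLC bars and VWAP over fixed time buckets. This is a plain Python sketch of the definitions only, assuming ticks arrive as time-ordered `(timestamp_seconds, price, size)` tuples; the function name and bar layout are invented for the example and say nothing about how a database implements them.

```python
def ohlc_vwap_bars(ticks, bucket_seconds):
    """Aggregate time-ordered (ts, price, size) ticks into fixed-width bars."""
    bars = {}
    for ts, price, size in ticks:
        start = ts - ts % bucket_seconds  # floor the timestamp to its bucket
        bar = bars.get(start)
        if bar is None:
            bars[start] = {
                "start": start,
                "open": price, "high": price, "low": price, "close": price,
                "volume": size, "notional": price * size,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # ticks are time-ordered, so the last one wins
            bar["volume"] += size
            bar["notional"] += price * size

    result = []
    for start in sorted(bars):
        bar = bars[start]
        notional = bar.pop("notional")
        # VWAP is traded notional divided by traded volume.
        bar["vwap"] = notional / bar["volume"] if bar["volume"] else None
        result.append(bar)
    return result


# Toy ticks, invented for the example.
ticks = [(0, 100.0, 1), (30, 102.0, 3), (61, 101.0, 2)]
minute_bars = ohlc_vwap_bars(ticks, bucket_seconds=60)
```

Nothing here is specific to one firm, which is the point: the logic is identical everywhere, and only the cost of running it over exported raw data differs.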

What stays outside the database is what makes you you. Proprietary signals. Unique trading models. Custom backtests. Insights, whether they come from a human analyst, an AI agent, or a mix of both. That's your edge, and it doesn't belong as a feature of someone else's database. Running computation inside the database also falls apart when the work needs things SQL was never meant to do: training a model, calling a GPU, running a numerical library that lives in Python or C++.

That said, the boundary keeps moving. We're adding a plugin system so customers can extend QuestDB with their own logic, similar to UDFs, which means even firm-specific calculations can run next to the data when it makes sense. The split between database and analytics layer isn't fixed, and it's moving in the database's direction.

The right test: if every firm in your space is doing the same calculation, the database should be doing it for you. If you're the only one doing it, that's where your engineering team should focus.
