March 16 - 17, 2027 | Javits Center, New York

Decode the Market. 
Build the Future.
Capture the Alpha.

From Data Ownership to Data Intelligence: The Future of Proprietary Data

Is proprietary data still a genuine edge - or has the market caught up?

FA: In your view, does truly proprietary data still exist in systematic investing - and if so, is it still a defensible competitive advantage, or is the market closing the gap faster than funds can innovate?

Paul: I think it comes down to how you define proprietary data. In the early days of quant businesses, a proprietary data advantage may have simply been the ability to reliably store historical vendor and exchange data for later analysis. Nowadays vendors provide raw data processing as a service, so that alone offers no real edge. When we think of proprietary data now, we view it as a set of truly unique features that we create from the raw historical data. Data scientists build these features using agentic AI workflows, which let them create a myriad of new measures. Our quant research teams then use those measures as inputs when developing their strategies, which themselves rely on a different set of complex AI/ML models. So the arms race now is how quickly and reliably you can develop new proprietary data from any alternative data set that becomes available.

The real value of data: is it in what you own, or how you use it?

FA: Has the edge in data shifted from sourcing and ownership to processing, interpretation, and application? And if so, what does that mean for how funds should be thinking about their data strategy going forward?

Paul: Sourcing new data is always valuable and can provide some edge, but the world has woken up to the value of data. If you come across an interesting new data set, it is likely that many others have too, so the real edge shifts further up the knowledge stack. Value within the new knowledge stack starts after the data are onboarded, documented, assessed for Data Quality (DQ) and ID-linked to all your other data sources. That is now table stakes, and vendors can oversee it. The real value comes when your data science team looks at the data set, the docs and the DQ results and comes up with unique ideas on how these new data could be useful for predictive analytics. At that point they can leverage the power of agentic AI and ML to start developing truly proprietary derived data that are not readily available to other subscribers to the raw data. This ability to interpret where the value might lie, and to turn that into actionable insights via novel features, is where the value comes from.
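
Editor's illustration: the steps Paul describes (DQ assessment, ID linking, then a derived feature) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a description of any firm's actual pipeline. The sample rows, the `ID_MAP` lookup, the function names and the choice of a rolling z-score as the derived feature are all hypothetical and do not come from the interview.

```python
from statistics import mean, pstdev

# Hypothetical raw alternative-data rows: (vendor_id, date, value).
RAW_ROWS = [
    ("V-001", "2027-01-04", 100.0),
    ("V-001", "2027-01-05", 110.0),
    ("V-001", "2027-01-06", None),   # missing value, fails the DQ check
    ("V-001", "2027-01-07", 90.0),
    ("V-001", "2027-01-08", 130.0),
    ("V-999", "2027-01-04", 50.0),   # vendor ID with no internal mapping
]

# Hypothetical mapping from vendor identifiers to internal security IDs.
ID_MAP = {"V-001": "SEC_A"}


def dq_filter(rows):
    """Keep rows whose value is present and non-negative; also return the drop count."""
    clean = [r for r in rows if r[2] is not None and r[2] >= 0]
    return clean, len(rows) - len(clean)


def link_ids(rows, id_map):
    """Swap vendor IDs for internal IDs, dropping rows that cannot be linked."""
    return [(id_map[v], d, x) for v, d, x in rows if v in id_map]


def rolling_zscore(rows, window=3):
    """Derived feature: z-score of each value against the previous `window` values of the same ID."""
    history = {}
    features = []
    for sec_id, date, value in sorted(rows, key=lambda r: (r[0], r[1])):
        past = history.setdefault(sec_id, [])
        if len(past) >= window:
            recent = past[-window:]
            sigma = pstdev(recent)
            z = (value - mean(recent)) / sigma if sigma > 0 else 0.0
            features.append((sec_id, date, round(z, 4)))
        past.append(value)
    return features


clean_rows, dropped = dq_filter(RAW_ROWS)
linked_rows = link_ids(clean_rows, ID_MAP)
features = rolling_zscore(linked_rows, window=3)
print(dropped, len(linked_rows), features)
```

The first two functions correspond to what Paul calls table stakes. Only the last step, where someone decides which transformation of the raw values might carry predictive information, is where a proprietary feature would be defined.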

As alternative data becomes commoditised, where does the next frontier of data advantage lie?

FA: As datasets that were once proprietary become widely available, where do you see the next genuine source of data-driven alpha - and how far away are we from that becoming commoditised too?

Paul: If we go back to my previous point on the evolution of proprietary data, from the early days of capture and storage through to today's complex transformations into unique derived data, then I could argue we will never reach full commoditisation. By that I mean that while the raw versions of alternative data are well understood and commoditised, there are now endless possibilities for developing new and unseen derived data, given the ever-improving generative AI tools and ML techniques. One counter, of course, is that once we hit AGI, maybe everyone has access to every possible idea and we all end up with our AIs giving us the same 'perfect data'. But I'm not sure that's where we'll get to, especially given that we are still finding unique data based on well-known market data sets that have been available for 30+ years.