The AI Risk Almost Nobody Is Looking At
And why this is the Newtonian-to-quantum shift for enterprise software.
I posted on LinkedIn yesterday about the assumption underneath most enterprise software: that we can model the world deterministically.
The conversation that followed mostly went to the AI risk we already know how to talk about. The probabilistic nature of LLMs. Governance, guardrails, audit trails.
That risk is real. The scaffolding is being built.
But the conversation missed the harder risk the post was actually about, and almost nobody is talking about it. It’s the one that’s going to bite us.
For decades, software lived in a Newtonian world. The world changed slowly enough that we could write rules and trust them to hold. Markets moved on quarterly cycles. Policy moved at the speed of board meetings. We knew December was busier than January. We knew what we were selling, when, how. We built the system to handle it.
Inside that world, the rules covered most of what happened. Stock ran low, the system reordered. Stock-out at the supplier, you waited. If something fell outside the rules, a person stepped in. Renegotiated. Told the customer their order would take longer. Managed the exception. We built AI on the same premise: automate more of what the person did.
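To make the premise concrete, here’s a toy sketch in Python. Every name in it is invented for this piece; the point is that the logic is a finite set of branches, so you can enumerate and test every one.

```python
# A minimal, hypothetical sketch of the Newtonian premise: deterministic
# rules cover the common cases, and anything outside them goes to a human.
REORDER_POINT = 100
ORDER_QTY = 500

def handle_stock(sku: str, on_hand: int, supplier_in_stock: bool) -> str:
    """Every branch is enumerable, so every branch is testable."""
    if on_hand >= REORDER_POINT:
        return "ok"                            # rule: nothing to do
    if supplier_in_stock:
        return f"reorder {ORDER_QTY} x {sku}"  # rule: reorder
    return f"escalate {sku} to a person"       # exception: a person steps in
```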
In this world, test cases were stable. Bugs didn’t ripple through the wider ecosystem. The system you tested was mostly the system you ran. Build it for this year’s forecasts, upgrade it next year based on what you learned and the board’s new targets.
For all its effort and complexity, current AI governance still lives in this Newtonian frame. The model or the agent is the unit. We define its rules and its guardrails. We test against them. We trace its reasoning. The trace goes green. Everything’s good.
This is what surfaces as Risk 1. The probabilistic LLM, the variance to govern, the audit trails and evidence standards. Most of the AI safety conversation lives here. Most of the regulation lives here. Most of the consultancy practice lives here.
That works for as long as the world holds still.
But the world isn’t holding still anymore. It’s speeding up, some industries faster than others, and agentic AI is accelerating it further.
An agent can negotiate a contract in minutes. A market cycle that took a quarter only because we humans couldn’t move any quicker now takes an hour. Where does that leave quarterly, monthly, or weekly governance? It can’t keep up.
Speed is the easier bit. Governance can be automated.
The harder part is what happens when my agent meets your agent. On the surface that’s a technology problem. How do agents from different companies work together? Trust, identity, data standards, API specs. All valid, all being worked on.
The deeper problem is systemic impact.
I deploy an agent to optimise delivery for my customers. You deploy an agent to maximise profit for your business. Both behave correctly in isolation. Both pass their tests. Both vendors say green. But the moment they start negotiating with each other, what actually happens? My agent is written by one vendor. Yours by another. Each says it’s working as designed. Together they produce something neither of us designed and neither of us can predict from inside our own systems.
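A toy version of that dynamic fits in a dozen lines. To be clear, this is an illustration invented for this piece, not output from any real deployment: two pricing rules, each bounded and sensible on its own, each passing its own tests.

```python
# Hypothetical sketch: two agents, each 'correct' in isolation,
# producing a joint outcome neither side designed.

def buyer_bid(last_price: float) -> float:
    """Optimises delivery: pays a small premium to secure supply. Bounded, testable."""
    return last_price * 1.05

def seller_ask(last_bid: float) -> float:
    """Maximises profit: prices just above the buyer's last offer. Bounded, testable."""
    return last_bid * 1.10

price = 100.0
for round_no in range(1, 11):
    price = seller_ask(buyer_bid(price))
    print(f"round {round_no}: price = {price:,.2f}")

# Each agent's trace is green at every step. Compounded, 5% and 10%
# per round turn 100.0 into roughly 422 after ten rounds.
```

Swap pricing for contract terms, delivery slots, or ad bids and the shape is the same: local correctness, global drift.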
That’s the world we’re moving into, and it breaks the Newtonian assumption. The act of participation changes the system. Isolation is a fiction.
This is Risk 2. The emergent system-level behaviour that comes from lots of agents acting “correctly” together. It sits between agents, between systems, between organisations. Nobody designs it. Nobody can predict it from the agent’s individual trace.
The trace reads green. The world doesn’t.
Over hundreds of scenarios since December, with agents behaving as designed, I’ve watched the system collapse anyway. I’ve been running these in CONSTELLATION, the agentic AI development and evaluation infrastructure I built. The wider research community is now starting to report the same findings.
Here’s the structural reason Risk 2 stays invisible. It lives between enterprises. In the market. In the ecosystem of agents from different vendors, deployed by different companies, optimising for different objectives, that all happen to share an environment.
Nobody owns the market.
That’s the whole problem in one sentence.
And vendors? Most aren’t thinking about it. The ones who are won’t say so. Acknowledging Risk 2 means admitting their tooling is flawed, and the next question writes itself: why didn’t you fix the underlying issue?
Raise it inside any enterprise and the response is just as reliable. “We’re not the ones who would do that. Our agents are well-tested. We have governance. Look at our audit logs. It’s someone else’s problem.”
But the wider system impact lands on you regardless of who set it off. If the market goes sideways because of how everyone’s agents behaved together, even when each one was working exactly as designed, the consequence is yours anyway. You’re a participant in the same ecosystem.
This is the deflection that lets Risk 2 stay invisible. Each individual actor can plausibly say “not me.” The system-level outcome happens regardless.
What we validate has to extend past the agent. A green trace is not a safe-to-deploy signal. The reasoning trace tells you the agent’s logic was coherent. It tells you nothing about whether the system is still healthy after the agent acted.
The unit of evaluation has to be the ecosystem. Did the system stay valid after the agent acted, given everything else acting at the same time?
That requires simulation. You have to run the system at scale, with representative agents in it, and watch what emerges. You have to stress-test the ecosystem in simulation before the real world does it for you.
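At its most skeletal, that looks something like the sketch below: simulate the shared environment, let representative agents act, then check invariants on the system rather than on any one trace. All the names here are hypothetical, not CONSTELLATION’s API.

```python
import random

# Hypothetical sketch of ecosystem-level evaluation.

def run_market(n_agents: int, steps: int, seed: int) -> list[float]:
    """Simulate agents repricing against one shared market price."""
    rng = random.Random(seed)
    price, history = 100.0, []
    for _ in range(steps):
        for _ in range(n_agents):
            # Each agent's move is small and bounded: individually 'correct'.
            price *= 1 + rng.uniform(-0.010, 0.012)
        history.append(price)
    return history

def ecosystem_invariant(history: list[float]) -> bool:
    """The unit of evaluation is the system: did prices stay in a sane band?"""
    return all(50.0 <= p <= 200.0 for p in history)

# Run many seeds and watch what emerges, before the real world
# runs the experiment for you.
failures = sum(not ecosystem_invariant(run_market(20, 50, s)) for s in range(100))
print(f"{failures}/100 simulated markets broke the invariant")
```

The per-agent bias in that sketch averages a tenth of a percent, invisible in any individual trace; compounded across twenty agents and fifty steps, it pushes most runs out of the band.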
This is what most current agent tooling doesn’t do. The trace stops at the edge of the agent. The agent is reasoning beautifully. The world around it is unstable. Both can be true at once.
We haven’t had the GameStop moment for AI agents yet. The market hasn’t seen the event where multiple agents, behaving correctly in isolation, produce a system-level outcome nobody saw coming. We’ve come close in trading systems. The 2010 flash crash was a Newtonian-era preview. The actors then were algorithms with bounded objectives.
The next one will involve agents with much more open-ended objectives, deployed at scale, in environments where the rules of engagement are still being written.
By the time we have the moment, the question of “who owns this risk” is going to be loud and very late.
The window to think about it well is now, while the risk is still abstract enough to be ignored.
You read this far. That tells me something.
You’re either deploying agents you can’t fully model, or you’re advising someone who is. Either way, Risk 2 is on your radar now whether or not you have a name for it.
I’ve been deploying ML and AI into complex operational workflows since 2017, well before the current LLM wave, including in regulated sectors. CONSTELLATION came out of that work: agentic AI development and evaluation infrastructure built around the ecosystem as the unit of analysis. It models exactly what current tooling misses.
Three ways to take this further:
Subscribe. Weekly writing on AI, enterprise, agents, and what AI frees us to become.
Book a 30-minute call. Tell me where you’re deploying agents. I’ll tell you where the systemic exposure sits. No pitch attached.
Work with me directly. We model your specific ecosystem in CONSTELLATION, surface the systemic risk before deployment, and put governance in place that catches what current tooling misses.
The GameStop moment is coming. Whether you’ve thought about your ecosystem before it happens is the only thing still in your control.