Why the control plane — not the agent — is the real product
I’ve been wrangling with something for months now, and I think I’ve finally got clarity on it.
I’m a Director of Engineering at n8n, a workflow automation platform operating at the centre of AI-native infrastructure. Before that, I co-founded Sailhouse with Ed Stephinson: event-driven infrastructure for production AI workloads.
Two very different vantage points. And I’ve come to the same conclusion.
The industry is deep in a debate about agents vs workflows vs skills, and I’ve been turning each of these primitives over from both angles. What’s an agent without a workflow? What’s a skill without governance? What are all the combinations, and which ones actually hold up in production? Every time I pull at the thread, I land in the same place: the interesting problem isn’t the agent. It’s everything around it.
The debate is the wrong frame
Anthropic’s foundational guide draws the line clearly: workflows orchestrate LLMs through predefined code paths; agents dynamically direct their own processes and tool usage. The difference is the degree of autonomy, and it’s a dial, not a toggle.
That framing is useful, but it can mislead you into thinking you need to choose. In practice, every serious production system I’ve seen is both. The agent handles the bits that need flexibility and reasoning. The workflow provides the state, the sequencing, the policy enforcement, and the “what happens when this goes sideways” logic. Anthropic’s 2026 agentic coding report predicts this is the year single-agent systems give way to coordinated multi-agent architectures; coordination is orchestration. It is a workflow.
Then throw skills into the mix. Anthropic launched Agent Skills as an open standard in late 2025: composable packages of instructions, scripts, and resources that agents load dynamically. MCP handles “how to connect.” Skills handle “what to do.” But neither answers the questions that actually matter when something’s running in anger: who’s allowed to run this? What can it access? How much can it spend? What happens when it fucks up?
Those are control plane questions. And they don’t just matter for enterprises with compliance teams and procurement cycles. They matter for anyone shipping an agent that does real work. If you’re a solo developer and your agent racks up a $400 bill overnight because it got stuck in a retry loop, you wanted spend governance. If it quietly accessed a data source you didn’t intend, you wanted access policy. If you can’t figure out why it made a weird decision, you wanted traceability. The scale is different; the needs are identical.
What happens when you skip it
In January 2026, an open-source AI agent project called OpenClaw went viral. It wasn’t an enterprise product; it was a hobbyist project that caught a wave. But within three weeks it hit a critical RCE vulnerability, a supply chain poisoning campaign that compromised roughly 20% of its skills marketplace, and tens of thousands of internet-exposed instances running without authentication.
I’m not dunking on OpenClaw. The pattern is what matters. Microsoft’s security team described the core issue: the runtime could ingest untrusted text, download and execute skills from external sources, and act using whatever credentials were assigned to it. The execution boundary shifted from static application code to dynamically supplied content, without equivalent controls around identity, input handling, or privilege scoping.
That’s not an OpenClaw problem. That’s what happens to any system that optimises for autonomy before building accountability. Gartner predicts over 40% of agentic AI projects will be cancelled by end of 2027 due to escalating costs, unclear value, or inadequate risk controls. They reckon only about 130 of the thousands of agentic AI vendors are real; the rest is agent-washing. And by 2030, they predict half of agent deployment failures will trace to insufficient runtime enforcement of governance controls.
Every frontier lab is building the same thing
They just don’t all call it a control plane yet.
OpenAI’s AgentKit shipped with a visual workflow canvas, a Connector Registry for centralised data governance, and modular guardrails. Their Agents SDK includes tracing, parallel input validation, human-in-the-loop, and persistent sessions. Google integrated Cloud API Registry into Vertex AI Agent Builder so admins can centrally manage which tools agents touch. Futurum noted that Google’s ADK is positioning for the full agent lifecycle, from intent to outcome, which is exactly the kind of governance enterprise will demand as agents move to autonomous execution.
Meanwhile, the independent control plane ecosystem is exploding. Galileo released Agent Control recently as an open-source control plane, arguing the number one blocker for enterprise agents is no longer models but governance. OneTrust expanded AI governance to real-time monitoring and enforcement across agents. Redpanda launched an Agentic Data Plane with centralised identity, policy enforcement, and observability.
Here’s the thing I keep coming back to, though: the labs building their own governance layers doesn’t eliminate the need for independent control plane infrastructure. It validates it. An enterprise running agents across Anthropic, OpenAI, and Google doesn’t want three governance silos mirroring their model silos. They want one control plane that works across all of them. An analyst covering OpenAI’s stateful AI launch said it plainly: the strategic question isn’t which model is smartest; it’s which runtime stack guarantees continuity, auditability, and operational resilience at scale.
That’s interoperability. That’s where the real infrastructure gap lives.
What a control plane actually needs to do
“Add governance” is about as useful as “add security.” It means nothing until you get specific:
Identity and access control. Not just “who triggered this” but “what can this agent access, with what permissions, scoped to what data?” Every tool invocation should pass through policy. When agents call other agents, the blast radius of a compromised credential scales with the authority you’ve granted.
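To make that concrete, here is a minimal sketch of a per-invocation policy check. Everything in it is illustrative: the `AgentPolicy` model, the `POLICIES` table, and `invoke_tool` are hypothetical names, and a real control plane would load policy from a central service rather than an in-process dict.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    """What one agent identity is allowed to do, and over which data."""
    allowed_tools: set
    data_scopes: set = field(default_factory=set)

# Centrally managed policy table (hypothetical example entries).
POLICIES = {
    "support-agent": AgentPolicy(
        allowed_tools={"search_docs", "create_ticket"},
        data_scopes={"tickets:read"},
    ),
}

def invoke_tool(agent_id, tool, scope=None):
    """Every tool invocation passes through policy before execution."""
    policy = POLICIES.get(agent_id)
    if policy is None or tool not in policy.allowed_tools:
        raise PermissionError(f"{agent_id} may not call {tool}")
    if scope is not None and scope not in policy.data_scopes:
        raise PermissionError(f"{agent_id} lacks scope {scope}")
    return f"executed {tool}"
```

The point of the sketch is the shape, not the mechanism: the agent never holds raw authority; it asks the control plane on every call, so revoking a credential or narrowing a scope takes effect immediately.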
Spend governance. Agents consume tokens in loops; multi-agent systems do it exponentially. Without per-run cost caps and real-time spend visibility, someone’s having a bad quarter. Estimates suggest multi-agent systems use 15x more tokens than simple chat interactions. That’s true whether you’re an enterprise or a solo dev with a side project.
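A per-run cost cap is simple to sketch, which makes its frequent absence all the stranger. The `SpendCap` class below is a hypothetical illustration: it charges the budget before each model call, so a runaway retry loop dies at the cap rather than at the invoice.

```python
class SpendCap:
    """Per-run token budget; aborts an agent loop before costs run away."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens):
        """Record usage; raise once the run exceeds its budget."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"spend cap exceeded: {self.used}/{self.max_tokens} tokens"
            )
```

In a real system the cap would be enforced by the control plane, not the agent’s own process, so a misbehaving agent cannot simply skip the check; the sketch just shows where the accounting hooks in.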
Step-level policy enforcement. Galileo’s framing nails this: gateway solutions that filter input and output are blind to everything in between. Which tool the agent selected, what arguments it passed, what the database returned. You need enforcement inside the workflow, not wrapped around it, and it needs to be updatable centrally without redeploying every agent.
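The difference between a gateway filter and in-workflow enforcement is where the check sits. A minimal sketch, with hypothetical names throughout: the policy function sees the tool name and its actual arguments at each step, and the blocked set stands in for rules that would really be fetched from a central policy service.

```python
# Stand-in for centrally managed rules; in practice this would be
# fetched or pushed from a policy service, not hardcoded.
BLOCKED_TABLES = {"payroll"}

def enforce_step(tool, args):
    """Inspect the concrete tool call, not just the user prompt."""
    if tool == "sql_query" and args.get("table") in BLOCKED_TABLES:
        raise PermissionError(f"policy: table {args['table']} is off-limits")

def run_step(tool, args, impl):
    """Every workflow step passes through enforcement before executing."""
    enforce_step(tool, args)
    return impl(**args)
```

A gateway never sees that `sql_query` was called with `table="payroll"`; the step wrapper does, which is the whole argument for enforcement inside the workflow.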
Observability as a first-class concern. David Poll has made a related argument about AI-generated code: human review can’t scale to match the volume; the real validation happens in production, through observability. The same logic applies to agents, only more so. You can’t human-review every decision an autonomous agent makes. You can instrument it so you know what it did, why, and what it cost. Honeycomb’s framing for 2026 puts it well: AI systems are nondeterministic by design, unfold over time rather than in single transactions, and generate real cost with every interaction. That’s not a monitoring problem you solve with dashboards after the fact.
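Instrumenting an agent’s decisions mostly means emitting one structured event per step, with cost attached. A hypothetical sketch (the `trace` function and its fields are illustrative; a real system would ship these to a tracing backend rather than print them):

```python
import json
import time

def trace(run_id, step, tool, cost_usd, detail):
    """Emit one structured event per agent decision, cost included."""
    event = {
        "run": run_id,
        "step": step,
        "tool": tool,
        "cost_usd": round(cost_usd, 6),
        "ts": time.time(),
        **detail,  # tool arguments, model choice, outcome, etc.
    }
    print(json.dumps(event))  # in practice: export to your tracing backend
    return event
```

The discipline matters more than the code: if every decision produces an event with identity, cost, and context, the questions “what did it do, why, and what did it cost” become queries instead of archaeology.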
Audit trails and compliance. If your agent makes a decision affecting a customer, a transaction, or a regulated process, you need to explain why. Deterministic replay. Complete traces. Clear data jurisdiction boundaries. The EU AI Act enters full enforcement this year.
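Deterministic replay follows from recording every tool result. The sketch below is a hypothetical record/replay pair: the recorder logs each call’s inputs and outputs, and the replayer feeds recorded outputs back instead of touching live systems, flagging any divergence from the original run.

```python
class Recorder:
    """Wrap live tool calls and keep an append-only log of each one."""

    def __init__(self):
        self.log = []

    def call(self, tool, args, impl):
        out = impl(**args)
        self.log.append({"tool": tool, "args": args, "out": out})
        return out

class Replayer:
    """Re-run a trace using recorded outputs instead of live calls."""

    def __init__(self, log):
        self.log = list(log)
        self.i = 0

    def call(self, tool, args):
        rec = self.log[self.i]
        self.i += 1
        if rec["tool"] != tool or rec["args"] != args:
            raise RuntimeError(f"replay divergence at step {self.i}")
        return rec["out"]
```

With a complete log, “explain why the agent did this” becomes stepping through the trace, and a compliance question becomes a lookup rather than a forensics exercise.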
Human-in-the-loop as architecture. The best framing I’ve encountered for what “winning” looks like in the agent era: shrinking human effort to being given the right information at the right time to make the right judgement call. That’s not removing humans. That’s designing the system so human attention goes exactly where it creates the most value.
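“Human-in-the-loop as architecture” can be as small as a rule that decides which actions pause for approval. A minimal sketch, with hypothetical names and thresholds: routine actions proceed, risky ones surface to a person with the context needed for the judgement call.

```python
def needs_approval(action, amount):
    # Illustrative rule: refunds over a threshold go to a human.
    return action == "refund" and amount > 100

def execute(action, amount, approve_fn):
    """Gate risky actions behind a human decision; pass the rest through."""
    if needs_approval(action, amount):
        if not approve_fn(action, amount):
            return "rejected by human"
    return f"executed {action} for {amount}"
```

The design choice is that the gate lives in the system, not in the agent’s prompt, so escalation cannot be reasoned away by the model; the human sees exactly the cases where their attention creates the most value.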
The agent is not the product
I’ve written before about the experience debt problem: we’re building AI capability faster than we’re building the human judgement needed to govern it. The control plane is the structural answer.
Whether I’m looking at this from the workflow side or the infrastructure side, I keep arriving at the same place. The agent is not the product. The agent is a capability. The product is the system that makes it trustworthy, observable, and accountable.
Shipping great agents matters. But right now, the infrastructure to make them trustworthy, observable, and accountable at scale barely exists. That’s the gap. The agent market is crowded; the control plane market is wide open. And the teams that close that gap will end up as foundational to the AI stack as cloud providers became to the web.
Autonomy without accountability is just expensive chaos. Build the control plane first.