sllvn//

An early survey of LLM harnesses

2026-05-09

Recently, buzz around LLMs has shifted from the models themselves (intelligence) to the harnesses (scaffolding). The "harness" is the tools, runtime, and instructions that turn an LLM into an agent. Things like "skills" and "memory" are all provided by the harness, sent as text into the model. For a while the engineering community was focused on "prompt engineering", then "agentic engineering", but in the past few months, it's turned almost entirely to "harness engineering".
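The "model + tools + instructions, running in a loop" idea can be sketched in a few lines. This is a minimal illustration, not any real product's API: `fake_model` is a stub standing in for an LLM call, and the message format and `word_count` tool are hypothetical.

```python
# Minimal sketch of a harness: instructions + tools + a model, in a loop.
# fake_model is a stub standing in for a real LLM API call; the message
# format and tool names are illustrative, not any specific product's API.

def word_count(text: str) -> str:
    """Example tool: the harness can expose any local capability like this."""
    return str(len(text.split()))

TOOLS = {"word_count": word_count}

SYSTEM_PROMPT = "You are an agent. Call tools until the task is done."

def fake_model(messages):
    """Stub 'model': reads the transcript and decides the next action."""
    last = messages[-1]
    if last["role"] != "tool":
        # No tool has run yet: ask the harness to execute one.
        return {"tool": "word_count", "args": {"text": last["content"]}}
    # A tool result is in the transcript: finish up.
    return {"done": f"The text has {last['content']} words."}

def run_agent(task: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]
    while True:  # the agentic loop: model decides, harness executes
        action = fake_model(messages)
        if "done" in action:
            return action["done"]
        result = TOOLS[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": result})

print(run_agent("count these four words"))
```

Everything the model "knows" about its environment arrives the same way: as text appended to the transcript. Skills, memory, and tool results are all just entries in `messages`, which is why so much of harness engineering reduces to deciding what text to put there.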

There were earlier implementations of agentic loops, but Claude Code (February 2025) was the first popular tool to get people thinking about harnesses (though that word wasn't yet in use). It demonstrated the utility of frontier models combined with the right scaffolding.[1] OpenAI's Codex (May 2025) and open source harnesses like Opencode (June 2025) and Pi (November 2025)[2] were quick to follow, but Claude Code had first-mover advantage, and for a while it was the better product, both in results and in developer experience. The idea - frontier models running in a loop, calling local tools, guided by a finely-tuned system prompt - proved very effective.

[1] System prompts are tuned to specific models. Someday they might be more generalizable, but currently OpenAI and Anthropic spend tons of effort refining their harnesses' system prompts.

[2] Mario Eichner's post introducing Pi is essential reading for anyone interested in harness engineering.

In December 2025, OpenClaw (originally "Clawdbot") brought agents to the masses: what happens if, instead of limiting the model to coding tools, you give it access to everything: your email, your contacts, your data... your computer? It was a security nightmare but showed the potential of personal agents. The project spread like wildfire, and Mac Minis remain difficult to purchase five months later: everybody wants to run their own agent against a local LLM.[3]

[3] I find it interesting that people accept OpenClaw's lax attitude toward security while simultaneously being paranoid about using OpenAI / Anthropic.

In March 2026, Anthropic's actions indicated that they view their harness as a potential moat. Coinciding with hyperscaling challenges, they began to lock it down. They blocked third-party use of their subscription plans, and continued to take an aggressive stance against open source harnesses. This, alongside some harness bugs causing degraded performance and opaque billing changes, burned a lot of developer goodwill. OpenAI capitalized on the crisis and many developers jumped ship, realizing that Codex had caught up both as a harness and as a model.

As of May 2026, Codex is the superior harness. OpenAI has understood agentic engineering workflows better and built a great desktop coding app. But at any moment this could change: a new model could launch, local LLMs could become viable, OpenAI and Anthropic token pricing could shift. For this reason, open harnesses, especially Pi (built on simplicity and context transparency), remain viable.

Today all the energy is around harnesses: everybody is writing pieces on harness engineering, there are now harness meta-frameworks... I've even realized that my own project, Teambook, is a harness. But the core idea remains the same: model + tools + instructions, running in a loop.