Most AI harnesses start as a prompt and a wrapper. They get to v1.0 by accumulating branches in the wrapper. AI Harness took the opposite path: codify governance as typed artifacts, then make the wrapper as small as possible.
v0.6.0 is the first release where that bet looks proven.
If you’ve been following the Harness as Code thesis, this is the release where the runtime catches up to the philosophy.
What v0.6.0 actually changes
Four things matter in this release. Everything else is supporting work.
1. Typed artifact bundles are real
Shape A bundles — .harness/{plugins,builtins,overrides}/*.md — are now
first-class. The bundle loader (PR #123) closed the gap where the artifact
registry already understood harness_artifact/v1alpha1 declarations but
serve and validate quietly ignored them.
One file = one capability bundle. Tools, hooks, and prompts that belong to the same governance unit live in the same artifact.
2. The agent loop is hardened
A strict finish_reason guard now sits at the top of the loop (PRs #121, #123):
| finish_reason | Behavior |
|---|---|
| stop, end_turn, "" | Fall through to a final answer |
| length | Retriable error: context truncated |
| content_filter | Hard error: no silent recovery |
| anything else, no tool calls | Retriable error: no silent stop |
No more “agent quietly stopped on turn 14 and we don’t know why.”
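The table reads naturally as a small classifier. Here is a minimal sketch of that mapping, not the harness's actual implementation: `classify_finish` and the `Outcome` enum are hypothetical names, and the `RUN_TOOLS` branch is an assumption, since the table only covers the no-tool-calls case for unrecognized reasons.

```python
from enum import Enum


class Outcome(Enum):
    FINAL_ANSWER = "final_answer"
    RETRIABLE_ERROR = "retriable_error"
    HARD_ERROR = "hard_error"
    RUN_TOOLS = "run_tools"  # assumption: this case is not a row in the table


def classify_finish(finish_reason: str, has_tool_calls: bool) -> Outcome:
    """Map one model response onto the guard table (hypothetical helper)."""
    # Rows are checked in the same order as the table.
    if finish_reason in ("stop", "end_turn", ""):
        return Outcome.FINAL_ANSWER
    if finish_reason == "length":
        return Outcome.RETRIABLE_ERROR  # context truncated
    if finish_reason == "content_filter":
        return Outcome.HARD_ERROR  # no silent recovery
    if not has_tool_calls:
        return Outcome.RETRIABLE_ERROR  # unknown reason, nothing to run
    # Assumption: an unrecognized reason that still carries tool calls
    # lets the loop continue and execute them.
    return Outcome.RUN_TOOLS
```

The point of the shape is that every branch returns something explicit. There is no path where an unknown `finish_reason` ends the loop without an error.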
3. Reference docs are complete
Every public surface has an exhaustive reference page now:
- harness.md frontmatter: every field, every default, every validate() check
- Tool artifact schema: file shape, parameters, Starlark dialect, async reserved
- Hook artifact schema: full event catalog, payload shapes, decision contract, when: semantics
- Starlark built-ins: every builtin from scripting.Engine.makeBuiltins, per-module
- CLI: every subcommand, flag, env var, exit code
No more “read the source.” The docs are now the contract.
4. The live bot is governed
@htekdevaiharness on Telegram runs the
same Shape A bundles you’d ship to your own team:
$ harness validate -v
21 tools registered (across harness.md + 2 plugin bundles)
5 hooks registered
That count comes from a notes-bundle (note save/list + audit hook) and a
safety-bundle (command guard + output redactor + status tool) both loaded
as typed artifacts. The same loader. The same precedence rules. The same
docs you’d read.
Why typed artifact bundles matter
This is the conceptual centerpiece, and it’s where AI Harness takes the strongest position against everything else in the category.
Most “extension” systems give you one file per capability and pretend that’s the answer. The reality: a real capability is rarely one tool. It’s a tool plus a hook plus a guard plus a default prompt fragment. Splitting those across four files breaks composability — you can no longer move “the safety capability” between repos as one diff.
Shape A bundles fix that. Each .md file declares a single capability bundle:
---
artifact: harness_artifact/v1alpha1
kind: plugin # plugin | builtin | override
name: safety-bundle
priority: 40
---
# Safety bundle
Tools, hooks, and prompts that govern destructive operations.
## Tool: command_guard
...
## Hook: tool.pre / output_redactor
...
Composition is deterministic. Precedence is declared at the kind level:
override > harness > builtin > plugin > model
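To make the ordering concrete, here is a rough sketch of kind-level conflict resolution. It is an illustration under assumptions, not the loader's code: `resolve` is a hypothetical function, the file paths are made up, and the per-artifact priority field is ignored because its tie-breaking rules are not described here.

```python
# Kinds listed from highest to lowest precedence, as in the ordering above.
PRECEDENCE = ["override", "harness", "builtin", "plugin", "model"]
RANK = {kind: i for i, kind in enumerate(PRECEDENCE)}


def resolve(definitions: list[tuple[str, str, str]]) -> dict[str, str]:
    """Pick one winning source per capability name.

    Each definition is (name, kind, source). A lower rank wins. On a tie
    within the same kind, this sketch keeps the first definition seen.
    """
    winners: dict[str, tuple[int, str]] = {}
    for name, kind, source in definitions:
        rank = RANK[kind]
        if name not in winners or rank < winners[name][0]:
            winners[name] = (rank, source)
    return {name: source for name, (_, source) in winners.items()}


# Hypothetical inputs: the same tool declared by a plugin and by an override.
definitions = [
    ("command_guard", "plugin", ".harness/plugins/safety.md"),
    ("command_guard", "override", ".harness/overrides/safety.md"),
    ("status", "builtin", ".harness/builtins/status.md"),
]
resolved = resolve(definitions)
```

Here the override's `command_guard` wins over the plugin's, regardless of the order the files were read in. That order-independence is what "deterministic" buys you.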
Per-turn evaluation re-checks each artifact’s when: predicate every turn,
not just at startup. An artifact that’s inactive on turn 3 can light up on
turn 4 without restarting the agent.
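A small sketch of what per-turn evaluation means in practice. Everything here is an assumption for illustration: the real when: predicates have their own syntax (see the hook schema reference), so plain Python callables stand in for them, and the artifact names and the `shell_requested` key are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Artifact:
    name: str
    # Stand-in for a when: predicate; the real predicate syntax differs.
    when: Callable[[dict], bool]


def active_artifacts(artifacts: list[Artifact], turn_state: dict) -> list[str]:
    """Re-evaluate every predicate against the current turn's state."""
    return [a.name for a in artifacts if a.when(turn_state)]


artifacts = [
    Artifact("always-on", when=lambda state: True),
    Artifact("shell-guard", when=lambda state: state.get("shell_requested", False)),
]

# The same artifact set is filtered again on every turn; nothing is cached.
turn_3 = active_artifacts(artifacts, {"turn": 3})
turn_4 = active_artifacts(artifacts, {"turn": 4, "shell_requested": True})
```

In this sketch `shell-guard` is inactive on turn 3 and active on turn 4, with no reload in between. The artifact list never changes; only the state it is evaluated against does.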
This is the line that separates “extensions” from Harness as Code: the unit of governance is the bundle, not the individual file. You can review one diff. You can move one folder. You can audit one artifact. The runtime composes them deterministically.
Things you can actually inspect now
Three commands that didn’t quite work two releases ago are now daily drivers:
harness validate -v
Registers every artifact, runs every parser, prints a per-bundle tool/hook
count. On the live bot today: 21 tools / 5 hooks across harness.md +
two plugin bundles. If the number doesn’t match what you expect, your
bundle isn’t loading. That’s the loop.
harness context --verbose
Shows what the agent saw on a given turn:
- which chunks were assembled into the system prompt
- where each chunk came from (which artifact, which file)
- which artifacts were active vs inactive
- which when: predicates passed
- total token spend, broken down by source
Context observability is not an afterthought. It is shipped.
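The "broken down by source" part is a plain aggregation. A minimal sketch, assuming each prompt chunk carries a source label and a token count; the `Chunk` type, the function name, and the numbers are all hypothetical, not the command's real output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # artifact or file the chunk came from
    tokens: int  # token count, assumed to be measured elsewhere


def spend_by_source(chunks: list[Chunk]) -> dict[str, int]:
    """Sum token counts per source so each artifact's cost is visible."""
    totals: dict[str, int] = defaultdict(int)
    for chunk in chunks:
        totals[chunk.source] += chunk.tokens
    return dict(totals)


# Made-up numbers, purely for illustration.
chunks = [
    Chunk("harness.md", 420),
    Chunk("safety-bundle", 180),
    Chunk("harness.md", 60),
]
report = spend_by_source(chunks)
total = sum(report.values())
```

Keeping the source on every chunk is the design choice that matters. Once it is there, "which artifact is eating my context?" is a one-line question.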
harness artifacts
Flat list of every loaded artifact with its priority, kind, source file, and active/inactive state. Useful when you need to answer “is this hook actually firing?” without grepping through bundles.
What’s still off the menu
Honesty matters. v0.6.0 is not a “we figured it all out” release.
- Compaction engine vs hooks: open question (#69 / roadmap). The leading candidate is hooks-driven compaction in v0.7.
- Memory persistence: flat files today; SQLite is on the table for v0.7.
- Sub-agent supervision: primitive level, not orchestration level. Phase 7 territory.
- Async tool calls: the async: field is reserved in the tool schema (parsed but not propagated through ToolConfig). To be wired in Phase 3.
- agent.stop hook event: the strict finish_reason guard ships in v0.6.0, but the proper hook primitive (issue #104) is held for v0.7.0 so it can get its own design pass.
If you need any of those today, you’re early. That’s fine. The core’s shape is what we’re committing to in v0.6.0; the edges are still moving.
The pre-1.0 schema-evolution clause stays in effect: artifact frontmatter fields can still change between minor releases. The CHANGELOG calls every break out explicitly.
How to try it
go install github.com/htekdev/ai-harness/cmd/harness@latest
harness init my-agent
cd my-agent
harness validate -v
harness serve --source stdin
Then drop a Shape A bundle into .harness/plugins/:
---
artifact: harness_artifact/v1alpha1
kind: plugin
name: my-first-bundle
priority: 50
---
## Tool: hello
Say hello and exit.
## Hook: tool.post / log-everything
Print every tool call to stderr.
Re-run harness validate -v. The tool/hook count should go up. That’s the
loop. That’s the whole product surface.
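If you want a number to compare against before you run validate, you can count the headings yourself. This is a rough sketch that assumes each `## Tool:` and `## Hook:` heading declares exactly one tool or hook, as in the example bundle; `expected_counts` is a hypothetical helper and does not reproduce the real parser.

```python
import re

# Matches headings such as "## Tool: hello" or "## Hook: tool.post / log-everything".
HEADING = re.compile(r"^## (Tool|Hook):[ \t]*(.+)$", re.MULTILINE)


def expected_counts(bundle_text: str) -> dict[str, int]:
    """Count Tool and Hook headings in one bundle file's text."""
    counts = {"Tool": 0, "Hook": 0}
    for kind, _name in HEADING.findall(bundle_text):
        counts[kind] += 1
    return counts


bundle = """---
artifact: harness_artifact/v1alpha1
kind: plugin
name: my-first-bundle
priority: 50
---
## Tool: hello
Say hello and exit.
## Hook: tool.post / log-everything
Print every tool call to stderr.
"""

counts = expected_counts(bundle)
```

For the bundle above this gives one tool and one hook, so the validate totals should each rise by one. If they do not, the bundle is not loading.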
The bigger arc
- v0.4.0 was the first usable harness.
- v0.5.0 was the first one with proper claims verification (Ralph loop at the delegation boundary).
- v0.6.0 is the first one where the artifact model, the loop, and the docs all line up with the Harness-as-Code thesis.
That’s the milestone worth marking. v0.7 is async, memory persistence, and the compaction engine. After that, v1.0 is a positioning question, not an engineering one.
Where to go next
- Repo: github.com/htekdev/ai-harness
- Docs: htekdev.github.io/ai-harness
- Live bot: @htekdevaiharness on Telegram
- Companion piece: What Is Harness as Code?
- Category survey: Live comparison of agent harnesses
If you’ve been waiting for “the small one with real governance,” this is it.