AI Harness v0.6.0 — Harness as Code Gets Its Reference Implementation

Most AI harnesses start as a prompt and a wrapper. They get to v1.0 by accumulating branches in the wrapper. AI Harness took the opposite path: codify governance as typed artifacts, then make the wrapper as small as possible.

v0.6.0 is the first release where that bet looks proven.

If you’ve been following the Harness as Code thesis, this is the release where the runtime catches up to the philosophy.

What v0.6.0 actually changes

Four things matter in this release. Everything else is supporting work.

1. Typed artifact bundles are real

Shape A bundles — .harness/{plugins,builtins,overrides}/*.md — are now first-class. The bundle loader (PR #123) closed the gap where the artifact registry already understood harness_artifact/v1alpha1 declarations but serve and validate quietly ignored them.

One file = one capability bundle. Tools, hooks, and prompts that belong to the same governance unit live in the same artifact.

2. The agent loop is hardened

A strict finish_reason guard now sits at the top of the loop (PRs #121, #123):

| `finish_reason` | Behavior |
| --- | --- |
| `stop`, `end_turn`, `""` | Fall through to a final answer |
| `length` | Retriable error: context truncated |
| `content_filter` | Hard error: no silent recovery |
| anything else, no tool calls | Retriable error: no silent stop |

No more “agent quietly stopped on turn 14 and we don’t know why.”
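To make the table concrete, here is a minimal Go sketch of a guard with the same four rows. It is not lifted from the AI Harness source: the `Outcome` type, the `guard` function, and the error values are hypothetical names, and the "anything else, with tool calls" branch is an assumption, since the table only covers the no-tool-calls case.

```go
package main

import (
	"errors"
	"fmt"
)

// Outcome is a hypothetical classification of one model turn.
// The names are illustrative and not taken from the AI Harness source.
type Outcome int

const (
	FinalAnswer Outcome = iota // fall through to a final answer
	RunTools                   // ASSUMPTION: tool calls are pending, keep looping
	Retriable                  // error the caller may retry
	Hard                       // error with no recovery attempt
)

var (
	errTruncated = errors.New("finish_reason=length: context truncated")
	errFiltered  = errors.New("finish_reason=content_filter: refusing silent recovery")
)

// guard applies the four table rows in order. How the real loop treats
// tool calls that arrive alongside "stop" or "end_turn" is not stated in
// the post, so this sketch ignores toolCalls for the named reasons.
func guard(finishReason string, toolCalls int) (Outcome, error) {
	switch finishReason {
	case "stop", "end_turn", "":
		return FinalAnswer, nil
	case "length":
		return Retriable, errTruncated
	case "content_filter":
		return Hard, errFiltered
	}
	if toolCalls == 0 {
		// Unknown reason and nothing to run: never stop silently.
		return Retriable, fmt.Errorf("unexpected finish_reason %q with no tool calls", finishReason)
	}
	return RunTools, nil
}

func main() {
	cases := []struct {
		reason string
		tools  int
	}{
		{"stop", 0}, {"length", 0}, {"content_filter", 0}, {"tool_calls", 2}, {"tool_calls", 0},
	}
	for _, c := range cases {
		outcome, err := guard(c.reason, c.tools)
		fmt.Printf("%-16q tools=%d -> outcome=%d err=%v\n", c.reason, c.tools, outcome, err)
	}
}
```

The point of putting the check at the top of the loop is that every turn gets classified before anything else happens, so there is no code path where an odd `finish_reason` ends the run without an error.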

3. Reference docs are complete

Every public surface has an exhaustive reference page now:

harness.md frontmatter — every field, every default, every validate() check
Tool artifact schema — file shape, parameters, Starlark dialect, async reserved
Hook artifact schema — full event catalog, payload shapes, decision contract, when: semantics
Starlark built-ins — every builtin from scripting.Engine.makeBuiltins, per-module
CLI — every subcommand, flag, env var, exit code

No more “read the source.” The docs are now the contract.

4. The live bot is governed

@htekdevaiharness on Telegram runs the same Shape A bundles you’d ship to your own team:

```shell
$ harness validate -v
21 tools registered (across harness.md + 2 plugin bundles)
5 hooks registered
```

That count comes from a notes-bundle (note save/list + audit hook) and a safety-bundle (command guard + output redactor + status tool) both loaded as typed artifacts. The same loader. The same precedence rules. The same docs you’d read.

Why typed artifact bundles matter

This is the conceptual centerpiece, and it’s where AI Harness takes the strongest position against everything else in the category.

Most “extension” systems give you one file per capability and pretend that’s the answer. The reality: a real capability is rarely one tool. It’s a tool plus a hook plus a guard plus a default prompt fragment. Splitting those across four files breaks composability — you can no longer move “the safety capability” between repos as one diff.

Shape A bundles fix that. Each .md file declares a single capability bundle:

```markdown
---
artifact: harness_artifact/v1alpha1
kind: plugin            # plugin | builtin | override
name: safety-bundle
priority: 40
---

# Safety bundle

Tools, hooks, and prompts that govern destructive operations.

## Tool: command_guard
...

## Hook: tool.pre / output_redactor
...
```

Composition is deterministic. Precedence is declared at the kind level:

override > harness > builtin > plugin > model

Per-turn evaluation re-checks each artifact’s when: predicate every turn, not just at startup. An artifact that’s inactive on turn 3 can light up on turn 4 without restarting the agent.

This is the line that separates “extensions” from Harness as Code: the unit of governance is the bundle, not the individual file. You can review one diff. You can move one folder. You can audit one artifact. The runtime composes them deterministically.
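Here is one way to picture deterministic composition with per-turn `when:` checks, as a Go sketch. Everything in it is illustrative: the `Artifact` struct, the `activeInOrder` function, and modelling `when:` as a function of the turn number are assumptions, not the runtime's actual types. The post also does not say how `priority` breaks ties within a kind, so the sketch assumes higher values win.

```go
package main

import (
	"fmt"
	"sort"
)

// kindOrder is the precedence quoted in the post, strongest first.
var kindOrder = []string{"override", "harness", "builtin", "plugin", "model"}

// rank returns the position of kind in kindOrder; unknown kinds sort last.
func rank(kind string) int {
	for i, k := range kindOrder {
		if k == kind {
			return i
		}
	}
	return len(kindOrder)
}

// Artifact is a hypothetical in-memory view of one loaded bundle.
type Artifact struct {
	Name     string
	Kind     string
	Priority int
	When     func(turn int) bool // nil means "always active" (assumption)
}

// activeInOrder re-evaluates every When predicate for this turn and
// returns the active artifacts, strongest precedence first.
func activeInOrder(all []Artifact, turn int) []Artifact {
	var active []Artifact
	for _, a := range all {
		if a.When == nil || a.When(turn) {
			active = append(active, a)
		}
	}
	sort.Slice(active, func(i, j int) bool {
		if ri, rj := rank(active[i].Kind), rank(active[j].Kind); ri != rj {
			return ri < rj
		}
		// ASSUMPTION: within a kind, the higher priority value wins.
		if active[i].Priority != active[j].Priority {
			return active[i].Priority > active[j].Priority
		}
		return active[i].Name < active[j].Name // total order, so output is stable
	})
	return active
}

func main() {
	arts := []Artifact{
		{Name: "notes-bundle", Kind: "plugin", Priority: 50,
			When: func(turn int) bool { return turn >= 4 }},
		{Name: "safety-bundle", Kind: "plugin", Priority: 40},
		{Name: "local-policy", Kind: "override", Priority: 10},
	}
	for _, turn := range []int{3, 4} {
		fmt.Printf("turn %d:", turn)
		for _, a := range activeInOrder(arts, turn) {
			fmt.Printf(" %s(%s)", a.Name, a.Kind)
		}
		fmt.Println()
	}
}
```

Because the comparison falls back to the artifact name, two runs over the same set of bundles always produce the same order, which is the property that makes a bundle reviewable as one diff.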

Things you can actually inspect now

Three commands that didn’t quite work two releases ago are now daily drivers:

`harness validate -v`

Registers every artifact, runs every parser, prints a per-bundle tool/hook count. On the live bot today: 21 tools / 5 hooks across harness.md + two plugin bundles. If the number doesn’t match what you expect, your bundle isn’t loading. That’s the loop.
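If you want an independent number to compare against, a few lines of Go can count the `## Tool:` and `## Hook:` headings in a bundle file. This is only a sanity check built on the heading convention shown in this post. It is not how `harness validate` parses bundles, and `countSections` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countSections counts lines that start with "## Tool:" or "## Hook:".
// ASSUMPTION: each such heading declares exactly one tool or hook, as in
// the bundle examples in this post. The real loader may apply more rules.
func countSections(bundle string) (tools, hooks int) {
	sc := bufio.NewScanner(strings.NewReader(bundle))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "## Tool:"):
			tools++
		case strings.HasPrefix(line, "## Hook:"):
			hooks++
		}
	}
	return tools, hooks
}

func main() {
	// Sample bundle text copied from the shape used in this post.
	bundle := `---
artifact: harness_artifact/v1alpha1
kind: plugin
name: my-first-bundle
priority: 50
---

## Tool: hello
Say hello and exit.

## Hook: tool.post / log-everything
Print every tool call to stderr.
`
	tools, hooks := countSections(bundle)
	fmt.Printf("tools=%d hooks=%d\n", tools, hooks)
}
```

If your hand count and the `validate -v` output disagree, the bundle most likely isn't being picked up, which is exactly the signal the command is meant to give you.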

`harness context --verbose`

Shows what the agent saw on a given turn:

which chunks were assembled into the system prompt
where each chunk came from (which artifact, which file)
which artifacts were active vs inactive
which when: predicates passed
total token spend, broken down by source

Context observability is not an afterthought. It is shipped.
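The "token spend, broken down by source" idea is easy to sketch. The Go below models it under stated assumptions: `Chunk`, `SourceTotal`, and `tokensBySource` are hypothetical names, the file paths are placeholders, and the token counts are made-up sample values rather than output from the live bot.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunk is a hypothetical record for one piece of the assembled system
// prompt: which artifact and file it came from, and what it cost.
type Chunk struct {
	Artifact string
	File     string
	Tokens   int
}

// SourceTotal is the summed token spend for one artifact.
type SourceTotal struct {
	Artifact string
	Tokens   int
}

// tokensBySource sums tokens per artifact and returns the per-source
// totals (largest first, ties broken by name) plus the grand total.
func tokensBySource(chunks []Chunk) ([]SourceTotal, int) {
	sums := map[string]int{}
	total := 0
	for _, c := range chunks {
		sums[c.Artifact] += c.Tokens
		total += c.Tokens
	}
	out := make([]SourceTotal, 0, len(sums))
	for name, n := range sums {
		out = append(out, SourceTotal{Artifact: name, Tokens: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Tokens != out[j].Tokens {
			return out[i].Tokens > out[j].Tokens
		}
		return out[i].Artifact < out[j].Artifact
	})
	return out, total
}

func main() {
	// Illustrative sample values only.
	chunks := []Chunk{
		{Artifact: "harness.md", File: "harness.md", Tokens: 100},
		{Artifact: "safety-bundle", File: ".harness/plugins/safety-bundle.md", Tokens: 30},
		{Artifact: "safety-bundle", File: ".harness/plugins/safety-bundle.md", Tokens: 20},
	}
	rows, total := tokensBySource(chunks)
	for _, r := range rows {
		fmt.Printf("%-14s %4d tokens\n", r.Artifact, r.Tokens)
	}
	fmt.Printf("%-14s %4d tokens\n", "total", total)
}
```

Keeping provenance on every chunk is what makes the breakdown possible: once a chunk carries its artifact and file, attributing prompt cost to a bundle is just a group-by.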

`harness artifacts`

Flat list of every loaded artifact with its priority, kind, source file, and active/inactive state. Useful when you need to answer “is this hook actually firing?” without grepping through bundles.

Honesty matters. v0.6.0 is not a “we figured it all out” release.

Compaction engine vs hooks — open question (#69 / roadmap). The leading candidate is hooks-driven compaction in v0.7.
Memory persistence — flat-files today; SQLite is on the table for v0.7.
Sub-agent supervision — primitive level, not orchestration level. Phase 7 territory.
Async tool calls — async: is reserved in the tool schema (parsed but not yet propagated through ToolConfig). Wiring it through is planned for Phase 3.
agent.stop hook event — the strict finish_reason guard ships in v0.6.0, but the proper hook primitive (issue #104) is held for v0.7.0 so it can get its own design pass.

If you need any of those today, you’re early. That’s fine. The core’s shape is what we’re committing to in v0.6.0; the edges are still moving.

The pre-1.0 schema-evolution clause stays in effect: artifact frontmatter fields can still change between minor releases. The CHANGELOG calls every break out explicitly.

How to try it

```shell
go install github.com/htekdev/ai-harness/cmd/harness@latest
harness init my-agent
cd my-agent
harness validate -v
harness serve --source stdin
```

Then drop a Shape A bundle into .harness/plugins/:

```markdown
---
artifact: harness_artifact/v1alpha1
kind: plugin
name: my-first-bundle
priority: 50
---

## Tool: hello
Say hello and exit.

## Hook: tool.post / log-everything
Print every tool call to stderr.
```

Re-run harness validate -v. The tool/hook count should go up. That’s the loop. That’s the whole product surface.

The bigger arc

v0.4.0 was the first usable harness.
v0.5.0 was the first one with proper claims verification (Ralph loop at the delegation boundary).
v0.6.0 is the first one where the artifact model, the loop, and the docs all line up with the Harness-as-Code thesis.

That’s the milestone worth marking. v0.7 is async, memory persistence, and the compaction engine. After that, v1.0 is a positioning question, not an engineering one.

Where to go next

Repo: github.com/htekdev/ai-harness
Docs: htekdev.github.io/ai-harness
Live bot: @htekdevaiharness on Telegram
Companion piece: What Is Harness as Code?
Category survey: Live comparison of agent harnesses

If you’ve been waiting for “the small one with real governance,” this is it.